Preface to the second edition


Since the publication of the first edition of this book, we have, both through 
teaching the material it covers and as a result of receiving helpful comments from 
colleagues, become aware of the desirability of changes in a number of areas. 
The most important of these is the fact that the mathematical preparation of 
current senior college and university entrants is now less than it used to be. To 
match this, we have decided to include a preliminary chapter covering areas such 
as polynomial equations, trigonometric identities, coordinate geometry, partial 
fractions, binomial expansions, necessary and sufficient conditions, and proof by 
induction and contradiction. 

Whilst the general level of what is included in this second edition has not 
been raised, some areas have been expanded to take in topics we now feel were 
not adequately covered in the first. In particular, increased attention has been 
given to non-square sets of simultaneous linear equations and their associated 
matrices. We hope that this more extended treatment, together with the inclusion
of singular value matrix decomposition, will make the material of more practical
use to engineering students. In the same spirit, an elementary treatment of linear 
recurrence relations has been included. The topic of normal modes has now been 
given a small chapter of its own, though the links to matrices on the one hand, 
and to representation theory on the other, have not been lost. 

Elsewhere, the presentation of probability and statistics has been reorganised to 
give the two aspects more nearly equal weights. The early part of the probability 
chapter has been rewritten in order to present a more coherent development 
based on Boolean algebra, the fundamental axioms of probability theory and 
the properties of intersections and unions. Whilst this is somewhat more formal 
than previously, we think that it has not reduced the accessibility of these topics 
and hope that it has increased it. The scope of the chapter has been somewhat 
extended to include all physically important distributions and an introduction to 
cumulants. 
Statistics now occupies a substantial chapter of its own, one that includes
systematic discussions of estimators and their efficiency, sample distributions,
and t- and F-tests for comparing means and variances. Other new topics are
applications of the chi-squared distribution, maximum-likelihood parameter
estimation and least-squares fitting. In other chapters we have added material on
the following topics: curvature, envelopes, curve-sketching, more refined numerical
methods for differential equations, and the elements of integration using
Monte Carlo techniques.

Over the last four years we have received somewhat mixed feedback about 
the number of exercises to include at the ends of the various chapters. After 
consideration, we decided to increase it substantially, partly to correspond to the 
additional topics covered in the text, but mainly to give both students and their 
teachers a wider choice. There are now nearly eight hundred such exercises, many 
with several parts. An even more vexed question is that of whether or not to 
provide hints and answers to all of the exercises, or just to ‘the odd-numbered’ 
ones, as is the normal practice for textbooks in the United States, thus making 
the remainder more suitable for setting as homework. In the end, we decided that 
hints and outline solutions should be provided for all the exercises, in order to 
facilitate independent study while leaving the details of the calculation as a task 
for the student. 

In conclusion we hope that this edition will be thought by its users to be 
‘heading in the right direction’ and would like to place on record our thanks to 
all who have helped to bring about the changes and adjustments. Naturally, those 
colleagues who have noted errors or ambiguities in the first edition and brought 
them to our attention figure high on the list, as do the staff at The Cambridge 
University Press. In particular, we are grateful to Dave Green for continued LaTeX
advice, Susan Parkinson for copy-editing the 2nd edition with her usual keen eye
for detail and flair for crafting coherent prose, and Alison Woollatt for once again
turning our basic LaTeX into a beautifully typeset book. Our thanks go to all of
them, though of course we accept full responsibility for any remaining errors or 
ambiguities, of which, as with any new publication, there are bound to be some. 

On a more personal note, KFR again wishes to thank his wife Penny for her 
unwavering support, not only in his academic and tutorial work, but also in their 
joint efforts to convert time at the bridge table into ‘green points’ on their record. 
MPH is once more indebted to his wife, Becky, and his mother, Pat, for their 
tireless support and encouragement above and beyond the call of duty. MPH 
dedicates his contribution to this book to the memory of his father, Ronald 
Leonard Hobson, whose gentle kindness, patient understanding and unbreakable 
spirit made all things seem possible. 

Ken Riley, Michael Hobson 
Cambridge, 2002 
Preface to the first edition


A knowledge of mathematical methods is important for an increasing number of 
university and college courses, particularly in physics, engineering and chemistry, 
but also in more general science. Students embarking on such courses come from 
diverse mathematical backgrounds, and their core knowledge varies considerably. 
We have therefore decided to write a textbook that assumes knowledge only of 
material that can be expected to be familiar to all the current generation of 
students starting physical science courses at university. In the United Kingdom 
this corresponds to the standard of Mathematics A-level, whereas in the United 
States the material assumed is that which would normally be covered at junior 
college. 

Starting from this level, the first six chapters cover a collection of topics 
with which the reader may already be familiar, but which are here extended 
and applied to typical problems encountered by first-year university students. 
They are aimed at providing a common base of general techniques used in 
the development of the remaining chapters. Students who have had additional 
preparation, such as Further Mathematics at A-level, will find much of this 
material straightforward. 

Following these opening chapters, the remainder of the book is intended to 
cover at least that mathematical material which an undergraduate in the physical 
sciences might encounter up to the end of his or her course. The book is also 
appropriate for those beginning graduate study with a mathematical content, and 
naturally much of the material forms parts of courses for mathematics students. 
Furthermore, the text should provide a useful reference for research workers. 

The general aim of the book is to present a topic in three stages. The first 
stage is a qualitative introduction, wherever possible from a physical point of 
view. The second is a more formal presentation, although we have deliberately 
avoided strictly mathematical questions such as the existence of limits, uniform 
convergence, the interchanging of integration and summation orders, etc. on the 
grounds that ‘this is the real world; it must behave reasonably’. Finally a worked
example is presented, often drawn from familiar situations in physical science 
and engineering. These examples have generally been fully worked, since, in 
the authors’ experience, partially worked examples are unpopular with students. 
Only in a few cases, where trivial algebraic manipulation is involved, or where 
repetition of the main text would result, has an example been left as an exercise 
for the reader. Nevertheless, a number of exercises also appear at the end of each 
chapter, and these should give the reader ample opportunity to test his or her 
understanding. Hints and answers to these exercises are also provided. 

With regard to the presentation of the mathematics, it has to be accepted that 
many equations (especially partial differential equations) can be written more 
compactly by using subscripts, e.g. $u_{xy}$ for a second partial derivative, instead of
the more familiar $\partial^2 u/\partial x\,\partial y$, and that this certainly saves typographical space.
However, for many students, the labour of mentally unpacking such equations 
is sufficiently great that it is not possible to think of an equation’s physical 
interpretation at the same time. Consequently, wherever possible we have decided 
to write out such expressions in their more obvious but longer form. 

During the writing of this book we have received much help and encouragement 
from various colleagues at the Cavendish Laboratory, Clare College, Trinity Hall 
and Peterhouse. In particular, we would like to thank Peter Scheuer, whose 
comments and general enthusiasm proved invaluable in the early stages. For 
reading sections of the manuscript, for pointing out misprints and for numerous 
useful comments, we thank many of our students and colleagues at the University 
of Cambridge. We are especially grateful to Chris Doran, John Huber, Garth
Leder, Tom Körner and, not least, Mike Stobbs, who, sadly, died before the book
was completed. We also extend our thanks to the University of Cambridge and 
the Cavendish teaching staff, whose examination questions and lecture hand-outs 
have collectively provided the basis for some of the examples included. Of course, 
any errors and ambiguities remaining are entirely the responsibility of the authors, 
and we would be most grateful to have them brought to our attention. 

We are indebted to Dave Green for a great deal of advice concerning typesetting 
in LaTeX and to Andrew Lovatt for various other computing tips. Our thanks
also go to Anja Visser and Graça Rocha for enduring many hours of (sometimes
heated) debate. At Cambridge University Press, we are very grateful to our editor 
Adam Black for his help and patience and to Alison Woollatt for her expert 
typesetting of such a complicated text. We also thank our copy-editor Susan 
Parkinson for many useful suggestions that have undoubtedly improved the style 
of the book. 

Finally, on a personal note, KFR wishes to thank his wife Penny, not only for 
a long and happy marriage, but also for her support and understanding during 
his recent illness - and when things have not gone too well at the bridge table! 
MPH is indebted both to Rebecca Morris and to his parents for their tireless 
support and patience, and for their unending supplies of tea. SJB is grateful to
Anthony Gritten for numerous relaxing discussions about J. S. Bach, to Susannah 
Ticciati for her patience and understanding, and to Kate Isaak for her calming 
late-night e-mails from the USA. 

Ken Riley, Michael Hobson and Stephen Bence 

Cambridge, 1997 
Preliminary algebra


This opening chapter reviews the basic algebra of which a working knowledge is 
presumed in the rest of the book. Many students will be familiar with much, if 
not all, of it, but recent changes in what is studied during secondary education 
mean that it cannot be taken for granted that they will already have a mastery 
of all the topics presented here. The reader may assess which areas need further 
study or revision by attempting the exercises at the end of the chapter. The main 
areas covered are polynomial equations and the related topic of partial fractions, 
curve sketching, coordinate geometry, trigonometric identities and the notions of 
proof by induction or contradiction. 


1.1 Simple functions and equations 

It is normal practice when starting the mathematical investigation of a physical 
problem to assign an algebraic symbol to the quantity whose value is sought, either 
numerically or as an explicit algebraic expression. For the sake of definiteness, in 
this chapter we will use x to denote this quantity most of the time. Subsequent 
steps in the analysis involve applying a combination of known laws, consistency 
conditions and (possibly) given constraints to derive one or more equations 
satisfied by x. These equations may take many forms, ranging from a simple 
polynomial equation to, say, a partial differential equation with several boundary 
conditions. Some of the more complicated possibilities are treated in the later 
chapters of this book, but for the present we will be concerned with techniques 
for the solution of relatively straightforward algebraic equations. 


1.1.1 Polynomials and polynomial equations 

Firstly we consider the simplest type of equation, a polynomial equation in which
a polynomial expression in $x$, denoted by $f(x)$, is set equal to zero and thereby
forms an equation which is satisfied by particular values of $x$; these values are
called the roots of the equation.

$$f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 = 0. \tag{1.1}$$

Here $n$ is an integer $> 0$, called the degree of both the polynomial and the
equation, and the known coefficients $a_0, a_1, \ldots, a_n$ are real quantities with $a_n \neq 0$.

Equations such as (1.1) arise frequently in physical problems, the coefficients $a_i$
being determined by the physical properties of the system under study. What is
needed is to find some or all of the roots of (1.1), i.e. the $x$-values, $\alpha_k$,
that satisfy $f(\alpha_k) = 0$; here $k$ is an index that, as we shall see later, can take up to
$n$ different values, i.e. $k = 1, 2, \ldots, n$. The roots of the polynomial equation can
equally well be described as the zeroes of the polynomial. When they are real,
they correspond to the points at which a graph of $f(x)$ crosses the $x$-axis. Roots
that are complex (see chapter 3) do not have such a graphical interpretation.

For polynomial equations containing powers of $x$ greater than $x^4$ general methods
do not exist for obtaining explicit expressions for the roots $\alpha_k$. Even for
$n = 3$ and $n = 4$ the prescriptions for obtaining the roots are sufficiently complicated
that it is usually preferable to obtain exact or approximate values by other
methods. Only for $n = 1$ and $n = 2$ are the closed-form solutions simple enough for
routine use. These results will be well known to the reader, but they are given here
for the sake of completeness. For $n = 1$, (1.1) reduces to the linear equation

$$a_1 x + a_0 = 0; \tag{1.2}$$

the solution (root) is $\alpha_1 = -a_0/a_1$. For $n = 2$, (1.1) reduces to the quadratic
equation

$$a_2 x^2 + a_1 x + a_0 = 0; \tag{1.3}$$

the two roots $\alpha_1$ and $\alpha_2$ are given by

$$\alpha_{1,2} = \frac{-a_1 \pm \sqrt{a_1^2 - 4a_2 a_0}}{2a_2}. \tag{1.4}$$


When discussing specifically quadratic equations, as opposed to more general
polynomial equations, it is usual to write the equation in one of the two notations

$$ax^2 + bx + c = 0, \qquad ax^2 + 2bx + c = 0, \tag{1.5}$$

with respective explicit pairs of solutions

$$\alpha_{1,2} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}, \qquad \alpha_{1,2} = \frac{-b \pm \sqrt{b^2 - ac}}{a}. \tag{1.6}$$

Of course, these two notations are entirely equivalent and the only important 
point is to associate each form of answer with the corresponding form of equation;
most people keep to one form, to avoid any possible confusion. 

If the value of the quantity appearing under the square root sign is positive
then both roots are real; if it is negative then the roots form a complex conjugate
pair, i.e. they are of the form $p \pm iq$ with $p$ and $q$ real (see chapter 3); if it has
zero value then the two roots are equal and special considerations usually arise.
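
As an illustration of the three cases just described, the following minimal Python sketch evaluates the first form of (1.6) and classifies the roots by the sign of the quantity under the square root. The function names `quadratic_roots` and `classify` are chosen here purely for illustration and do not appear elsewhere in the text; the two sample equations are likewise arbitrary.

```python
import cmath
import math


def quadratic_roots(a, b, c):
    """Return the two roots of a*x**2 + b*x + c = 0, upper sign first."""
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic equation")
    disc = b * b - 4 * a * c  # quantity under the square root sign
    # Use the real square root when possible, the complex one otherwise.
    root = math.sqrt(disc) if disc >= 0 else cmath.sqrt(disc)
    return ((-b + root) / (2 * a), (-b - root) / (2 * a))


def classify(a, b, c):
    """Describe the roots according to the sign of b**2 - 4*a*c."""
    disc = b * b - 4 * a * c
    if disc > 0:
        return "two distinct real roots"
    if disc == 0:
        return "two equal real roots"
    return "complex conjugate pair"


# Illustrative inputs only: one case with real roots, one with complex roots.
print(quadratic_roots(12, 6, -6), classify(12, 6, -6))
print(quadratic_roots(1, 2, 5), classify(1, 2, 5))
```

With exact integer coefficients the test `disc == 0` is reliable; for floating-point coefficients a tolerance would be the safer choice.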

Thus linear and quadratic equations can be dealt with in a cut-and-dried way. 
We now turn to methods for obtaining partial information about the roots of 
higher-degree polynomial equations. In some circumstances the knowledge that 
an equation has a root lying in a certain range, or that it has no real roots at all, 
is all that is actually required. For example, in the design of electronic circuits 
it is necessary to know whether the current in a proposed circuit will break 
into spontaneous oscillation. To test this, it is sufficient to establish whether a 
certain polynomial equation, whose coefficients are determined by the physical 
parameters of the circuit, has a root with a positive real part (see chapter 3); 
complete determination of all the roots is not needed for this purpose. If the 
complete set of roots of a polynomial equation is required, it can usually be 
obtained to any desired accuracy by numerical methods such as those described 
in chapter 28. 

There is no explicit step-by-step approach to finding the roots of a general 
polynomial equation such as (1.1). In most cases analytic methods yield only 
information about the roots, rather than their exact values. To explain the relevant 
techniques we will consider a particular example, ‘thinking aloud’ on paper and 
expanding on special points about methods and lines of reasoning. In more 
routine situations such comment would be absent and the whole process briefer 
and more tightly focussed. 


Example: the cubic case 
Let us investigate the roots of the equation

$$g(x) = 4x^3 + 3x^2 - 6x - 1 = 0 \tag{1.7}$$

or, in an alternative phrasing, investigate the zeroes of $g(x)$. We note first of all
that this is a cubic equation. It can be seen that for $x$ large and positive $g(x)$
will be large and positive and equally that for $x$ large and negative $g(x)$ will
be large and negative. Therefore, intuitively (or, more formally, by continuity)
$g(x)$ must cross the $x$-axis at least once and so $g(x) = 0$ must have at least one
real root. Furthermore, it can be shown that if $f(x)$ is an $n$th-degree polynomial
then the graph of $f(x)$ must cross the $x$-axis an even or odd number of times
as $x$ varies between $-\infty$ and $+\infty$, according to whether $n$ itself is even or odd.
Thus a polynomial of odd degree always has at least one real root, but one of
even degree may have no real root. A small complication, discussed later in this
section, occurs when repeated roots arise.
Having established that $g(x) = 0$, equation (1.7), has at least one real root, we
may ask how many real roots it could have. To answer this we need one of the
fundamental theorems of algebra, mentioned above:

An $n$th-degree polynomial equation has exactly $n$ roots.

It should be noted that this does not imply that there are $n$ real roots (only that
there are not more than $n$); some of the roots may be of the form $p + iq$.

To make the above theorem plausible and to see what is meant by repeated
roots, let us suppose that the $n$th-degree polynomial equation $f(x) = 0$, (1.1), has
$r$ roots $\alpha_1, \alpha_2, \ldots, \alpha_r$ considered distinct for the moment. That is, we suppose that
$f(\alpha_k) = 0$ for $k = 1, 2, \ldots, r$, so that $f(x)$ vanishes only when $x$ is equal to one of
the $r$ values $\alpha_k$. But the same can be said for the function

$$F(x) = A(x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_r), \tag{1.8}$$

in which $A$ is a non-zero constant; $F(x)$ can clearly be multiplied out to form a
polynomial expression.

We now call upon a second fundamental result in algebra: that if two polynomial
functions $f(x)$ and $F(x)$ have equal values for all values of $x$, then their coefficients
are equal on a term-by-term basis. In other words, we can equate the coefficients
of each and every power of $x$ in the two expressions; in particular we can equate
the coefficients of the highest power of $x$. From this we have $Ax^r \equiv a_n x^n$ and
thus that $r = n$ and $A = a_n$. As $r$ is both equal to $n$ and to the number of roots
of $f(x) = 0$, we conclude that the $n$th-degree polynomial equation $f(x) = 0$ has $n$ roots.
(Although this line of reasoning may make the theorem plausible, it does not
constitute a proof since we have not shown that it is permissible to write $f(x)$ in
the form of equation (1.8).)

We next note that the condition $f(\alpha_k) = 0$ for $k = 1, 2, \ldots, r$, could also be met
if (1.8) were replaced by

$$F(x) = A(x - \alpha_1)^{m_1} (x - \alpha_2)^{m_2} \cdots (x - \alpha_r)^{m_r}, \tag{1.9}$$

with $A = a_n$. In (1.9) the $m_k$ are integers $\geq 1$ and are known as the multiplicities
of the roots, $m_k$ being the multiplicity of $\alpha_k$. Expanding the right-hand side (RHS)
leads to a polynomial of degree $m_1 + m_2 + \cdots + m_r$. This sum must be equal to $n$.
Thus, if any of the $m_k$ is greater than unity then the number of distinct roots, $r$,
is less than $n$; the total number of roots remains at $n$, but one or more of the $\alpha_k$
counts more than once. For example, the equation

$$F(x) = A(x - \alpha_1)^2 (x - \alpha_2)^3 (x - \alpha_3)(x - \alpha_4) = 0$$

has exactly seven roots, $\alpha_1$ being a double root and $\alpha_2$ a triple root, whilst $\alpha_3$ and
$\alpha_4$ are unrepeated (simple) roots.

We can now say that our particular equation (1.7) has either one or three real
roots but in the latter case it may be that not all the roots are distinct. To decide
how many real roots the equation has, we need to anticipate two ideas from the
next chapter. The first of these is the notion of the derivative of a function, and
the second is a result known as Rolle’s theorem.

Figure 1.1 Two curves $\phi_1(x)$ and $\phi_2(x)$, both with zero derivatives at the
same values of $x$, but with different numbers of real solutions to $\phi_i(x) = 0$.

The derivative $f'(x)$ of a function $f(x)$ measures the slope of the tangent to
the graph of $f(x)$ at that value of $x$ (see figure 2.1 in the next chapter). For
the moment, the reader with no prior knowledge of calculus is asked to accept
that the derivative of $ax^n$ is $nax^{n-1}$, so that the derivative $g'(x)$ of the curve
$g(x) = 4x^3 + 3x^2 - 6x - 1$ is given by $g'(x) = 12x^2 + 6x - 6$. Similar expressions
for the derivatives of other polynomials are used later in this chapter.

Rolle’s theorem states that, if f(x) has equal values at two different values of 
x then at some point between these two x-values its derivative is equal to zero; 
i.e. the tangent to its graph is parallel to the x-axis at that point (see figure 2.2). 

Having briefly mentioned the derivative of a function and Rolle’s theorem, we
now use them to establish whether $g(x)$ has one or three real zeroes. If $g(x) = 0$
does have three real roots $\alpha_k$, i.e. $g(\alpha_k) = 0$ for $k = 1, 2, 3$, then it follows from
Rolle’s theorem that between any consecutive pair of them (say $\alpha_1$ and $\alpha_2$) there
must be some real value of $x$ at which $g'(x) = 0$. Similarly, there must be a further
zero of $g'(x)$ lying between $\alpha_2$ and $\alpha_3$. Thus a necessary condition for three real
roots of $g(x) = 0$ is that $g'(x) = 0$ itself has two real roots.

However, this condition on the number of roots of g'(x) = 0, whilst necessary, 
is not sufficient to guarantee three real roots of g(x) = 0. This can be seen by 
inspecting the cubic curves in figure 1.1. For each of the two functions $\phi_1(x)$ and
$\phi_2(x)$, the derivative is equal to zero at both $x = \beta_1$ and $x = \beta_2$. Clearly, though,
$\phi_2(x) = 0$ has three real roots whilst $\phi_1(x) = 0$ has only one. It is easy to see that
the crucial difference is that $\phi_1(\beta_1)$ and $\phi_1(\beta_2)$ have the same sign, whilst $\phi_2(\beta_1)$
and $\phi_2(\beta_2)$ have opposite signs.
It will be apparent that for some cubic equations, $\phi(x) = 0$ say, $\phi'(x)$ equals
zero at a value of $x$ for which $\phi(x)$ is also zero. Then the graph of $\phi(x)$ just
touches the $x$-axis and there may appear to be only two roots. However, when
this happens the value of $x$ so found is, in fact, a double real root of the cubic
(corresponding to one of the $m_k$ in (1.9) having the value 2) and must be counted
twice when determining the number of real roots.
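
The link between repeated roots and vanishing derivatives can be checked numerically. The sketch below is an illustration only: it assumes, hypothetically, the values 1, 2, 3 and 4 for the roots of the seventh-degree example based on (1.9), with multiplicities 2, 3, 1 and 1 and with A = 1. It expands the product and then evaluates the polynomial and its derivative at each root. The helper names `expand`, `derivative` and `evaluate` are not taken from the text.

```python
def expand(roots_with_multiplicity, leading=1):
    """Multiply out leading * prod (x - root)**m; coefficients are highest power first."""
    coeffs = [leading]
    for root, m in roots_with_multiplicity:
        for _ in range(m):
            # Multiply the current polynomial by (x - root).
            times_x = coeffs + [0]
            times_root = [0] + [-root * c for c in coeffs]
            coeffs = [s + t for s, t in zip(times_x, times_root)]
    return coeffs


def derivative(coeffs):
    """Differentiate term by term, using d(a x^n)/dx = n a x^(n-1)."""
    n = len(coeffs) - 1
    return [c * (n - i) for i, c in enumerate(coeffs[:-1])]


def evaluate(coeffs, x):
    """Evaluate the polynomial at x by nested multiplication."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value


# Hypothetical roots: double root at 1, triple root at 2, simple roots at 3 and 4.
F = expand([(1, 2), (2, 3), (3, 1), (4, 1)])
dF = derivative(F)

print("degree:", len(F) - 1)
print("F at the roots: ", [evaluate(F, x) for x in (1, 2, 3, 4)])
print("F' at the roots:", [evaluate(dF, x) for x in (1, 2, 3, 4)])
```

The derivative vanishes at the repeated roots but not at the simple ones, which is the algebraic counterpart of the graph touching, rather than crossing, the x-axis at a double root.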

Finally, then, we are in a position to decide the number of real roots of the 
equation 

$$g(x) = 4x^3 + 3x^2 - 6x - 1 = 0.$$

The equation $g'(x) = 0$, with $g'(x) = 12x^2 + 6x - 6$, is a quadratic equation with
explicit solutions†

$$\beta_{1,2} = \frac{-3 \pm \sqrt{9 + 72}}{12},$$

so that $\beta_1 = 1/2$ and $\beta_2 = -1$. The corresponding values of $g(x)$ are $g(\beta_1) = -11/4$ and
$g(\beta_2) = 4$, which are of opposite sign. This indicates that $4x^3 + 3x^2 - 6x - 1 = 0$
has three real roots, one lying in the range $-1 < x < \frac{1}{2}$ and the others one on
each side of that range.
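
The reasoning used for the cubic can be retraced numerically. The following Python sketch is a minimal illustration, not a general-purpose root finder: it finds the stationary points of g(x) from the quadratic formula (taking `beta_1` from the upper sign), applies the sign test, and then locates one root in each of the three ranges by repeated interval halving. The helper `bisect` and the outer limits -3 and 2 are assumptions made here for the example.

```python
import math


def g(x):
    return 4 * x**3 + 3 * x**2 - 6 * x - 1


def bisect(func, lo, hi, tol=1e-12):
    """Locate a root of func in [lo, hi], assuming func changes sign there."""
    f_lo = func(lo)
    if f_lo * func(hi) > 0:
        raise ValueError("no sign change on the interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_lo * f_mid <= 0:
            hi = mid                 # the root lies in the lower half
        else:
            lo, f_lo = mid, f_mid    # the root lies in the upper half
    return 0.5 * (lo + hi)


# Stationary points: the roots of g'(x) = 12x^2 + 6x - 6.
disc = 6**2 - 4 * 12 * (-6)
beta_1 = (-6 + math.sqrt(disc)) / (2 * 12)  # upper sign
beta_2 = (-6 - math.sqrt(disc)) / (2 * 12)  # lower sign

# Opposite signs of g at the two stationary points indicate three real roots.
three_real_roots = g(beta_1) * g(beta_2) < 0

# One root on each side of the range between the stationary points, one inside it.
# The outer limits -3 and 2 are illustrative choices at which g has the required sign.
roots = [bisect(g, -3.0, beta_2), bisect(g, beta_2, beta_1), bisect(g, beta_1, 2.0)]

print(beta_1, beta_2, g(beta_1), g(beta_2), three_real_roots)
print(roots)
```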

The techniques we have developed above have been used to tackle a cubic
equation, but they can be applied to polynomial equations $f(x) = 0$ of degree
greater than 3. However, much of the analysis centres around the equation
$f'(x) = 0$ and this, itself, being then a polynomial equation of degree 3 or more
either has no closed-form general solution or one that is complicated to evaluate.
Thus the amount of information that can be obtained about the roots of $f(x) = 0$
is correspondingly reduced.


A more general case 

To illustrate what can (and cannot) be done in the more general case we now 
investigate as far as possible the real roots of 

$$f(x) = x^7 + 5x^6 + x^4 - x^3 + x^2 - 2 = 0.$$

The following points can be made.

(i) This is a seventh-degree polynomial equation; therefore the number of
real roots is 1, 3, 5 or 7.

(ii) $f(0)$ is negative whilst $f(\infty) = +\infty$, so there must be at least one positive
root.

(iii) The equation $f'(x) = 0$ can be written as $x(7x^5 + 30x^4 + 4x^2 - 3x + 2) = 0$
and thus $x = 0$ is a solution. The derivative of $f'(x)$, denoted by $f''(x)$,
equals $42x^5 + 150x^4 + 12x^2 - 6x + 2$. That $f'(x)$ is zero whilst $f''(x)$ is
positive at $x = 0$ indicates (subsection 2.1.8) that $f(x)$ has a minimum
there. This, together with the facts that $f(0)$ is negative and $f(\infty) = \infty$,
implies that the total number of real roots to the right of $x = 0$ must be
odd. Since the total number of real roots must be odd, the number to the
left must be even (0, 2, 4 or 6).

† The two roots $\beta_1$, $\beta_2$ are written as $\beta_{1,2}$. By convention $\beta_1$ refers to the upper symbol in $\pm$, $\beta_2$ to
the lower symbol.

This is about all that can be deduced by simple analytic methods in this case, 
although some further progress can be made in the ways indicated in exercise 1.3. 
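
The parity conclusions above can be compared with a crude numerical scan. The sketch below counts sign changes of f(x) on a uniform grid to the left and to the right of x = 0; the scanning range of -10 to 10 and the grid of 10000 steps on each side are arbitrary choices made for this illustration, and the function name `sign_changes` is hypothetical. A scan of this kind can miss a double root, or a pair of roots closer together than the grid spacing, so it supports the analysis rather than replacing it.

```python
def f(x):
    return x**7 + 5 * x**6 + x**4 - x**3 + x**2 - 2


def sign_changes(func, lo, hi, steps):
    """Count sign changes of func between consecutive points of a uniform grid on [lo, hi]."""
    count = 0
    previous = func(lo)
    for i in range(1, steps + 1):
        current = func(lo + (hi - lo) * i / steps)
        if previous * current < 0:
            count += 1
        previous = current
    return count


left = sign_changes(f, -10.0, 0.0, 10000)   # crossings with x < 0
right = sign_changes(f, 0.0, 10.0, 10000)   # crossings with x > 0

print("sign changes to the left of x = 0: ", left)
print("sign changes to the right of x = 0:", right)
```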

There are, in fact, more sophisticated tests that examine the relative signs of 
successive terms in an equation such as (1.1), and in quantities derived from 
them, to place limits on the numbers and positions of roots. But they are not 
prerequisites for the remainder of this book and will not be pursued further here. 

We conclude this section with a worked example which demonstrates that the 
practical application of the ideas developed so far can be both short and decisive. 


►For what values of k, if any, does 

$$f(x) = x^3 - 3x^2 + 6x + k = 0$$

have three real roots?

Firstly study the equation $f'(x) = 0$, i.e. $3x^2 - 6x + 6 = 0$. This is a quadratic equation
but, using (1.6), because $6^2 < 4 \times 3 \times 6$, it can have no real roots. Therefore, it follows
immediately that $f(x)$ has no turning points, i.e. no maximum or minimum; consequently
$f(x) = 0$ cannot have more than one real root, whatever the value of $k$. ◄


1.1.2 Factorising polynomials 

In the previous subsection we saw how a polynomial with $r$ given distinct zeroes
$\alpha_k$ could be constructed as the product of factors containing those zeroes,

$$\begin{aligned} f(x) &= a_n (x - \alpha_1)^{m_1} (x - \alpha_2)^{m_2} \cdots (x - \alpha_r)^{m_r} \\ &= a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0, \end{aligned} \tag{1.10}$$

with $m_1 + m_2 + \cdots + m_r = n$, the degree of the polynomial. It will cause no loss of
generality in what follows to suppose that all the zeroes are simple, i.e. all $m_k = 1$
and $r = n$, and this we will do.

Sometimes it is desirable to be able to reverse this process, in particular when
one exact zero has been found by some method and the remaining zeroes are to
be investigated. Suppose that we have located one zero, $\alpha$; it is then possible to
write (1.10) as

$$f(x) = (x - \alpha) f_1(x), \tag{1.11}$$
where $f_1(x)$ is a polynomial of degree $n - 1$. How can we find $f_1(x)$? The procedure
is much more complicated to describe in a general form than to carry out for
an equation with given numerical coefficients $a_i$. If such manipulations are too
complicated to be carried out mentally, they could be laid out along the lines of
an algebraic ‘long division’ sum. However, a more compact form of calculation
is as follows. Write $f_1(x)$ as

$$f_1(x) = b_{n-1} x^{n-1} + b_{n-2} x^{n-2} + b_{n-3} x^{n-3} + \cdots + b_1 x + b_0.$$

Substitution of this form into (1.11) and subsequent comparison of the coefficients
of $x^p$ for $p = n, n - 1, \ldots, 1, 0$ with those in the second line of (1.10) generates
the series of equations

$$\begin{aligned} b_{n-1} &= a_n, \\ b_{n-2} - \alpha b_{n-1} &= a_{n-1}, \\ b_{n-3} - \alpha b_{n-2} &= a_{n-2}, \\ &\;\;\vdots \\ b_0 - \alpha b_1 &= a_1, \\ -\alpha b_0 &= a_0. \end{aligned}$$

These can be solved successively for the $b_i$, starting either from the top or from
the bottom of the series. In either case the final equation used serves as a check;
if it is not satisfied, at least one mistake has been made in the computation -
or $\alpha$ is not a zero of $f(x) = 0$. We now illustrate this procedure with a worked
example.


► Determine by inspection the simple roots of the equation

$$f(x) = 3x^4 - x^3 - 10x^2 - 2x + 4 = 0$$

and hence, by factorisation, find the rest of its roots.

From the pattern of coefficients it can be seen that $x = -1$ is a solution to the equation:
the coefficients of the even powers of $x$ sum to $3 - 10 + 4 = -3$, as do those of the odd
powers, $-1 - 2 = -3$. We therefore write

$$f(x) = (x + 1)(b_3 x^3 + b_2 x^2 + b_1 x + b_0),$$

where

$$\begin{aligned} b_3 &= 3, \\ b_2 + b_3 &= -1, \\ b_1 + b_2 &= -10, \\ b_0 + b_1 &= -2, \\ b_0 &= 4. \end{aligned}$$

These equations give $b_3 = 3$, $b_2 = -4$, $b_1 = -6$, $b_0 = 4$ (check) and so

$$f(x) = (x + 1) f_1(x) = (x + 1)(3x^3 - 4x^2 - 6x + 4).$$






We now note that $f_1(x) = 0$ if $x$ is set equal to 2. Thus $x - 2$ is a factor of $f_1(x)$, which
therefore can be written as

$$f_1(x) = (x - 2) f_2(x) = (x - 2)(c_2 x^2 + c_1 x + c_0)$$

with

$$\begin{aligned} c_2 &= 3, \\ c_1 - 2c_2 &= -4, \\ c_0 - 2c_1 &= -6, \\ -2c_0 &= 4. \end{aligned}$$

These equations determine $f_2(x)$ as $3x^2 + 2x - 2$. Since $f_2(x) = 0$ is a quadratic equation,
its solutions can be written explicitly as

$$x = \frac{-1 \pm \sqrt{1 + 6}}{3}.$$

Thus the four roots of $f(x) = 0$ are $-1$, $2$, $\frac{1}{3}(-1 + \sqrt{7})$ and $\frac{1}{3}(-1 - \sqrt{7})$. ◄
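
The compact scheme for removing a known factor, solved from the top of the series of equations downwards with the last equation kept as a check, can be sketched in a few lines of Python. The function name `deflate` and the convention of listing coefficients from the highest power downwards are choices made for this illustration only.

```python
def deflate(coeffs, alpha):
    """Divide a polynomial by (x - alpha).

    coeffs lists a_n, a_(n-1), ..., a_0 (highest power first).
    Returns (b, residual), where b lists b_(n-1), ..., b_0 and
    residual = a_0 + alpha*b_0, which is zero when alpha is a zero
    of the polynomial (the final check equation, -alpha*b_0 = a_0).
    """
    b = [coeffs[0]]                          # b_(n-1) = a_n
    for a in coeffs[1:-1]:
        b.append(a + alpha * b[-1])          # b_(k-1) = a_k + alpha*b_k
    residual = coeffs[-1] + alpha * b[-1]    # should vanish
    return b, residual


# The worked example: f(x) = 3x^4 - x^3 - 10x^2 - 2x + 4.
f_coeffs = [3, -1, -10, -2, 4]
f1, check1 = deflate(f_coeffs, -1)   # remove the factor (x + 1)
f2, check2 = deflate(f1, 2)          # remove the factor (x - 2)

print(f1, check1)
print(f2, check2)
```

A non-zero residual signals either an arithmetical slip or a value of alpha that is not a zero of the polynomial, exactly as noted for the hand calculation.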


1.1.3 Properties of roots 

From the fact that a polynomial equation can be written in any of the alternative
forms

$$f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 = 0,$$

$$f(x) = a_n (x - \alpha_1)^{m_1} (x - \alpha_2)^{m_2} \cdots (x - \alpha_r)^{m_r} = 0,$$

$$f(x) = a_n (x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_n) = 0,$$

it follows that it must be possible to express the coefficients $a_i$ in terms of the
roots $\alpha_k$. To take the most obvious example, comparison of the constant terms
(formally the coefficient of $x^0$) in the first and third expressions shows that

$$a_n (-\alpha_1)(-\alpha_2) \cdots (-\alpha_n) = a_0,$$

or, using the product notation,

$$\prod_{k=1}^{n} \alpha_k = (-1)^n \frac{a_0}{a_n}. \tag{1.12}$$

Only slightly less obvious is a result obtained by comparing the coefficients of
$x^{n-1}$ in the same two expressions of the polynomial:

$$\sum_{k=1}^{n} \alpha_k = -\frac{a_{n-1}}{a_n}. \tag{1.13}$$

Comparing the coefficients of other powers of $x$ yields further results, though
they are of less general use than the two just given. One such, which the reader
may wish to derive, is

$$\sum_{j=1}^{n} \sum_{k>j}^{n} \alpha_j \alpha_k = \frac{a_{n-2}}{a_n}. \tag{1.14}$$
In the case of a quadratic equation these root properties are used sufficiently
often that they are worth stating explicitly, as follows. If the roots of the quadratic
equation $ax^2 + bx + c = 0$ are $\alpha_1$ and $\alpha_2$ then

$$\alpha_1 + \alpha_2 = -\frac{b}{a}, \qquad \alpha_1 \alpha_2 = \frac{c}{a}.$$

If the alternative standard form for the quadratic is used, $b$ is replaced by $2b$ in
both the equation and the first of these results.

►Find a cubic equation whose roots are $-4$, $3$ and $5$.

From results (1.12) - (1.14) we can compute that, arbitrarily setting $a_3 = 1$,

$$-a_2 = \sum_{k=1}^{3}\alpha_k = 4, \qquad a_1 = \sum_{j=1}^{3}\sum_{k>j}^{3}\alpha_j\alpha_k = -17, \qquad a_0 = (-1)^3\prod_{k=1}^{3}\alpha_k = 60.$$

Thus a possible cubic equation is $x^3 + (-4)x^2 + (-17)x + (60) = 0$. Of course, any multiple of $x^3 - 4x^2 - 17x + 60 = 0$ will do just as well. ◄
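The construction used in this example can be sketched in a few lines of Python. The sketch assumes $a_n = 1$ and uses the general pattern, of which (1.12) - (1.14) are the cases $k = n$, $1$ and $2$, that $a_{n-k}/a_n$ equals $(-1)^k$ times the sum of all products of the roots taken $k$ at a time. The function name is our own choice for illustration.

```python
from itertools import combinations
from math import prod


def monic_coefficients_from_roots(roots):
    """Return [a_n, a_(n-1), ..., a_0], with a_n = 1, for the given roots."""
    n = len(roots)
    coeffs = [1]  # a_n is arbitrarily set to 1, as in the worked example
    for k in range(1, n + 1):
        # Sum of the products of the roots taken k at a time.
        e_k = sum(prod(group) for group in combinations(roots, k))
        coeffs.append((-1) ** k * e_k)
    return coeffs


# Roots of the worked example: -4, 3 and 5.
print(monic_coefficients_from_roots([-4, 3, 5]))  # [1, -4, -17, 60]
```

Multiplying every entry of the returned list by the same non-zero constant gives an equally valid equation, in line with the closing remark of the example.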


1.2 Trigonometric identities 

So many of the applications of mathematics to physics and engineering are 
concerned with periodic, and in particular sinusoidal, behaviour that a sure and 
ready handling of the corresponding mathematical functions is an essential skill. 
Even situations with no obvious periodicity are often expressed in terms of 
periodic functions for the purposes of analysis. Later in this book whole chapters 
are devoted to developing the techniques involved, but as a necessary prerequisite 
we here establish (or remind the reader of) some standard identities with which he 
or she should be fully familiar, so that the manipulation of expressions containing 
sinusoids becomes automatic and reliable. So as to emphasise the angular nature 
of the argument of a sinusoid we will denote it in this section by $\theta$ rather than $x$.


1.2.1 Single-angle identities 

We give without proof the basic identity satisfied by the sinusoidal functions $\sin\theta$ and $\cos\theta$, namely

$$\cos^2\theta + \sin^2\theta = 1. \tag{1.15}$$

If $\sin\theta$ and $\cos\theta$ have been defined geometrically in terms of the coordinates of a point on a circle, a reference to the name of Pythagoras will suffice to establish this result. If they have been defined by means of series (with $\theta$ expressed in radians) then the reader should refer to Euler's equation (3.23) on page 96, and note that $e^{i\theta}$ has unit modulus if $\theta$ is real.

Figure 1.2 Illustration of the compound-angle identities. Refer to the main text for details.

Other standard single-angle formulae derived from (1.15) by dividing through by various powers of $\sin\theta$ and $\cos\theta$ are

$$1 + \tan^2\theta = \sec^2\theta, \tag{1.16}$$

$$\cot^2\theta + 1 = \operatorname{cosec}^2\theta. \tag{1.17}$$


1.2.2 Compound-angle identities 

The basis for building expressions for the sinusoidal functions of compound angles is provided by those for the sum and difference of just two angles, since all other cases can be built up from these, in principle. Later we will see that a study of complex numbers can provide a more efficient approach in some cases.

To prove the basic formulae for the sine and cosine of a compound angle $A + B$ in terms of the sines and cosines of $A$ and $B$, we consider the construction shown in figure 1.2. It shows two sets of axes, $Oxy$ and $Ox'y'$, with a common origin but rotated with respect to each other through an angle $A$. The point $P$ lies on the unit circle centred on the common origin $O$ and has coordinates $(\cos(A + B), \sin(A + B))$ with respect to the axes $Oxy$ and coordinates $(\cos B, \sin B)$ with respect to the axes $Ox'y'$.

Parallels to the axes $Oxy$ (dotted lines) and $Ox'y'$ (broken lines) have been drawn through $P$. Further parallels ($MR$ and $RN$) to the $Ox'y'$ axes have been drawn through $R$, the point $(0, \sin(A + B))$ in the $Oxy$ system. That all the angles marked with the symbol • are equal to $A$ follows from the simple geometry of right-angled triangles and crossing lines.

We now determine the coordinates of $P$ in terms of lengths in the figure, expressing those lengths in terms of both sets of coordinates:

(i) $\cos B = x' = TN + NP = MR + NP = OR\sin A + RP\cos A = \sin(A + B)\sin A + \cos(A + B)\cos A$;

(ii) $\sin B = y' = OM - TM = OM - NR = OR\cos A - RP\sin A = \sin(A + B)\cos A - \cos(A + B)\sin A$.

Now, if equation (i) is multiplied by $\sin A$ and added to equation (ii) multiplied by $\cos A$, the result is

$$\sin A\cos B + \cos A\sin B = \sin(A + B)(\sin^2 A + \cos^2 A) = \sin(A + B).$$

Similarly, if equation (ii) is multiplied by $\sin A$ and subtracted from equation (i) multiplied by $\cos A$, the result is

$$\cos A\cos B - \sin A\sin B = \cos(A + B)(\cos^2 A + \sin^2 A) = \cos(A + B).$$

Corresponding graphically based results can be derived for the sines and cosines of the difference of two angles; however, they are more easily obtained by setting $B$ to $-B$ in the previous results and remembering that $\sin B$ becomes $-\sin B$ whilst $\cos B$ is unchanged. The four results may be summarised by

$$\sin(A \pm B) = \sin A\cos B \pm \cos A\sin B, \tag{1.18}$$

$$\cos(A \pm B) = \cos A\cos B \mp \sin A\sin B. \tag{1.19}$$


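Standard results can be deduced from these by setting one of the two angles equal to $\pi$ or to $\pi/2$:

$$\sin(\pi - \theta) = \sin\theta, \qquad \cos(\pi - \theta) = -\cos\theta, \tag{1.20}$$

$$\sin\left(\tfrac{1}{2}\pi - \theta\right) = \cos\theta, \qquad \cos\left(\tfrac{1}{2}\pi - \theta\right) = \sin\theta. \tag{1.21}$$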


From these basic results many more can be derived. An immediate deduction, obtained by taking the ratio of the two equations (1.18) and (1.19) and then dividing both the numerator and denominator of this ratio by $\cos A\cos B$, is

$$\tan(A \pm B) = \frac{\tan A \pm \tan B}{1 \mp \tan A\tan B}. \tag{1.22}$$


One application of this result is a test for whether two lines on a graph are orthogonal (perpendicular); more generally, it determines the angle between them. The standard notation for a straight-line graph is $y = mx + c$, in which $m$ is the slope of the graph and $c$ is its intercept on the $y$-axis. It should be noted that the slope $m$ is also the tangent of the angle the line makes with the $x$-axis. Consequently the angle $\theta_{12}$ between two such straight-line graphs is equal to the difference in the angles they individually make with the $x$-axis, and the tangent of that angle is given by (1.22):

$$\tan\theta_{12} = \frac{\tan\theta_1 - \tan\theta_2}{1 + \tan\theta_1\tan\theta_2} = \frac{m_1 - m_2}{1 + m_1m_2}. \tag{1.23}$$

For the lines to be orthogonal we must have $\theta_{12} = \pi/2$, i.e. the final fraction on the RHS of the above equation must equal $\infty$, and so

$$m_1m_2 = -1. \tag{1.24}$$
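As an illustration of (1.23) and (1.24), the short Python sketch below returns the angle between two lines from their slopes. The function name is ours, and the exact comparison with zero is an assumption suited to simple inputs; with slopes obtained from measured data a tolerance would be more appropriate.

```python
import math


def angle_between_lines(m1, m2):
    """Angle theta_12 in radians between lines of slope m1 and m2, from (1.23)."""
    denominator = 1 + m1 * m2
    if denominator == 0:
        # Condition (1.24): m1 * m2 = -1, so the lines are orthogonal.
        return math.pi / 2
    return math.atan((m1 - m2) / denominator)


print(angle_between_lines(1, 0))     # a 45-degree line and the x-axis
print(angle_between_lines(2, -0.5))  # orthogonal lines
```

The sign of the returned angle follows the order in which the two slopes are given, since interchanging $m_1$ and $m_2$ reverses the sign of the fraction in (1.23).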


A kind of inversion of equations (1.18) and (1.19) enables the sum or difference of two sines or cosines to be expressed as the product of two sinusoids; the procedure is typified by the following. Adding together the expressions given by (1.18) for $\sin(A + B)$ and $\sin(A - B)$ yields

$$\sin(A + B) + \sin(A - B) = 2\sin A\cos B.$$

If we now write $A + B = C$ and $A - B = D$, this becomes

$$\sin C + \sin D = 2\sin\left(\frac{C + D}{2}\right)\cos\left(\frac{C - D}{2}\right). \tag{1.25}$$

In a similar way each of the following equations can be derived:

$$\sin C - \sin D = 2\cos\left(\frac{C + D}{2}\right)\sin\left(\frac{C - D}{2}\right), \tag{1.26}$$

$$\cos C + \cos D = 2\cos\left(\frac{C + D}{2}\right)\cos\left(\frac{C - D}{2}\right), \tag{1.27}$$

$$\cos C - \cos D = -2\sin\left(\frac{C + D}{2}\right)\sin\left(\frac{C - D}{2}\right). \tag{1.28}$$

The minus sign on the right of the last of these equations should be noted; it may help to avoid overlooking this 'oddity' to recall that if $C > D$ then $\cos C < \cos D$.


1.2.3 Double- and half-angle identities 

Double-angle and half-angle identities are needed so often in practical calculations that they should be committed to memory by any physical scientist. They can be obtained by setting $B$ equal to $A$ in results (1.18) and (1.19). When this is done, and use made of equation (1.15), the following results are obtained:

$$\sin 2\theta = 2\sin\theta\cos\theta, \tag{1.29}$$

$$\cos 2\theta = \cos^2\theta - \sin^2\theta = 2\cos^2\theta - 1 = 1 - 2\sin^2\theta, \tag{1.30}$$

$$\tan 2\theta = \frac{2\tan\theta}{1 - \tan^2\theta}. \tag{1.31}$$

A further set of identities enables sinusoidal functions of $\theta$ to be expressed as polynomial functions of a variable $t = \tan(\theta/2)$. They are not used in their primary role until the next chapter, but we give a derivation of them here for reference.

If $t = \tan(\theta/2)$, then it follows from (1.16) that $1 + t^2 = \sec^2(\theta/2)$ and $\cos(\theta/2) = (1 + t^2)^{-1/2}$, whilst $\sin(\theta/2) = t(1 + t^2)^{-1/2}$. Now, using (1.29) and (1.30), we may write:

$$\sin\theta = 2\sin\frac{\theta}{2}\cos\frac{\theta}{2} = \frac{2t}{1 + t^2}, \tag{1.32}$$

$$\cos\theta = \cos^2\frac{\theta}{2} - \sin^2\frac{\theta}{2} = \frac{1 - t^2}{1 + t^2}, \tag{1.33}$$

$$\tan\theta = \frac{2t}{1 - t^2}. \tag{1.34}$$

It can be further shown that the derivative of $\theta$ with respect to $t$ takes the algebraic form $2/(1 + t^2)$. This completes a package of results that enables expressions involving sinusoids, particularly when they appear as integrands, to be cast in more convenient algebraic forms. The proof of the derivative property and examples of use of the above results are given in subsection (2.2.7).
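A quick numerical check of (1.32) - (1.34) can be made as follows. This is only a sketch: the function name and the sample angle are arbitrary choices of ours, and the expression for $\tan\theta$ cannot be evaluated when $t = \pm 1$.

```python
import math


def sinusoids_from_t(t):
    """Return (sin(theta), cos(theta), tan(theta)) in terms of t = tan(theta/2)."""
    sin_theta = 2 * t / (1 + t * t)        # equation (1.32)
    cos_theta = (1 - t * t) / (1 + t * t)  # equation (1.33)
    tan_theta = 2 * t / (1 - t * t)        # equation (1.34); undefined for t = +1 or -1
    return sin_theta, cos_theta, tan_theta


theta = 0.7  # arbitrary sample angle in radians
s, c, tn = sinusoids_from_t(math.tan(theta / 2))
print(s - math.sin(theta), c - math.cos(theta), tn - math.tan(theta))
```

Each printed difference should vanish to within rounding error.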

We conclude this section with a worked example which is of such a commonly occurring form that it might be considered a standard procedure.

►Solve for $\theta$ the equation

$$a\sin\theta + b\cos\theta = k,$$

where $a$, $b$ and $k$ are given real quantities.

To solve this equation we make use of result (1.18) by setting $a = K\cos\phi$ and $b = K\sin\phi$ for suitable values of $K$ and $\phi$. We then have

$$k = K\cos\phi\sin\theta + K\sin\phi\cos\theta = K\sin(\theta + \phi),$$

with

$$K^2 = a^2 + b^2 \quad\text{and}\quad \phi = \tan^{-1}\frac{b}{a}.$$

Whether $\phi$ lies in $0 \le \phi \le \pi$ or in $-\pi < \phi < 0$ has to be determined by the individual signs of $a$ and $b$. The solution is thus

$$\theta = \sin^{-1}\left(\frac{k}{K}\right) - \phi,$$

with $K$ and $\phi$ as given above. Notice that there is no real solution to the original equation if $|k| > |K| = (a^2 + b^2)^{1/2}$. ◄
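The procedure of this worked example might be coded as in the following sketch. Two points go beyond what is stated in the example and should be read as our assumptions: the two-argument arctangent `math.atan2(b, a)` is used to place $\phi$ in the quadrant fixed by the signs of $a$ and $b$, and a second solution is generated from $\sin(\pi - \theta) = \sin\theta$, result (1.20). Any multiple of $2\pi$ may be added to either value. The function name is hypothetical.

```python
import math


def solve_a_sin_plus_b_cos(a, b, k):
    """Return two solutions theta of a*sin(theta) + b*cos(theta) = k, or [] if none."""
    K = math.hypot(a, b)  # K = (a^2 + b^2)^(1/2)
    if K == 0:
        raise ValueError("a and b must not both be zero")
    if abs(k) > K:
        return []  # no real solution when |k| > |K|
    phi = math.atan2(b, a)  # K*cos(phi) = a and K*sin(phi) = b
    base = math.asin(k / K)
    return [base - phi, math.pi - base - phi]


for theta in solve_a_sin_plus_b_cos(3.0, 4.0, 2.0):
    # Each residual should vanish to within rounding error.
    print(theta, 3.0 * math.sin(theta) + 4.0 * math.cos(theta) - 2.0)
```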


1.3 Coordinate geometry 

We have already mentioned the standard form for a straight-line graph, namely 

y = mx + c, (1.35) 

representing a linear relationship between the independent variable x and the 
dependent variable y. The slope m is equal to the tangent of the angle the line 
makes with the x-axis whilst c is the intercept on the y-axis. 

An alternative form for the equation of a straight line is 

$$ax + by + k = 0, \tag{1.36}$$

to which (1.35) is clearly connected by

$$m = -\frac{a}{b} \quad\text{and}\quad c = -\frac{k}{b}.$$

This form treats $x$ and $y$ on a more symmetrical basis, the intercepts on the two axes being $-k/a$ and $-k/b$ respectively.

A power relationship between two variables, i.e. one of the form $y = Ax^n$, can also be cast into straight-line form by taking the logarithms of both sides. Whilst it is normal in mathematical work to use natural logarithms (to base $e$, written $\ln x$), for practical investigations logarithms to base 10 are often employed. In either case the form is the same, but it needs to be remembered which has been used when recovering the value of $A$ from fitted data. In the mathematical (base $e$) form, the power relationship becomes

$$\ln y = n\ln x + \ln A. \tag{1.37}$$

Now the slope gives the power $n$, whilst the intercept on the $\ln y$ axis is $\ln A$, which yields $A$, either by exponentiation or by taking antilogarithms.
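The recovery of $n$ and $A$ from data via (1.37) can be sketched as below. The text does not prescribe how the straight line is to be fitted; an ordinary least-squares fit is assumed here, and the data in the usage lines are synthetic values generated from a known power law purely to exercise the code.

```python
import math


def fit_power_law(xs, ys):
    """Fit ln y = n ln x + ln A by least squares (our assumption); return (A, n)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    count = len(log_x)
    mean_x = sum(log_x) / count
    mean_y = sum(log_y) / count
    s_xx = sum((u - mean_x) ** 2 for u in log_x)
    s_xy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    n = s_xy / s_xx                       # slope of the straight line
    A = math.exp(mean_y - n * mean_x)     # intercept is ln A, so exponentiate
    return A, n


# Synthetic data from y = 2.5 x^1.5, used only as a check.
xs = [1.0, 2.0, 4.0, 8.0]
ys = [2.5 * x ** 1.5 for x in xs]
print(fit_power_law(xs, ys))
```

Had logarithms to base 10 been used instead, the slope would be unchanged but $A$ would have to be recovered as 10 raised to the power of the intercept.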

The other standard coordinate forms of two-dimensional curves that students should know and recognise are those concerned with the conic sections - so called because they can all be obtained by taking suitable sections across a (double) cone. Because the conic sections can take many different orientations and scalings their general form is complex,

$$Ax^2 + By^2 + Cxy + Dx + Ey + F = 0, \tag{1.38}$$

but each can be represented by one of four generic forms, an ellipse, a parabola, a hyperbola or, the degenerate form, a pair of straight lines. If they are reduced to their standard representations, in which axes of symmetry are made to coincide with the coordinate axes, the first three take the forms

$$\frac{(x - \alpha)^2}{a^2} + \frac{(y - \beta)^2}{b^2} = 1 \quad\text{(ellipse)}, \tag{1.39}$$

$$(y - \beta)^2 = 4a(x - \alpha) \quad\text{(parabola)}, \tag{1.40}$$

$$\frac{(x - \alpha)^2}{a^2} - \frac{(y - \beta)^2}{b^2} = 1 \quad\text{(hyperbola)}. \tag{1.41}$$

Here, $(\alpha, \beta)$ gives the position of the 'centre' of the curve, usually taken as the origin $(0, 0)$ when this does not conflict with any imposed conditions. The parabola equation given is that for a curve symmetric about a line parallel to the $x$-axis. For one symmetrical about a parallel to the $y$-axis the equation would read $(x - \alpha)^2 = 4a(y - \beta)$.

Of course, the circle is the special case of an ellipse in which $b = a$ and the equation takes the form

$$(x - \alpha)^2 + (y - \beta)^2 = a^2. \tag{1.42}$$

The distinguishing characteristic of this equation is that when it is expressed in the form (1.38) the coefficients of $x^2$ and $y^2$ are equal and that of $xy$ is zero; this property is not changed by any reorientation or scaling and so acts to identify a general conic as a circle.

Definitions of the conic sections in terms of geometrical properties are also available; for example, a parabola can be defined as the locus of a point that is always at the same distance from a given straight line (the directrix) as it is from a given point (the focus). When these properties are expressed in Cartesian coordinates the above equations are obtained. For a circle, the defining property is that all points on the curve are a distance $a$ from $(\alpha, \beta)$; (1.42) expresses this requirement very directly. In the following worked example we derive the equation for a parabola.

►Find the equation of a parabola that has the line $x = -a$ as its directrix and the point $(a, 0)$ as its focus.

Figure 1.3 shows the situation in Cartesian coordinates. Expressing the defining requirement that $PN$ and $PF$ are equal in length gives

$$(x + a) = [(x - a)^2 + y^2]^{1/2} \quad\Rightarrow\quad (x + a)^2 = (x - a)^2 + y^2,$$

which, on expansion of the squared terms, immediately gives $y^2 = 4ax$. This is (1.40) with $\alpha$ and $\beta$ both set equal to zero. ◄

Although the algebra is more complicated, the same method can be used to derive the equations for the ellipse and the hyperbola. In these cases the distance from the fixed point is a definite fraction, $e$, known as the eccentricity, of the distance from the fixed line. For an ellipse $0 < e < 1$, for a circle $e = 0$, and for a hyperbola $e > 1$. The parabola corresponds to the case $e = 1$.

Figure 1.3 Construction of a parabola using the point $(a, 0)$ as the focus and the line $x = -a$ as the directrix.


The values of $a$ and $b$ (with $a \ge b$) in equation (1.39) for an ellipse are related to $e$ through

$$e^2 = \frac{a^2 - b^2}{a^2}$$

and give the lengths of the semi-axes of the ellipse. If the ellipse is centred on the origin, i.e. $\alpha = \beta = 0$, then the focus is $(-ae, 0)$ and the directrix is the line $x = -a/e$.

For each conic section curve, although we have two variables, x and y, they are 
not independent, since if one is given then the other can be determined. However, 
determining y when x is given, say, involves solving a quadratic equation on each 
occasion, and so it is convenient to have parametric representations of the curves. 
A parametric representation allows each point on a curve to be associated with 
a unique value of a single parameter t. The simplest parametric representations 
for the conic sections are as given below, though that for the hyperbola uses 
hyperbolic functions, not formally introduced until chapter 3. That they do give 
valid parameterizations can be verified by substituting them into the standard 
forms (1.39) - (1.41); in each case the standard form is reduced to an algebraic 
or trigonometric identity. 


$$x = \alpha + a\cos\phi, \qquad y = \beta + b\sin\phi \quad\text{(ellipse)},$$

$$x = \alpha + at^2, \qquad y = \beta + 2at \quad\text{(parabola)},$$

$$x = \alpha + a\cosh\phi, \qquad y = \beta + b\sinh\phi \quad\text{(hyperbola)}.$$
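The substitution check mentioned above can also be carried out numerically. In the sketch below, whose function names and sample values are our own, each parametric point is inserted into the corresponding standard form (1.39) - (1.41) and the difference between the two sides is returned.

```python
import math


def ellipse_point(alpha, beta, a, b, phi):
    return alpha + a * math.cos(phi), beta + b * math.sin(phi)


def parabola_point(alpha, beta, a, t):
    return alpha + a * t * t, beta + 2 * a * t


def hyperbola_point(alpha, beta, a, b, phi):
    return alpha + a * math.cosh(phi), beta + b * math.sinh(phi)


def residuals(alpha, beta, a, b, parameter):
    """LHS minus RHS of (1.39), (1.40) and (1.41) at the parametric points."""
    x, y = ellipse_point(alpha, beta, a, b, parameter)
    ellipse = (x - alpha) ** 2 / a ** 2 + (y - beta) ** 2 / b ** 2 - 1
    x, y = parabola_point(alpha, beta, a, parameter)
    parabola = (y - beta) ** 2 - 4 * a * (x - alpha)
    x, y = hyperbola_point(alpha, beta, a, b, parameter)
    hyperbola = (x - alpha) ** 2 / a ** 2 - (y - beta) ** 2 / b ** 2 - 1
    return ellipse, parabola, hyperbola


# Arbitrary sample values; all three residuals should be zero to rounding error.
print(residuals(1.0, -2.0, 3.0, 2.0, 0.8))
```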


As a final example illustrating several topics from this section we now prove the well-known result that the angle subtended by a diameter at any point on a circle is a right angle.

►Taking the diameter to be the line joining $Q = (-a, 0)$ and $R = (a, 0)$ and the point $P$ to be any point on the circle $x^2 + y^2 = a^2$, prove that angle $QPR$ is a right angle.

If $P$ is the point $(x, y)$, the slope of the line $QP$ is

$$m_1 = \frac{y - 0}{x - (-a)} = \frac{y}{x + a}.$$

That of $RP$ is

$$m_2 = \frac{y - 0}{x - (a)} = \frac{y}{x - a}.$$

Thus

$$m_1m_2 = \frac{y^2}{x^2 - a^2}.$$

But, since $P$ is on the circle, $y^2 = a^2 - x^2$ and consequently $m_1m_2 = -1$. From result (1.24) this implies that $QP$ and $RP$ are orthogonal and that $QPR$ is therefore a right angle. Note that this is true for any point $P$ on the circle. ◄


1.4 Partial fractions 


In subsequent chapters, and in particular when we come to study integration 
in chapter 2, we will need to express a function f(x) that is the ratio of two 
polynomials in a more manageable form. To remove some potential complexity 
from our discussion we will assume that all the coefficients in the polynomials 
are real, although this is not an essential simplification. 

The behaviour of $f(x)$ is crucially determined by the location of the zeroes of its denominator, i.e. if $f(x)$ is written as $f(x) = g(x)/h(x)$ where both $g(x)$ and $h(x)$ are polynomials†, then $f(x)$ changes extremely rapidly when $x$ is close to those values $\alpha_i$ that are the roots of $h(x) = 0$. To make such behaviour explicit, we write $f(x)$ as a sum of terms such as $A/(x - \alpha)^n$, in which $A$ is a constant, $\alpha$ is one of the $\alpha_i$ that satisfy $h(\alpha_i) = 0$ and $n$ is a positive integer. Writing a function in this way is known as expressing it in partial fractions.

† It is assumed that the ratio has been reduced so that $g(x)$ and $h(x)$ do not contain any common factors, i.e. there is no value of $x$ that makes both vanish at the same time. We may also assume without any loss of generality that the coefficient of the highest power of $x$ in $h(x)$ has been made equal to unity, if necessary, by dividing both numerator and denominator by the coefficient of this highest power.

Suppose, for the sake of definiteness, that we wish to express the function

$$f(x) = \frac{4x + 2}{x^2 + 3x + 2}$$

in partial fractions, i.e. to write it as

$$f(x) = \frac{g(x)}{h(x)} = \frac{4x + 2}{x^2 + 3x + 2} = \frac{A_1}{(x - \alpha_1)^{n_1}} + \frac{A_2}{(x - \alpha_2)^{n_2}} + \cdots. \tag{1.43}$$


The first question that arises is that of how many terms there should be on the right-hand side (RHS). Although some complications occur when $h(x)$ has repeated roots (these are considered below) it is clear that $f(x)$ only becomes infinite at the two values of $x$, $\alpha_1$ and $\alpha_2$, that make $h(x) = 0$. Consequently the RHS can only become infinite at the same two values of $x$ and therefore contains only two partial fractions - these are the ones shown explicitly. This argument can be trivially extended (again temporarily ignoring the possibility of repeated roots of $h(x)$) to show that if $h(x)$ is a polynomial of degree $n$ then there should be $n$ terms on the RHS, each containing a different root $\alpha_i$ of the equation $h(\alpha_i) = 0$.

A second general question concerns the appropriate values of the $n_i$. This is answered by putting the RHS over a common denominator, which will clearly have to be the product $(x - \alpha_1)^{n_1}(x - \alpha_2)^{n_2}\cdots$. Comparison of the highest power of $x$ in this new RHS with the same power in $h(x)$ shows that $n_1 + n_2 + \cdots = n$. This result holds whether or not $h(x) = 0$ has repeated roots and, although we do not give a rigorous proof, strongly suggests the correct conclusions that:

• The number of terms on the RHS is equal to the number of distinct roots of $h(x) = 0$, each term having a different root $\alpha_i$ in its denominator $(x - \alpha_i)^{n_i}$;

• If $\alpha_i$ is a multiple root of $h(x) = 0$ then the value to be assigned to $n_i$ in (1.43) is that of $m_i$ when $h(x)$ is written in the product form (1.9). Further, as discussed on p. 23, $A_i$ has to be replaced by a polynomial of degree $m_i - 1$. This is also formally true for non-repeated roots, since then both $m_i$ and $n_i$ are equal to unity.


Returning to our specific example we note that the denominator $h(x)$ has zeroes at $x = \alpha_1 = -1$ and $x = \alpha_2 = -2$; these $x$-values are the simple (non-repeated) roots of $h(x) = 0$. Thus the partial fraction expansion will be of the form

$$\frac{4x + 2}{x^2 + 3x + 2} = \frac{A_1}{x + 1} + \frac{A_2}{x + 2}. \tag{1.44}$$

We now list several methods available for determining the coefficients $A_1$ and $A_2$. We also remind the reader that, as with all the explicit examples and techniques described, these methods are to be considered as models for the handling of any ratio of polynomials, with or without characteristics that make it a special case.


(i) The RHS can be put over a common denominator, in this case $(x + 1)(x + 2)$, and then the coefficients of the various powers of $x$ can be equated in the numerators on both sides of the equation. This leads to

$$4x + 2 = A_1(x + 2) + A_2(x + 1),$$

$$4 = A_1 + A_2, \qquad 2 = 2A_1 + A_2.$$

Solving the simultaneous equations for $A_1$ and $A_2$ gives $A_1 = -2$ and $A_2 = 6$.

(ii) A second method is to substitute two (or more generally $n$) different values of $x$ into each side of (1.44) and so obtain two (or $n$) simultaneous equations for the two (or $n$) constants $A_i$. To justify this practical way of proceeding it is necessary, strictly speaking, to appeal to method (i) above, which establishes that there are unique values for $A_1$ and $A_2$ valid for all values of $x$. It is normally very convenient to take zero as one of the values of $x$, but of course any set will do. Suppose in the present case that we use the values $x = 0$ and $x = 1$ and substitute in (1.44). The resulting equations are

$$\frac{2}{2} = \frac{A_1}{1} + \frac{A_2}{2},$$

$$\frac{6}{6} = \frac{A_1}{2} + \frac{A_2}{3},$$

which on solution give $A_1 = -2$ and $A_2 = 6$, as before. The reader can easily verify that any other pair of values for $x$ (except for a pair that includes $\alpha_1$ or $\alpha_2$) gives the same values for $A_1$ and $A_2$.

(iii) The very reason why method (ii) fails if $x$ is chosen as one of the roots $\alpha_i$ of $h(x) = 0$ can be made the basis for determining the values of the $A_i$ corresponding to non-multiple roots without having to solve simultaneous equations. The method is conceptually more difficult than the other methods presented here, and needs results from the theory of complex variables (chapter 20) to justify it. However, we give a practical 'cookbook' recipe for determining the coefficients.

(a) To determine the coefficient $A_k$, imagine the denominator $h(x)$ written as the product $(x - \alpha_1)(x - \alpha_2)\cdots(x - \alpha_n)$, with any $m$-fold repeated root giving rise to $m$ factors in parentheses.

(b) Now set $x$ equal to $\alpha_k$ and evaluate the expression obtained after omitting the factor that reads $\alpha_k - \alpha_k$.

(c) Divide the value so obtained into $g(\alpha_k)$; the result is the required coefficient $A_k$.

For our specific example we find in step (a) that $h(x) = (x + 1)(x + 2)$ and that in evaluating $A_1$ step (b) yields $-1 + 2 = 1$. Since $g(-1) = 4(-1) + 2 = -2$, step (c) gives $A_1$ as $(-2)/(1)$, i.e. in agreement with our other evaluations. In a similar way $A_2$ is evaluated as $(-6)/(-1) = 6$.

Thus any one of the methods listed above shows that

$$\frac{4x + 2}{x^2 + 3x + 2} = \frac{-2}{x + 1} + \frac{6}{x + 2}.$$
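Steps (a) - (c) of method (iii) translate almost directly into code. The following Python sketch assumes that all the roots supplied are distinct and that $h(x)$ has unit leading coefficient; exact rational arithmetic from the standard `fractions` module is used so that no rounding occurs. The function name is hypothetical.

```python
from fractions import Fraction


def cover_up_coefficients(g, roots):
    """Return [A_1, ..., A_n] for g(x)/h(x), h(x) being the product of (x - root).

    g is a callable numerator; roots must be distinct (simple roots only).
    """
    coefficients = []
    for k, alpha_k in enumerate(roots):
        # Step (b): evaluate h(x) at alpha_k with the factor (x - alpha_k) omitted.
        reduced = Fraction(1)
        for j, alpha_j in enumerate(roots):
            if j != k:
                reduced *= Fraction(alpha_k) - Fraction(alpha_j)
        # Step (c): divide the value so obtained into g(alpha_k).
        coefficients.append(Fraction(g(alpha_k)) / reduced)
    return coefficients


# The example of the text: g(x) = 4x + 2, h(x) = (x + 1)(x + 2).
A1, A2 = cover_up_coefficients(lambda x: 4 * x + 2, [-1, -2])
print(A1, A2)  # -2 6
```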

The best method to use in any particular circumstance will depend on the 
complexity, in terms of the degrees of the polynomials and the multiplicities of 
the roots of the denominator, of the function being considered and, to some 
extent, on the individual inclinations of the student; some prefer lengthy but 
straightforward solution of simultaneous equations, whilst others feel more at 
home carrying shorter but more abstract calculations in their heads. 

1.4.1 Complications and special cases 

Having established the basic method for partial fractions, we now show, through 
further worked examples, how some complications are dealt with by extensions 
to the procedure. These extensions are introduced one at a time, but of course in 
any practical application more than one may be involved. 

The degree of the numerator is greater than or equal to that of the denominator 

Although we have not specifically mentioned the fact, it will be apparent from 
trying to apply method (i) of the previous subsection to such a case, that if the 
degree of the numerator (m) is not less than that of the denominator (n) then the 
ratio of two polynomials cannot be expressed in partial fractions. 

To get round this difficulty it is necessary to start by dividing the denominator $h(x)$ into the numerator $g(x)$ to obtain a further polynomial, which we will denote by $s(x)$, together with a function $t(x)$ that is a ratio of two polynomials for which the degree of the numerator is less than that of the denominator. The function $t(x)$ can therefore be expanded in partial fractions. As a formula,

$$f(x) = \frac{g(x)}{h(x)} = s(x) + t(x) \equiv s(x) + \frac{r(x)}{h(x)}. \tag{1.45}$$

It is apparent that the polynomial $r(x)$ is the remainder obtained when $g(x)$ is divided by $h(x)$, and, in general, will be a polynomial of degree $n - 1$. It is also clear that the polynomial $s(x)$ will be of degree $m - n$. Again, the actual division process can be set out as an algebraic long division sum but is probably more easily handled by writing (1.45) in the form

$$g(x) = s(x)h(x) + r(x) \tag{1.46}$$

or, more explicitly, as

$$g(x) = (s_{m-n}x^{m-n} + s_{m-n-1}x^{m-n-1} + \cdots + s_0)h(x) + (r_{n-1}x^{n-1} + r_{n-2}x^{n-2} + \cdots + r_0) \tag{1.47}$$

and then equating coefficients.

We illustrate this procedure with the following worked example.

►Find the partial fraction decomposition of the function

$$f(x) = \frac{x^3 + 3x^2 + 2x + 1}{x^2 - x - 6}.$$

Since the degree of the numerator is 3 and that of the denominator is 2, a preliminary long division is necessary. The polynomial $s(x)$ resulting from the division will have degree $3 - 2 = 1$ and the remainder $r(x)$ will be of degree $2 - 1 = 1$ (or less). Thus we write

$$x^3 + 3x^2 + 2x + 1 = (s_1x + s_0)(x^2 - x - 6) + (r_1x + r_0).$$

From equating the coefficients of the various powers of $x$ on the two sides of the equation, starting with the highest, we now obtain the simultaneous equations

$$1 = s_1,$$

$$3 = s_0 - s_1,$$

$$2 = -s_0 - 6s_1 + r_1,$$

$$1 = -6s_0 + r_0.$$

These are readily solved, in the given order, to yield $s_1 = 1$, $s_0 = 4$, $r_1 = 12$ and $r_0 = 25$. Thus $f(x)$ can be written as

$$f(x) = x + 4 + \frac{12x + 25}{x^2 - x - 6}.$$

The last term can now be decomposed into partial fractions as previously. The zeroes of the denominator are at $x = 3$ and $x = -2$ and the application of any method from the previous subsection yields the respective constants as $A_1 = 12\frac{1}{5}$ and $A_2 = -\frac{1}{5}$. Thus the final partial fraction decomposition of $f(x)$ is

$$x + 4 + \frac{61}{5(x - 3)} - \frac{1}{5(x + 2)}.$$ ◄
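The preliminary division can be mechanised. The sketch below follows the algebraic long division route mentioned in the text, rather than the equating of coefficients used in the worked example; the two give the same $s(x)$ and $r(x)$. Polynomials are represented, by our own convention, as lists of coefficients with the highest power first.

```python
from fractions import Fraction


def divide_polynomials(g, h):
    """Return (s, r) such that g = s*h + r with deg r < deg h.

    g and h are coefficient lists, highest power first; h[0] must be non-zero.
    """
    g = [Fraction(c) for c in g]
    h = [Fraction(c) for c in h]
    if len(g) < len(h):
        return [], g  # numerator already of lower degree: nothing to divide
    remainder = g[:]
    quotient = []
    steps = len(g) - len(h) + 1
    for i in range(steps):
        factor = remainder[i] / h[0]
        quotient.append(factor)
        # Subtract factor * h, aligned with the current leading term.
        for j, h_coefficient in enumerate(h):
            remainder[i + j] -= factor * h_coefficient
    return quotient, remainder[steps:]


# g(x) = x^3 + 3x^2 + 2x + 1 and h(x) = x^2 - x - 6, as in the worked example.
s, r = divide_polynomials([1, 3, 2, 1], [1, -1, -6])
print(s == [1, 4], r == [12, 25])  # True True
```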


Factors of the form $a^2 + x^2$ in the denominator

We have so far assumed that the roots of $h(x) = 0$, needed for the factorisation of the denominator of $f(x)$, can always be found. In principle they always can but in some cases they are not real. Consider, for example, attempting to express in partial fractions a polynomial ratio whose denominator is $h(x) = x^3 - x^2 + 2x - 2$. Clearly $x = 1$ gives a zero of $h(x)$, and so a first factorisation is $(x - 1)(x^2 + 2)$. However we cannot make any further progress because the factor $x^2 + 2$ cannot be expressed as $(x - \alpha)(x - \beta)$ for any real $\alpha$ and $\beta$.

Complex numbers are introduced later in this book (chapter 3) and, when the reader has studied them, he or she may wish to justify the procedure set out below. It can be shown to be equivalent to that already given, but the zeroes of $h(x)$ are now allowed to be complex and terms that are complex conjugates of each other are combined to leave only real terms.

Since quadratic factors of the form $a^2 + x^2$ that appear in $h(x)$ cannot be reduced to the product of two linear factors, partial fraction expansions including them need to have numerators in the corresponding terms that are not simply constants $A_i$ but linear functions of $x$, i.e. of the form $B_ix + C_i$. Thus, in the expansion, linear terms (first-degree polynomials) in the denominator have constants (zero-degree polynomials) in their numerators, whilst quadratic terms (second-degree polynomials) in the denominator have linear terms (first-degree polynomials) in their numerators. As a symbolic formula, the partial fraction expansion of

$$\frac{g(x)}{(x - \alpha_1)(x - \alpha_2)\cdots(x - \alpha_p)(x^2 + a_1^2)(x^2 + a_2^2)\cdots(x^2 + a_q^2)}$$

should take the form

$$\frac{A_1}{x - \alpha_1} + \frac{A_2}{x - \alpha_2} + \cdots + \frac{A_p}{x - \alpha_p} + \frac{B_1x + C_1}{x^2 + a_1^2} + \frac{B_2x + C_2}{x^2 + a_2^2} + \cdots + \frac{B_qx + C_q}{x^2 + a_q^2}.$$

Of course, the degree of $g(x)$ must be less than $p + 2q$; if it is not, an initial division must be carried out as demonstrated earlier.


Repeated factors in the denominator

Consider trying (incorrectly) to expand

$$f(x) = \frac{x - 4}{(x + 1)(x - 2)^2}$$

in partial fraction form as follows:

$$\frac{x - 4}{(x + 1)(x - 2)^2} = \frac{A_1}{x + 1} + \frac{A_2}{(x - 2)^2}.$$

Multiplying both sides of this supposed equality by $(x + 1)(x - 2)^2$ produces an equation whose LHS is linear in $x$, whilst its RHS is quadratic. This is clearly wrong and so an expansion in the above form cannot be valid. The correction we must make is very similar to that needed in the previous subsection, namely that since $(x - 2)^2$ is a quadratic polynomial the numerator of the term containing it must be a first-degree polynomial, and not simply a constant.

The correct form for the part of the expansion containing the doubly repeated root is therefore $(Bx + C)/(x - 2)^2$. Using this form and either of methods (i) and (ii) for determining the constants gives the full partial fraction expansion as

$$\frac{x - 4}{(x + 1)(x - 2)^2} = -\frac{5}{9(x + 1)} + \frac{5x - 16}{9(x - 2)^2},$$

as the reader may verify.

Since any term of the form $(Bx + C)/(x - \alpha)^2$ can be written as

$$\frac{B(x - \alpha) + C + B\alpha}{(x - \alpha)^2} = \frac{B}{x - \alpha} + \frac{C + B\alpha}{(x - \alpha)^2},$$

and similarly for multiply repeated roots, an alternative form for the part of the partial fraction expansion containing a repeated root $\alpha$ is

$$\frac{D_1}{x - \alpha} + \frac{D_2}{(x - \alpha)^2} + \cdots + \frac{D_p}{(x - \alpha)^p}. \tag{1.48}$$

In this form, all $x$-dependence has disappeared from the numerators but at the expense of $p - 1$ additional terms; the total number of constants to be determined remains unchanged, as it must.

When describing possible methods of determining the constants in a partial 
fraction expansion, we noted that method (iii), p. 20, which avoids the need to 
solve simultaneous equations, is restricted to terms involving non-repeated roots. 
In fact, it can be applied in repeated-root situations, when the expansion is put 
in the form (1.48), but only to find the constant in the term involving the largest inverse power of $x - \alpha$, i.e. $D_p$ in (1.48).

We conclude this section with a more protracted worked example that contains 
all three of the complications discussed. 


► Resolve the following expression F{x) into partial fractions: 


F(x) = 


x 5 - 2x 4 - x 3 + 5x 2 - 46x + 100 
(x 2 + 6)(x — 2) 2 


We note that the degree of the denominator (4) is not greater than that of the numerator 
(5), and so we must start by dividing the latter by the former. It follows, from the difference 
in degrees and the coefficients of the highest powers in each, that the result will be a linear 
expression sjx + s o with the coefficient si equal to 1. Thus the numerator of F(x) must be 
expressible as 


(x + so)(x 4 — 4.x 3 + 10x 2 — 24x + 24) + {r 2 x 3 + r 2 .x 2 + rix + ro ), 

where the second factor in parentheses is the denominator of F(x) written as a polynomial. 
Equating the coefficients of x 4 gives —2 = — 4+so and fixes so as 2. Equating the coefficients 
of powers less than 4 gives equations involving the coefficients r, as follows : 


—1 = —8 + 10 + f ' 3 , 

5 = -24 + 20 + r 2 , 
-46 = 24 - 48 + n, 
100 = 48 + ro. 


Thus the remainder polynomial r(x) can be constructed and F(x) written as 


, — 3x 3 + 9.x 2 - 22x + 52 

F(x) = x + 2 + (y2 + 6)(x _ 2)2 "* + 2 + /(*)■ 


The polynomial ratio f(x) can now be expressed in partial fraction form, noting that its 
denominator contains both a term of the form x 2 + a 2 and a repeated root. Thus 


f(x) 


Bx T F D i D 2 

x 2 + 6 + x — 2 (x — 2) 2 


We could now put the RHS of this equation over the common denominator $(x^2 + 6)(x - 2)^2$ 
and find $B$, $C$, $D_1$ and $D_2$ by equating coefficients of powers of $x$. It is quicker, however, 
to use methods (iii) and (ii). Method (iii) gives $D_2$ as $(-24 + 36 - 44 + 52)/(4 + 6) = 2$. 
We choose to evaluate the other coefficients by method (ii), and setting $x = 0$, $x = 1$ and 
$x = -1$ gives respectively 

\begin{align*}
\frac{52}{24} &= \frac{C}{6} - \frac{D_1}{2} + \frac{2}{4}, \\
\frac{36}{7} &= \frac{B + C}{7} - D_1 + 2, \\
\frac{86}{63} &= \frac{C - B}{7} - \frac{D_1}{3} + \frac{2}{9}.
\end{align*}

These equations reduce to 

\begin{align*}
4C - 12D_1 &= 40, \\
B + C - 7D_1 &= 22, \\
-9B + 9C - 21D_1 &= 72,
\end{align*}

with solution $B = 0$, $C = 1$, $D_1 = -3$. 

Thus, finally, we may rewrite the original expression $F(x)$ in partial fractions as 

\[
F(x) = x + 2 + \frac{1}{x^2 + 6} - \frac{3}{x - 2} + \frac{2}{(x - 2)^2}. \qquad ◄
\]
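
The result of the worked example can be checked numerically. The following is a minimal sketch, not part of the original text, that compares the original rational function with its partial-fraction form at a set of exact rational sample points; the function names `F_original` and `F_partial` and the choice of sample points are illustrative assumptions. Because both forms are ratios of polynomials of low degree, agreement at this many points implies that they are identical.

```python
from fractions import Fraction

def F_original(x):
    """The rational function F(x) as given in the worked example."""
    num = x**5 - 2*x**4 - x**3 + 5*x**2 - 46*x + 100
    den = (x**2 + 6) * (x - 2)**2
    return num / den

def F_partial(x):
    """The partial-fraction form obtained at the end of the example."""
    return x + 2 + 1/(x**2 + 6) - 3/(x - 2) + 2/(x - 2)**2

# Exact rational sample points, skipping the pole at x = 2.
samples = [Fraction(n, 3) for n in range(-9, 10) if Fraction(n, 3) != 2]

# Exact arithmetic, so equality can be tested without rounding tolerance.
agree = all(F_original(x) == F_partial(x) for x in samples)
print(agree)
```

Exact fractions are used rather than floating-point numbers so that the comparison is an equality test and not an approximate one.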


1.5 Binomial expansion 

Earlier in this chapter we were led to consider functions containing powers of 
the sum or difference of two terms, e.g. $(x - \alpha)^m$. Later in this book we will find 
numerous occasions on which we wish to write such a product of repeated factors 
as a polynomial in $x$ or, more generally, as a sum of terms each of which contains 
powers of $x$ and $\alpha$ separately, as opposed to a power of their sum or difference. 

To make the discussion general and the result applicable to a wide variety of 
situations, we will consider the general expansion of $f(x) = (x + y)^n$, where $x$ and 
$y$ may stand for constants, variables or functions and, for the time being, $n$ is a 
positive integer. It may not be obvious what form the general expansion takes 
but some idea can be obtained by carrying out the multiplication explicitly for 
small values of $n$. Thus we obtain successively 

\begin{align*}
(x + y)^1 &= x + y, \\
(x + y)^2 &= (x + y)(x + y) = x^2 + 2xy + y^2, \\
(x + y)^3 &= (x + y)(x^2 + 2xy + y^2) = x^3 + 3x^2 y + 3xy^2 + y^3, \\
(x + y)^4 &= (x + y)(x^3 + 3x^2 y + 3xy^2 + y^3) = x^4 + 4x^3 y + 6x^2 y^2 + 4xy^3 + y^4.
\end{align*}

This does not establish a general formula, but the regularity of the terms in 
the expansions and the suggestion of a pattern in the coefficients indicate that a 
general formula for power n will have n + 1 terms, that the powers of x and y in 
every term will add up to n and that the coefficients of the first and last terms 
will be unity whilst those of the second and penultimate terms will be n. 

In fact, the general expression, the binomial expansion for power $n$, is given by 

\[
(x + y)^n = \sum_{k=0}^{n} {}^{n}C_{k}\, x^{n-k} y^{k}, \tag{1.49}
\]

where ${}^{n}C_{k}$ is called the binomial coefficient and is expressed in terms of factorial 
functions by $n!/[k!(n - k)!]$. Clearly, simply to make such a statement does not 
constitute proof of its validity, but, as we will see in subsection 1.5.2, (1.49) can 
be proved using a method called induction. Before turning to that proof, we 
investigate some of the elementary properties of the binomial coefficients. 
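
The explicit multiplication carried out above for small $n$ is easily mechanised. The sketch below is an illustration rather than a proof: it multiplies out one factor of $(x + y)$ at a time, storing only the coefficient of each $x^{n-k}y^k$, and compares the result with $n!/[k!(n-k)!]$ as supplied by Python's `math.comb`. The function name `expand_power` and the upper limit of 20 are arbitrary choices.

```python
from math import comb

def expand_power(n):
    """Coefficients of (x + y)**n, indexed by the power k of y,
    obtained by multiplying out one factor (x + y) at a time."""
    coeffs = [1]                      # (x + y)**0 = 1
    for _ in range(n):
        # Multiplying by x leaves k unchanged; multiplying by y raises k by one.
        coeffs = [a + b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs

for n in range(1, 5):
    print(n, expand_power(n))

# Compare with n!/[k!(n-k)!] for n = 1, ..., 20.
matches = all(expand_power(n) == [comb(n, k) for k in range(n + 1)]
              for n in range(1, 21))
print(matches)
```

Agreement for a finite range of $n$ supports the formula but does not establish it for all $n$; that requires the inductive argument referred to in the text.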


1.5.1 Binomial coefficients 

As stated above, the binomial coefficients are defined by 

\[
{}^{n}C_{k} \equiv \frac{n!}{k!(n - k)!} \equiv \binom{n}{k} \quad \text{for } 0 \le k \le n, \tag{1.50}
\]

where in the second identity we give a common alternative notation for ${}^{n}C_{k}$. 
Obvious properties include 

(i) ${}^{n}C_{0} = {}^{n}C_{n} = 1$, 

(ii) ${}^{n}C_{1} = {}^{n}C_{n-1} = n$, 

(iii) ${}^{n}C_{k} = {}^{n}C_{n-k}$. 

We note that, for any given $n$, the largest coefficient in the binomial expansion is 
the middle one ($k = n/2$) if $n$ is even; the middle two coefficients ($k = \frac{1}{2}(n \pm 1)$) 
are equal largest if $n$ is odd. Somewhat less obvious is the result 

\begin{align*}
{}^{n}C_{k} + {}^{n}C_{k-1} &= \frac{n!}{k!(n - k)!} + \frac{n!}{(k - 1)!(n - k + 1)!} \\
&= \frac{n!\,[(n + 1 - k) + k]}{k!(n + 1 - k)!} \\
&= \frac{(n + 1)!}{k!(n + 1 - k)!} = {}^{n+1}C_{k}. \tag{1.51}
\end{align*}

An equivalent statement, in which $k$ has been redefined as $k + 1$, is 

\[
{}^{n}C_{k} + {}^{n}C_{k+1} = {}^{n+1}C_{k+1}. \tag{1.52}
\]
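
As a quick sanity check on the properties listed in this subsection, the following sketch tests (i) to (iii), the statement about the largest coefficient, and results (1.51) and (1.52) for the first thirty values of $n$. It relies on Python's `math.comb`; the helper name `check_properties` and the range tested are assumptions made for illustration only.

```python
from math import comb

def check_properties(n):
    """Return True if properties (i)-(iii), the 'largest coefficient'
    remark, and results (1.51) and (1.52) all hold for this n."""
    row = [comb(n, k) for k in range(n + 1)]
    ok = row[0] == row[n] == 1                      # property (i)
    ok = ok and row[1] == row[n - 1] == n           # property (ii)
    ok = ok and row == row[::-1]                    # property (iii)
    ok = ok and max(row) == row[n // 2]             # largest is central
    # (1.51): nCk + nC(k-1) = (n+1)Ck for 1 <= k <= n
    ok = ok and all(comb(n, k) + comb(n, k - 1) == comb(n + 1, k)
                    for k in range(1, n + 1))
    # (1.52): nCk + nC(k+1) = (n+1)C(k+1) for 0 <= k <= n - 1
    ok = ok and all(comb(n, k) + comb(n, k + 1) == comb(n + 1, k + 1)
                    for k in range(0, n))
    return ok

all_ok = all(check_properties(n) for n in range(1, 31))
print(all_ok)
```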


1.5.2 Proof of the binomial expansion 

We are now in a position to prove the binomial expansion (1.49). In doing so, we 
introduce the reader to a procedure applicable to certain types of problems and 
known as the method of induction. The method is discussed much more fully in 
subsection 1.7.1. 

We start by assuming that (1.49) is true for some positive integer $n = N$. We 
now proceed to show that this implies that it must also be true for $n = N + 1$, as 
follows: 

\begin{align*}
(x + y)^{N+1} &= (x + y) \sum_{k=0}^{N} {}^{N}C_{k}\, x^{N-k} y^{k} \\
&= \sum_{k=0}^{N} {}^{N}C_{k}\, x^{N+1-k} y^{k} + \sum_{k=0}^{N} {}^{N}C_{k}\, x^{N-k} y^{k+1} \\
&= \sum_{k=0}^{N} {}^{N}C_{k}\, x^{N+1-k} y^{k} + \sum_{j=1}^{N+1} {}^{N}C_{j-1}\, x^{(N+1)-j} y^{j},
\end{align*}

where in the first line we have used the assumption and in the third line have 
shifted the second summation index by unity, by writing $k + 1 = j$. We now 
separate off the first term of the first sum, ${}^{N}C_{0}\, x^{N+1}$, and write it as ${}^{N+1}C_{0}\, x^{N+1}$; 
we can do this since, as noted in (i) following (1.50), ${}^{n}C_{0} = 1$ for every $n$. Similarly, 
the last term of the second summation can be replaced by ${}^{N+1}C_{N+1}\, y^{N+1}$. 

The remaining terms of each of the two summations are now written together, 
with the summation index denoted by $k$ in both terms. Thus 

\begin{align*}
(x + y)^{N+1} &= {}^{N+1}C_{0}\, x^{N+1} + \sum_{k=1}^{N} \left( {}^{N}C_{k} + {}^{N}C_{k-1} \right) x^{(N+1)-k} y^{k} + {}^{N+1}C_{N+1}\, y^{N+1} \\
&= {}^{N+1}C_{0}\, x^{N+1} + \sum_{k=1}^{N} {}^{N+1}C_{k}\, x^{(N+1)-k} y^{k} + {}^{N+1}C_{N+1}\, y^{N+1} \\
&= \sum_{k=0}^{N+1} {}^{N+1}C_{k}\, x^{(N+1)-k} y^{k}.
\end{align*}


In going from the first to the second line we have used result (1.51). Now we 
observe that the final overall equation is just the original assumed result (1.49) 
but with n = N + 1. Thus it has been shown that if the binomial expansion is 
assumed to be true for n = N, then it can be proved to be true for n = N + 1. But 
it holds trivially for n = 1, and therefore for n = 2 also. By the same token it is 
valid for n = 3,4,..., and hence is established for all positive integers n. 


1.6 Properties of binomial coefficients 

1.6.1 Identities involving binomial coefficients 

There are many identities involving the binomial coefficients that can be derived 
directly from their definition, and yet more that follow from their appearance in 
the binomial expansion. Only the most elementary ones, given earlier, are worth 
committing to memory but, as illustrations, we now derive two results involving 
sums of binomial coefficients. 




The first is a further application of the method of induction. Consider the 
proposal that, for any $n \ge 1$ and $k \ge 0$, 

\[
\sum_{s=0}^{n-1} {}^{k+s}C_{k} = {}^{n+k}C_{k+1}. \tag{1.53}
\]

Notice that here $n$, the number of terms in the sum, is the parameter that varies, 
$k$ is a fixed parameter, whilst $s$ is a summation index and does not appear on the 
RHS of the equation. 

Now we suppose that this statement about the value of the sum of the binomial 
coefficients ${}^{k}C_{k}, {}^{k+1}C_{k}, \ldots, {}^{k+n-1}C_{k}$ is true for $n = N$. We next write down a series 
with an extra term and determine the implications of the supposition for the new 
series: 

\begin{align*}
\sum_{s=0}^{N+1-1} {}^{k+s}C_{k} &= \sum_{s=0}^{N-1} {}^{k+s}C_{k} + {}^{k+N}C_{k} \\
&= {}^{N+k}C_{k+1} + {}^{N+k}C_{k} \\
&= {}^{N+k+1}C_{k+1}.
\end{align*}

But this is just proposal (1.53) with $n$ now set equal to $N + 1$. To obtain the last 
line, we have used (1.52), with $n$ set equal to $N + k$. 

It only remains to consider the case $n = 1$, when the summation only contains 
one term and (1.53) reduces to 

\[
{}^{k}C_{k} = {}^{1+k}C_{k+1}.
\]

This is trivially valid for any $k$ since both sides are equal to unity, thus completing 
the proof of (1.53) for all positive integers $n$. 

The second result, which gives a formula for combining terms from two sets 
of binomial coefficients in a particular way (a kind of ‘convolution’, for readers 
who are already familiar with this term), is derived by applying the binomial 
expansion directly to the identity 

\[
(x + y)^{p} (x + y)^{q} = (x + y)^{p+q}.
\]

Written in terms of binomial expansions, this reads 

\[
\sum_{s=0}^{p} {}^{p}C_{s}\, x^{p-s} y^{s} \sum_{t=0}^{q} {}^{q}C_{t}\, x^{q-t} y^{t} = \sum_{r=0}^{p+q} {}^{p+q}C_{r}\, x^{p+q-r} y^{r}.
\]

We now equate coefficients of $x^{p+q-r} y^{r}$ on the two sides of the equation, noting 
that on the LHS all combinations of $s$ and $t$ such that $s + t = r$ contribute. This 
gives as an identity that 

\[
\sum_{t=0}^{r} {}^{p}C_{r-t}\, {}^{q}C_{t} = {}^{p+q}C_{r} = \sum_{t=0}^{r} {}^{p}C_{t}\, {}^{q}C_{r-t}. \tag{1.54}
\]

We have specifically included the second equality to emphasise the symmetrical 
nature of the relationship with respect to $p$ and $q$. 
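
Identities (1.53) and (1.54) can both be spot-checked by direct summation. The sketch below is only a numerical illustration over small, arbitrarily chosen ranges of the parameters; the helper names `lhs_153` and `lhs_154` are hypothetical. It uses the fact that Python's `math.comb(n, k)` returns zero when $k$ exceeds $n$, which corresponds to the absence of such terms from the finite expansions.

```python
from math import comb

def lhs_153(n, k):
    """Left-hand side of (1.53): sum of (k+s)C(k) for s = 0, ..., n-1."""
    return sum(comb(k + s, k) for s in range(n))

def lhs_154(p, q, r):
    """Left-hand side of (1.54): sum of pC(r-t) * qC(t) for t = 0, ..., r."""
    return sum(comb(p, r - t) * comb(q, t) for t in range(r + 1))

# (1.53) for n >= 1 and k >= 0.
ok_153 = all(lhs_153(n, k) == comb(n + k, k + 1)
             for n in range(1, 15) for k in range(0, 15))

# (1.54) for all r between 0 and p + q.
ok_154 = all(lhs_154(p, q, r) == comb(p + q, r)
             for p in range(0, 10) for q in range(0, 10)
             for r in range(0, p + q + 1))

print(ok_153, ok_154)
```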

Further identities involving the coefficients can be obtained by giving $x$ and $y$ 
special values in the defining equation (1.49) for the expansion. If both are set 
equal to unity then we obtain (using the alternative notation so as to produce 
familiarity with it) 

\[
\binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{n} = 2^{n}, \tag{1.55}
\]

whilst setting $x = 1$ and $y = -1$ yields 

\[
\binom{n}{0} - \binom{n}{1} + \cdots + (-1)^{n} \binom{n}{n} = 0. \tag{1.56}
\]


1.6.2 Negative and non-integral values of n 

Up till now we have restricted n in the binomial expansion to be a positive 
integer. Negative values can be accommodated, but only at the cost of an infinite 
series of terms rather than the finite one represented by (1.49). For reasons that 
are intuitively sensible and will be discussed in more detail in chapter 4, very 
often we require an expansion in which, at least ultimately, successive terms in 
the infinite series decrease in magnitude. For this reason, if $|x| > |y|$ we consider 
$(x + y)^{-m}$, where $m$ itself is a positive integer, in the form 

\[
(x + y)^{n} = (x + y)^{-m} = x^{-m} \left( 1 + \frac{y}{x} \right)^{-m}.
\]

Since the ratio $y/x$ is less than unity in magnitude, terms containing higher powers 
of it will be small in magnitude, whilst raising the unit term to any power will not 
affect its magnitude. If $|y| > |x|$ the roles of the two must be interchanged. 

We can now state, but will not explicitly prove, the form of the binomial 
expansion appropriate to negative values of $n$ ($n$ equal to $-m$): 

\[
(x + y)^{n} = (x + y)^{-m} = x^{-m} \sum_{k=0}^{\infty} {}^{-m}C_{k} \left( \frac{y}{x} \right)^{k}, \tag{1.57}
\]

where the hitherto undefined quantity ${}^{-m}C_{k}$, which appears to involve factorials 
of negative numbers, is given by 

\[
{}^{-m}C_{k} = (-1)^{k} \frac{m(m + 1) \cdots (m + k - 1)}{k!} = (-1)^{k} \frac{(m + k - 1)!}{(m - 1)!\,k!} = (-1)^{k}\, {}^{m+k-1}C_{k}. \tag{1.58}
\]

The binomial coefficient on the extreme right of this equation has its normal 
meaning and is well defined since $m + k - 1 \ge k$. 

Thus we have a definition of binomial coefficients for negative integer values 
of $n$ in terms of those for positive $n$. The connection between the two may not 
be obvious, but they are both formed in the same way in terms of recurrence 
relations. Whatever the sign of $n$, the series of coefficients ${}^{n}C_{k}$ can be generated 
by starting with ${}^{n}C_{0} = 1$ and using the recurrence relation 

\[
{}^{n}C_{k+1} = \frac{n - k}{k + 1}\, {}^{n}C_{k}. \tag{1.59}
\]

The difference is that for positive integer $n$ the series terminates when $k = n$, 
whereas for negative $n$ there is no such termination, in line with the infinite 
series of terms in the corresponding expansion. 

Finally we note that, in fact, equation (1.59) generates the appropriate coefficients 
for all values of $n$, positive or negative, integer or non-integer, with the 
obvious exception of the case in which $x = -y$ and $n$ is negative. For non-integer 
$n$ the expansion does not terminate, even if $n$ is positive. 
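
The recurrence relation (1.59) translates directly into a short routine. The following sketch, which is an illustration under stated assumptions rather than anything prescribed by the text, generates the first few coefficients for a positive integer, a negative integer and a non-integer value of $n$, compares the negative-integer case with the closed form (1.58), and sums thirty terms of (1.57) for the arbitrarily chosen values $x = 1$, $y = 1/10$, $m = 2$. The function name `coefficients` and all numerical choices are hypothetical.

```python
from fractions import Fraction
from math import comb

def coefficients(n, count):
    """First `count` coefficients nC0, nC1, ... generated by (1.59)."""
    n = Fraction(n)
    coeffs = [Fraction(1)]                      # nC0 = 1 whatever n is
    for k in range(count - 1):
        coeffs.append(coeffs[-1] * (n - k) / (k + 1))
    return coeffs

print(coefficients(4, 7))                # positive integer: zero beyond k = 4
print(coefficients(-2, 5))               # negative integer: no termination
print(coefficients(Fraction(1, 2), 4))   # non-integer: no termination

# Compare with the closed form (1.58) for n = -m.
m = 3
closed = [(-1)**k * comb(m + k - 1, k) for k in range(8)]
agrees = coefficients(-m, 8) == closed

# Partial sum of (1.57) for (x + y)**(-2) with |y| < |x|.
x, y = Fraction(1), Fraction(1, 10)
partial = x**-2 * sum(c * (y / x)**k
                      for k, c in enumerate(coefficients(-2, 30)))
error = abs(float(partial) - float((x + y)**-2))
print(agrees, error)
```

Exact rational arithmetic is used for the coefficients so that the termination for positive integer $n$ appears as exact zeros rather than as small rounding residues.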


1.7 Some particular methods of proof 

Much of the mathematics used by physicists and engineers is concerned with 
obtaining a particular value, formula or function from a given set of data and 
stated conditions. However, just as it is essential in physics to formulate the basic 
laws and so be able to set boundaries on what can or cannot happen, so it 
is important in mathematics to be able to state general propositions about the 
outcomes that are or are not possible. To this end one attempts to establish 
theorems that state in as general a way as possible mathematical results that 
apply to particular types of situation. We conclude this introductory chapter by 
describing two methods that can sometimes be used to prove particular classes 
of theorems. 

The two general methods of proof are known as proof by induction (which 
has already been met in this chapter) and proof by contradiction. They share 
the common characteristic that at an early stage in the proof an assumption 
is made that a particular (unproven) statement is true; the consequences of 
that assumption are then explored. In an inductive proof the conclusion is 
reached that the assumption is self-consistent and has other equally consistent 
but broader implications, which are then applied to establish the general validity 
of the assumption. A proof by contradiction, however, establishes an internal 
inconsistency and thus shows that the assumption is unsustainable; the natural 
consequence of this is that the negative of the assumption is established as true. 

Later in this book use will be made of these methods of proof to explore new 
territory, e.g. to examine the properties of vector spaces, matrices and groups. 
However, at this stage we will draw our illustrative and test examples from earlier 
sections of this chapter and other topics in elementary algebra and number theory. 


1.7.1 Proof by induction 

The proof of the binomial expansion given in subsection 1.5.2 and the identity 
established in subsection 1.6.1 have already shown the way in which an inductive 
proof is carried through. They also indicated the main limitation of the method, 
namely that only an initially supposed result can be proved. Thus the method 
of induction is of no use for deducing a previously unknown result; a putative 
equation or result has to be arrived at by some other means, usually by noticing 
patterns or by trial and error using simple values of the variables involved. It 
will also be clear that propositions that can be proved by induction are limited 
to those containing a parameter that takes a range of integer values (usually 
infinite). 

For a proposition involving a parameter n, the five steps in a proof using 
induction are as follows. 

(i) Formulate the supposed result for general n. 

(ii) Suppose (i) to be true for $n = N$ (or more generally for all values of 
$n \le N$; see below), where $N$ is restricted to lie in the stated range. 

(iii) Show, using only proven results and supposition (ii), that proposition (i) 
is true for n = N + 1. 

(iv) Demonstrate directly, and without any assumptions, that proposition (i) is 
true when n takes the lowest value in its range. 

(v) It then follows from (iii) and (iv) that the proposition is valid for all values 
of n in the stated range. 

(It should be noted that, although many proofs at stage (iii) require the validity 
of the proposition only for n = N, some require it for all n less than or equal to N 
- hence the form of inequality given in parentheses in the stage (ii) assumption.) 

To illustrate further the method of induction, we now apply it to two worked 
examples ; the first concerns the sum of the squares of the first n natural numbers. 

► Prove that the sum of the squares of the first $n$ natural numbers is given by 

\[
\sum_{r=1}^{n} r^{2} = \tfrac{1}{6} n(n + 1)(2n + 1). \tag{1.60}
\]

As previously we start by assuming the result is true for $n = N$. Then it follows that 

\begin{align*}
\sum_{r=1}^{N+1} r^{2} &= \sum_{r=1}^{N} r^{2} + (N + 1)^{2} \\
&= \tfrac{1}{6} N(N + 1)(2N + 1) + (N + 1)^{2} \\
&= \tfrac{1}{6} (N + 1)[N(2N + 1) + 6N + 6] \\
&= \tfrac{1}{6} (N + 1)[(2N + 3)(N + 2)] \\
&= \tfrac{1}{6} (N + 1)[(N + 1) + 1][2(N + 1) + 1].
\end{align*}

This is precisely the original assumption, but with $N$ replaced by $N + 1$. To complete the 
proof we only have to verify (1.60) for $n = 1$. This is trivially done and establishes the 
result for all positive $n$. The same and related results are obtained by a different method 
in subsection 4.2.5. ◄ 
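
The two ingredients of the inductive proof of (1.60), the base case and the step from $N$ to $N + 1$, can each be checked numerically. The sketch below does so for $N$ up to 1000, an arbitrary limit; the function names are illustrative. Such a check is no substitute for the proof, but it is a useful guard against algebraic slips when a putative formula is first written down.

```python
def sum_of_squares(n):
    """Direct summation of r**2 for r = 1, ..., n."""
    return sum(r * r for r in range(1, n + 1))

def closed_form(n):
    """Right-hand side of (1.60); the product is always divisible by 6."""
    return n * (n + 1) * (2 * n + 1) // 6

# Stage (iv): the lowest value in the range, n = 1.
base_case = sum_of_squares(1) == closed_form(1)

# Stage (iii): adding (N + 1)**2 to the formula for N gives the formula for N + 1.
step_holds = all(closed_form(N) + (N + 1)**2 == closed_form(N + 1)
                 for N in range(1, 1001))

print(base_case, step_holds)
```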

Our second example is somewhat more complex and involves two nested proofs 
by induction: whilst trying to establish the main result by induction, we find that 
we are faced with a second proposition which itself requires an inductive proof. 


► Show that $Q(n) = n^4 + 2n^3 + 2n^2 + n$ is divisible by 6 (without remainder) for all positive 
integer values of $n$. 


Again we start by assuming the result is true for some particular value $N$ of $n$, whilst 
noting that it is trivially true for $n = 0$. We next examine $Q(N + 1)$, writing each of its 
terms as a binomial expansion: 

\begin{align*}
Q(N + 1) &= (N + 1)^4 + 2(N + 1)^3 + 2(N + 1)^2 + (N + 1) \\
&= (N^4 + 4N^3 + 6N^2 + 4N + 1) + 2(N^3 + 3N^2 + 3N + 1) \\
&\qquad + 2(N^2 + 2N + 1) + (N + 1) \\
&= (N^4 + 2N^3 + 2N^2 + N) + (4N^3 + 12N^2 + 14N + 6).
\end{align*}

Now, by our assumption, the group of terms within the first parentheses in the last line 
is divisible by 6 and clearly so are the terms $12N^2$ and 6 within the second parentheses. 
Thus it comes down to deciding whether $4N^3 + 14N$ is divisible by 6 or, equivalently, 
whether $R(N) = 2N^3 + 7N$ is divisible by 3. 

To settle this latter question we try using a second inductive proof and assume that 
$R(N)$ is divisible by 3 for $N = M$, whilst again noting that the proposition is trivially true 
for $N = M = 0$. This time we examine $R(M + 1)$: 

\begin{align*}
R(M + 1) &= 2(M + 1)^3 + 7(M + 1) \\
&= 2(M^3 + 3M^2 + 3M + 1) + 7(M + 1) \\
&= (2M^3 + 7M) + 3(2M^2 + 2M + 3).
\end{align*}

By assumption, the first group of terms in the last line is divisible by 3 and the second 
group is patently so. We thus conclude that $R(N)$ is divisible by 3 for all $N \ge M$, and 
taking $M = 0$ shows that it is divisible by 3 for all $N$. 

We can now return to the main proposition and conclude that since $R(N) = 2N^3 + 7N$ 
is divisible by 3, $4N^3 + 12N^2 + 14N + 6$ is divisible by 6. This in turn establishes that the 
divisibility of $Q(N + 1)$ by 6 follows from the assumption that $Q(N)$ divides by 6. Since 
$Q(0)$ clearly divides by 6, the proposition in the question is established for all values of $n$. ◄ 
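
The structure of this nested argument can be mirrored in a few lines of code. The sketch below checks, for $n$ up to 500 (an arbitrary limit), that $Q(n)$ is divisible by 6, that $R(n)$ is divisible by 3, and that the difference $Q(n + 1) - Q(n)$ is the cubic obtained in the inductive step. It is offered as a numerical illustration only; the proof itself rests on the induction given in the example.

```python
def Q(n):
    """The quartic of the worked example."""
    return n**4 + 2*n**3 + 2*n**2 + n

def R(n):
    """The auxiliary cubic that arises in the inductive step."""
    return 2*n**3 + 7*n

values = range(0, 501)

q_divisible = all(Q(n) % 6 == 0 for n in values)
r_divisible = all(R(n) % 3 == 0 for n in values)

# The inductive step: Q(N + 1) = Q(N) + (4N^3 + 12N^2 + 14N + 6).
step_identity = all(Q(n + 1) - Q(n) == 4*n**3 + 12*n**2 + 14*n + 6
                    for n in values)

print(q_divisible, r_divisible, step_identity)
```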


1.7.2 Proof by contradiction 

The second general line of proof, but again one that is normally only useful when 
the result is already suspected, is proof by contradiction. The questions it can 
attempt to answer are only those that can be expressed in a proposition that 
is either true or false. Clearly, it could be argued that any mathematical result 
can be so expressed but, if the proposition is no more than a guess, the chances 
of success are negligible. Valid propositions containing even modest formulae 
are either the result of true inspiration or, much more normally, yet another 
reworking of an old chestnut! 

The essence of the method is to exploit the fact that mathematics is required 
to be self-consistent, so that, for example, two calculations of the same quantity, 
starting from the same given data but proceeding by different methods, must give 
the same answer. Equally, it must not be possible to follow a line of reasoning and 
draw a conclusion that contradicts either the input data or any other conclusion 
based upon the same data. 

It is this requirement on which the method of proof by contradiction is based. 
The crux of the method is to assume that the proposition to be proved is 
not true, and then use this incorrect assumption and ‘watertight’ reasoning to 
draw a conclusion that contradicts the assumption. The only way out of the 
self-contradiction is then to conclude that the assumption was indeed false and 
therefore that the proposition is true. 

It must be emphasised that once a (false) contrary assumption has been made, 
every subsequent conclusion in the argument must follow of necessity. Proof by 
contradiction fails if at any stage we have to admit ‘this may or may not be 
the case’. That is, each step in the argument must be a necessary consequence of 
results that precede it (taken together with the assumption), rather than simply a 
possible consequence. 

It should also be added that if no contradiction can be found using sound 
reasoning based on the assumption then no conclusion can be drawn about either 
the proposition or its negative and some other approach must be tried. 

We illustrate the general method with an example in which the mathematical 
reasoning is straightforward so that attention can be focussed on the structure of 
the proof. 


► A rational number $r$ is a fraction $r = p/q$ in which $p$ and $q$ are integers with $q$ positive. 
Further, $r$ is expressed in its lowest terms, any integer common factor of $p$ and $q$ having 
been divided out. 

Prove that the square root of an integer $m$ cannot be a rational number, unless the square 
root itself is an integer. 


We begin by supposing that the stated result is not true and that we can write an equation 

\[
\sqrt{m} = r = \frac{p}{q} \quad \text{for integers } m, p, q \text{ with } q \ne 1.
\]

It then follows that $p^2 = mq^2$. But, since $r$ is expressed in its lowest terms, $p$ and $q$, and 
hence $p^2$ and $q^2$, have no factors in common whilst $m$ is an integer. This is only possible 
if $q = 1$ and $p^2 = m$. This conclusion contradicts the requirement that $q \ne 1$ and so leads 
to the conclusion that it was wrong to suppose that $\sqrt{m}$ can be expressed as a non-integer 
rational number. This completes the proof of the statement in the question. ◄ 

Our second worked example, also taken from elementary number theory, 
involves slightly more complicated mathematical reasoning but again exhibits the 
structure associated with this type of proof. 

► The prime integers $p_i$ are labelled in ascending order, thus $p_1 = 1$, $p_2 = 2$, $p_5 = 7$, etc. 
Show that there is no largest prime number. 


Assume, on the contrary, that there is a largest prime and let it be $p_N$. Consider now the 
number $q$ formed by multiplying together all the primes from $p_1$ to $p_N$ and then adding 
one to the product, i.e. 

\[
q = p_1 p_2 \cdots p_N + 1.
\]

By our assumption $p_N$ is the largest prime, and so no number can have a prime factor 
greater than this. However, for every prime $p_i$ with $i = 2, 3, \ldots, N$ (the unit $p_1 = 1$ 
divides every integer and is set aside) the quotient $q/p_i$ has the 
form $M_i + (1/p_i)$ with $M_i$ an integer and $1/p_i$ non-integer. This means that $q/p_i$ cannot be 
an integer and so $p_i$ cannot be a divisor of $q$. 

Since $q$ is not divisible by any of the (assumed) finite set of primes greater than unity, it must be itself 
a prime. As $q$ is also clearly greater than $p_N$, we have a contradiction. Thus it follows 
that our assumption that there is a largest prime integer must be false, and so it has been 
proved that there is no largest prime integer. 

It should be noted that the given construction for $q$ does not generate all the primes 
that actually exist (e.g. for $N = 3$, $q = 7$, rather than the next actual prime value of 5, is 
found), but this does not matter for the purposes of our proof by contradiction. ◄ 
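
The construction used in this proof is easy to explore numerically. The sketch below is an illustration only and makes one assumption that differs from the example: it uses the modern convention in which the list of primes starts at 2, so its indices do not match the labelling $p_1 = 1$ used above. For each initial segment of the primes it forms $q$, confirms that every prime in the segment leaves remainder 1, and finds the smallest prime factor of $q$. It shows that, outside the hypothetical setting of the proof, $q$ need not itself be prime; what matters is that its prime factors are not in the list.

```python
def smallest_prime_factor(n):
    """Smallest divisor of n greater than 1 (n itself if n is prime)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

primes = [2, 3, 5, 7, 11, 13]      # assumed convention: primes start at 2
rows = []
product = 1
for count, p in enumerate(primes, start=1):
    product *= p
    q = product + 1
    # Remainder of q on division by each prime used in the product.
    remainders = [q % pj for pj in primes[:count]]
    rows.append((q, remainders, smallest_prime_factor(q)))

for q, remainders, factor in rows:
    kind = "prime" if factor == q else f"composite, smallest factor {factor}"
    print(q, remainders, kind)
```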


1.7.3 Necessary and sufficient conditions 

As the final topic in this introductory chapter, we consider briefly the notion 
of, and distinction between, necessary and sufficient conditions in the context 
of proving a mathematical proposition. In ordinary English the distinction is 
well defined, and that distinction is maintained in mathematics. However, in 
the authors’ experience students tend to overlook it and assume (wrongly) that, 
having proved that the validity of proposition A implies the truth of proposition 
B , it follows by ‘reversing the argument’ that the validity of B automatically 
implies that of A. 

As an example, let proposition A be that an integer N is divisible without 
remainder by 6, and proposition B be that N is divisible without remainder by 
2. Clearly, if A is true then it follows that B is true, i.e. A is a sufficient condition 
for B ; it is not however a necessary condition, as is trivially shown by taking N 
as 8. Conversely, the same value of N shows that whilst the validity of B is a 
necessary condition for A to hold, it is not sufficient. 
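
The asymmetry between the two directions of implication in this example can be made concrete with a short enumeration. In the sketch below the predicates `A` and `B` encode the two propositions, and the range of integers tested is an arbitrary choice. A finite search of this kind can refute an implication by exhibiting a counterexample, but it cannot prove one.

```python
def A(N):
    """Proposition A: N is divisible without remainder by 6."""
    return N % 6 == 0

def B(N):
    """Proposition B: N is divisible without remainder by 2."""
    return N % 2 == 0

values = range(-60, 61)

# A sufficient for B: whenever A holds, B holds (no counterexample found).
a_implies_b = all(B(N) for N in values if A(N))

# B sufficient for A? No: some N satisfy B but not A.
b_implies_a = all(A(N) for N in values if B(N))
counterexamples = [N for N in range(1, 13) if B(N) and not A(N)]

print(a_implies_b, b_implies_a, counterexamples)
```

The list of counterexamples includes $N = 8$, the value used in the text.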

An alternative terminology to ‘necessary’ and ‘sufficient’ often employed by 
mathematicians is that of ‘if’ and ‘only if’, particularly in the combination ‘if and 
only if’ which is usually written as IFF or denoted by a double-headed arrow 
$\Leftrightarrow$. The equivalent statements can be summarised by 

    A if B        A is true if B is true                  or   $B \Rightarrow A$,
                  B is a sufficient condition for A            $B \Rightarrow A$,

    A only if B   A is true only if B is true             or   $A \Rightarrow B$,
                  B is a necessary consequence of A            $A \Rightarrow B$,

    A IFF B       A is true if and only if B is true      or   $B \Leftrightarrow A$,
                  A and B necessarily imply each other         $B \Leftrightarrow A$.

Although at this stage in the book we are able to employ for illustrative purposes 
only simple and fairly obvious results, the following example is given as a model 
of how necessary and sufficient conditions should be proved. The essential point 
is that for the second part of the proof (whether it be the ‘necessary’ part or the 
‘sufficient’ part) one needs to start again from scratch; more often than not, the 
lines of the second part of the proof will not be simply those of the first written 
in reverse order. 


► Prove that (A) a function $f(x)$ is a quadratic polynomial with zeroes at $x = 2$ and $x = 3$ 
if and only if (B) the function $f(x)$ has the form $\lambda(x^2 - 5x + 6)$ with $\lambda$ a non-zero constant. 


(1) Assume A, i.e. that $f(x)$ is a quadratic polynomial with zeroes at $x = 2$ and $x = 3$. Let 
its form be $ax^2 + bx + c$ with $a \ne 0$. Then we have 

\begin{align*}
4a + 2b + c &= 0, \\
9a + 3b + c &= 0,
\end{align*}

and subtraction shows that $5a + b = 0$ and $b = -5a$. Substitution of this into the first of 
the above equations gives $c = -4a - 2b = -4a + 10a = 6a$. Thus, it follows that 

\[
f(x) = a(x^2 - 5x + 6) \quad \text{with } a \ne 0,
\]

and establishes the ‘A only if B’ part of the stated result. 

(2) Now assume that $f(x)$ has the form $\lambda(x^2 - 5x + 6)$ with $\lambda$ a non-zero constant. Firstly 
we note that $f(x)$ is a quadratic polynomial, and so it only remains to prove that its 
zeroes occur at $x = 2$ and $x = 3$. Consider $f(x) = 0$, which, after dividing through by the 
non-zero constant $\lambda$, gives 

\[
x^2 - 5x + 6 = 0.
\]

We proceed by using a technique known as completing the square, for the purposes of 
illustration, although the factorisation of the above equation should be clear to the reader. 
Thus we write 

\begin{align*}
x^2 - 5x + \left(\tfrac{5}{2}\right)^2 - \left(\tfrac{5}{2}\right)^2 + 6 &= 0, \\
\left(x - \tfrac{5}{2}\right)^2 &= \tfrac{1}{4}, \\
x - \tfrac{5}{2} &= \pm\tfrac{1}{2}.
\end{align*}

The two roots of $f(x) = 0$ are therefore $x = 2$ and $x = 3$; these $x$-values give the zeroes 
of $f(x)$. This establishes the second (‘A if B’) part of the result. Thus we have shown 
that the assumption of either condition implies the validity of the other and the proof is 
complete. ◄ 

It should be noted that the propositions have to be carefully and precisely 
formulated. If, for example, the word ‘quadratic’ were omitted from A, statement 
B would still be a sufficient condition for A but not a necessary one, since $f(x)$ 
could then be $x^3 - 4x^2 + x + 6$ and A would not require B. Omitting the constant 
$\lambda$ from the stated form of $f(x)$ in B has the same effect. Conversely, if A were to 
state that $f(x) = 3(x - 2)(x - 3)$ then B would be a necessary condition for A but 
not a sufficient one. 


1.8 Exercises 

Polynomial equations 

1.1 Continue the investigation of equation (1.7), namely 

\[
g(x) = 4x^3 + 3x^2 - 6x - 1,
\]

as follows. 

(a) Make a table of values of $g(x)$ for integer values of $x$ between $-2$ and 2. Use 
it and the information derived in the text to draw a graph and so determine 
the roots of $g(x) = 0$ as accurately as possible. 

(b) Find one accurate root of $g(x) = 0$ by inspection and hence determine precise 
values for the other two roots. 

(c) Show that f(x) = 4x 3 + 3x 2 — 6x — k = 0 has only one real root unless 
-5 < k < 

1.2 Determine how the number of real roots of the equation 

\[
g(x) = 4x^3 - 17x^2 + 10x + k = 0
\]

depends upon k. Are there any cases for which the equation has exactly two 
distinct real roots? 

1.3 Continue the analysis of the polynomial equation 

\[
f(x) = x^7 + 5x^6 + x^4 - x^3 + x^2 - 2 = 0,
\]

investigated in subsection 1.1.1, as follows. 

(a) By writing the fifth-degree polynomial appearing in the expression for $f'(x)$ 
in the form $7x^5 + 30x^4 + a(x - b)^2 + c$, show that there is in fact only one 
positive root of $f(x) = 0$. 

(b) By evaluating $f(1)$, $f(0)$ and $f(-1)$, and by inspecting the form of $f(x)$ for 
negative values of $x$, determine what you can about the positions of the real 
roots of $f(x) = 0$. 

1.4 Given that x = 2 is one root of 

\[
g(x) = 2x^4 + 4x^3 - 9x^2 - 11x - 6 = 0,
\]

use factorisation to determine how many real roots it has. 

1.5 Construct the quadratic equations that have the following pairs of roots: (a) 
$-6, -3$; (b) $0, 4$; (c) $2, 2$; (d) $3 + 2i, 3 - 2i$, where $i^2 = -1$. 

1.6 Use the results of (i) equation (1.13), (ii) equation (1.12) and (iii) equation (1.14) 
to prove that if the roots of $3x^3 - x^2 - 10x + 8 = 0$ are $\alpha_1$, $\alpha_2$ and $\alpha_3$ then 

(a) $\alpha_1^{-1} + \alpha_2^{-1} + \alpha_3^{-1} = 5/4$, 

(b) $\alpha_1^2 + \alpha_2^2 + \alpha_3^2 = 61/9$, 

(c) $\alpha_1^3 + \alpha_2^3 + \alpha_3^3 = -125/27$. 

(d) Convince yourself that eliminating (say) $\alpha_2$ and $\alpha_3$ from (i), (ii) and (iii) does 
not give a simple explicit way of finding $\alpha_1$. 


Trigonometric identities 

1.7 Prove that 

\[
\cos\frac{\pi}{12} = \frac{\sqrt{3} + 1}{2\sqrt{2}}
\]

by considering 

(a) the sum of the sines of $\pi/3$ and $\pi/6$, 

(b) the sine of the sum of $\pi/3$ and $\pi/4$. 

1.8 (a) Use the fact that $\sin(\pi/6) = 1/2$ to prove that $\tan(\pi/12) = 2 - \sqrt{3}$. 

(b) Use the result of (a) to show further that $\tan(\pi/24) = q(2 - q)$ where 
$q^2 = 2 + \sqrt{3}$. 

1.9 Find the real solutions of 

(a) $3\sin\theta - 4\cos\theta = 2$, 

(b) $4\sin\theta + 3\cos\theta = 6$, 

(c) $12\sin\theta - 5\cos\theta = -6$. 

1.10 If $s = \sin(\pi/8)$, prove that 

\[
8s^4 - 8s^2 + 1 = 0,
\]

and hence show that $s = [(2 - \sqrt{2})/4]^{1/2}$. 

1.11 Find all the solutions of 

\[
\sin\theta + \sin 4\theta = \sin 2\theta + \sin 3\theta
\]

that lie in the range $-\pi < \theta \le \pi$. What is the multiplicity of the solution $\theta = 0$? 


Coordinate geometry 

1.12 Obtain in the form (1.38) the equations that describe the following: 

(a) a circle of radius 5 with its centre at (1,-1); 

(b) the line 2x + 3y + 4 = 0 and the line orthogonal to it which passes through 

(c) an ellipse of eccentricity 0.6 with centre ( 1, 1) and its major axis of length 10 
parallel to the y-axis. 

1.13 Determine the forms of the conic sections described by the following equations: 

(a) $x^2 + y^2 + 6x + 8y = 0$; 

(b) $9x^2 - 4y^2 - 54x - 16y + 29 = 0$; 

(c) $2x^2 + 2y^2 + 5xy - 4x + y - 6 = 0$; 

(d) $x^2 + y^2 + 2xy - 8x + 8y = 0$. 

1.14 For the ellipse 


with eccentricity e, the two points (— ae, 0) and (ae, 0) are known as its foci. Show 
that the sum of the distances from any point on the ellipse to the foci is 2a. (The 
constancy of the sum of the distances from two fixed points can be used as an 
alternative defining property of an ellipse.) 


Partial fractions 

1.15 Resolve the following into partial fractions using the three methods given in 
section 1.4, verifying that the same decomposition is obtained by each method: 

(a) $\dfrac{2x + 1}{x^2 + 3x - 10}$, 

(b) $x^2 - 3x$ 

1.16 Express the following in partial fraction form: 

(a) $\dfrac{2x^3 - 5x + 1}{x^2 - 2x - 8}$, 

(b) $\dfrac{x^2 + x - 1}{x^2 + x - 2}$. 

1.17 Rearrange the following functions in partial fraction form: 

(a) $\dfrac{x - 6}{x^3 - x^2 + 4x - 4}$, 

(b) $\dfrac{x^3 + 3x^2 + x + 19}{x^4 + 10x^2 + 9}$. 

1.18 Resolve the following into partial fractions in such a way that $x$ does not appear 
in any numerator: 

(a) $\dfrac{2x^2 + x + 1}{(x - 1)^2 (x + 3)}$, 

(b) $\dfrac{x^2 - 2}{x^3 + 8x^2 + 16x}$, 

(c) $\dfrac{x^3 - x - 1}{(x + 3)^3 (x + 1)}$. 


Binomial expansion 

1.19 Evaluate those of the following that are defined: (a) ${}^{5}C_{3}$, (b) ${}^{3}C_{5}$, (c) ${}^{-5}C_{3}$, (d) 
${}^{-3}C_{5}$. 

1.20 Use a binomial expansion to evaluate to five places of decimals, and 

compare it with the accurate answer obtained using a calculator. 


Proof by induction and contradiction 

1.21 Prove by induction that 

\[
\sum_{r=1}^{n} r = \tfrac{1}{2} n(n + 1) \quad \text{and} \quad \sum_{r=1}^{n} r^3 = \tfrac{1}{4} n^2 (n + 1)^2.
\]

1.22 Prove by induction that 

\[
1 + r + r^2 + \cdots + r^k + \cdots + r^n = \frac{1 - r^{n+1}}{1 - r}.
\]

1.23 Prove that $3^{2n} + 7$, where $n$ is a non-negative integer, is divisible by 8. 

1.24 If a sequence of terms $u_n$ satisfies the recurrence relation $u_{n+1} = (1 - x)u_n + nx$ 
with $u_1 = 0$ then show, using induction, that for $n \ge 1$ 

\[
u_n = \frac{1}{x} \left[ nx - 1 + (1 - x)^n \right].
\]

1.25 Prove by induction that 

\[
\ldots - \cot\theta.
\]

1.26 The quantities $a_i$ in this exercise are all positive real numbers. 

(a) Show that 

\[
a_1 a_2 \le \left( \frac{a_1 + a_2}{2} \right)^2.
\]

(b) Hence prove by induction on $m$ that 

\[
a_1 a_2 \cdots a_p \le \left( \frac{a_1 + a_2 + \cdots + a_p}{p} \right)^p,
\]

where $p = 2^m$ with $m$ a positive integer. Note that each increase of $m$ by 
unity doubles the number of factors in the product. 

1.27 Establish the values of $k$ for which the binomial coefficient ${}^{p}C_{k}$ is divisible by $p$ 
when $p$ is a prime number. Use your result and the method of induction to prove 
that $n^p - n$ is divisible by $p$ for all integers $n$ and all prime numbers $p$. Deduce 
that $n^5 - n$ is divisible by 30 for any integer $n$. 
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1.9 HINTS AND ANSWERS 


1.28 An arithmetic progression of integers $a_n$ is one in which $a_n = a_0 + nd$, where $a_0$
and $d$ are integers and $n$ takes successive values $0, 1, 2, \ldots$.

(a) Show that if any one term of the progression is the cube of an integer then
so are infinitely many others.

(b) Show that no cube of an integer can be expressed as $7n + 5$ for some positive
integer $n$.

1.29 Prove, by the method of contradiction, that the equation

\[ x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0 = 0, \]

in which all the coefficients $a_i$ are integers, cannot have a rational root, unless
that root is an integer. Deduce that any integral root must be a divisor of $a_0$ and
hence find all rational roots of

(a) $x^4 + 6x^3 + 4x^2 + 5x + 4 = 0$,

(b) $x^4 + 5x^3 + 2x^2 - 10x + 6 = 0$.

Necessary and sufficient conditions 

1.30 Prove that the equation $ax^2 + bx + c = 0$, in which $a$, $b$ and $c$ are real and $a > 0$,
has two real distinct solutions IFF $b^2 > 4ac$.

1.31 For the real variable $x$, show that a sufficient, but not necessary, condition for
$f(x) = x(x + 1)(2x + 1)$ to be divisible by 6 is that $x$ is an integer.

1.32 Given that at least one of $a$ and $b$, and at least one of $c$ and $d$, are non-zero,
show that $ad = bc$ is both a necessary and sufficient condition for the equations

\[ ax + by = 0, \]
\[ cx + dy = 0, \]

to have a solution in which at least one of $x$ and $y$ is non-zero.

1.33 The coefficients $a_i$ in the polynomial $Q(x) = a_4x^4 + a_3x^3 + a_2x^2 + a_1x$ are all
integers. Show that $Q(n)$ is divisible by 24 for all integers $n \ge 0$ if and only if all
the following conditions are satisfied:

(i) $2a_4 + a_3$ is divisible by 4;

(ii) $a_4 + a_2$ is divisible by 12;

(iii) $a_4 + a_3 + a_2 + a_1$ is divisible by 24.

1.9 Hints and answers 

1.1 (b) The roots are 1, $\tfrac{1}{8}(-7 + \sqrt{33}) = -0.1569$, $\tfrac{1}{8}(-7 - \sqrt{33}) = -1.593$. (c) $-5$ and
$\tfrac{7}{4}$ are the values of $k$ that make $f(-1)$ and $f(\tfrac{1}{2})$ equal to zero.

1.2 Three distinct roots if — ^ < k < ” ; two distinct roots, one atx=|,iffc = — S; 
two distinct roots, one at x = |, if k = 

1.3 (a) a = 4, b = l and c = are all positive. Therefore f(x) > 0 for all x > 0. 
(b) $f(1) = 5$, $f(0) = -2$ and $f(-1) = 5$, and so there is at least one root in each
of the ranges $0 < x < 1$ and $-1 < x < 0$. $(x^7 + 5x^6) + (x^4 - x^3) + (x^2 - 2)$
is positive definite for $-5 < x < -\sqrt{2}$. There are therefore no roots in this
range, but there must be one to the left of $x = -5$.

1.4 g(x) = (x — 2)(x + 3)(2x 2 + 2x + 1). The quadratic has complex roots and so 
g(x) = 0 has only two real roots. 

1.5 (a) $x^2 + 9x + 18 = 0$; (b) $x^2 - 4x = 0$; (c) $x^2 - 4x + 4 = 0$; (d) $x^2 - 6x + 13 = 0$.

1.6 (a) Divide (iii) by (ii). (b) Consider (i)$^2$ $-$ 2(iii). (c) Consider (i)$^3$ $-$ 3(i)(iii) $+$ 3(ii).

1.7 (a) Use $\sin(\pi/4) = 1/\sqrt{2}$. (b) Use results (1.20) and (1.20).

1.8 (a) Use (1.32). (b) Use (1.34) and show that $q^4 + 1 = 4q^2$.

1.9 (a) 1.339. (b) No solution because $6^2 > 4^2 + 3^2$. (c) $-0.0849$.

1.10 Use the formula for $\sin(2\pi/8)$ and square both sides. $\sin^2(\pi/4) = 1/2$.

1.11 Show that the equation is equivalent to $\sin(5\theta/2)\sin(\theta)\sin(\theta/2) = 0$.
Solutions are $-4\pi/5$, $-2\pi/5$, $0$, $2\pi/5$, $4\pi/5$, $\pi$. The solution $\theta = 0$ has multiplicity 3.

1.12 (a) $x^2 + y^2 - 2x + 2y - 23 = 0$.

(b) The orthogonal line is $3x - 2y - 1 = 0$. The pair of lines has equation
$6x^2 - 6y^2 + 5xy + 10x - 11y - 4 = 0$.

(c) The minor axis has length 8. The ellipse has equation
$25x^2 + 16y^2 - 50x - 32y - 359 = 0$.

1.13 (a) A circle of radius 5 centred on $(-3, -4)$.

(b) A hyperbola with ‘centre’ $(3, -2)$ and ‘semi-axes’ 2 and 3.

(c) The expression factorises into two lines, $x + 2y - 3 = 0$ and $2x + y + 2 = 0$.

(d) Write the expression as $(x + y)^2 = 8(x - y)$ to see that it represents a parabola
passing through the origin with the line $x + y = 0$ as its axis of symmetry.

1.14 Show that $y^2$ can be replaced by $a^2 - x^2 - a^2e^2 + x^2e^2$ and that the two lengths
are $a + ex$ and $a - ex$.

1.15 (a) $\dfrac{5}{7(x - 2)} + \dfrac{9}{7(x + 5)}$, (b) $-\dfrac{4}{3x} + \dfrac{4}{3(x - 3)}$.

1.16 (a) $2x + 4 + \dfrac{109}{6(x - 4)} + \dfrac{5}{6(x + 2)}$, (b) $1 - \dfrac{1}{3(x + 2)} + \dfrac{1}{3(x - 1)}$.

1.17 (a) $\dfrac{x + 2}{x^2 + 4} - \dfrac{1}{x - 1}$, (b) $\dfrac{x + 1}{x^2 + 9} + \dfrac{2}{x^2 + 1}$.

1.18 (a) $\dfrac{1}{(x - 1)^2} + \dfrac{1}{x - 1} + \dfrac{1}{x + 3}$.

(b) $-\dfrac{1}{8x} + \dfrac{9}{8(x + 4)} - \dfrac{7}{2(x + 4)^2}$.

(c) $\dfrac{1}{8}\left[-\dfrac{1}{x + 1} + \dfrac{9}{x + 3} - \dfrac{54}{(x + 3)^2} + \dfrac{100}{(x + 3)^3}\right]$.

1.19 (a) 10, (b) not defined, (c) $-35$, (d) $-21$.

1.20 Write it as $\tfrac{1}{2}(1 + 0.05)^{-1/2}$ and evaluate ${}^{-1/2}C_k$ up to $k = 3$. The approximate and
accurate values agree to five places of decimals, both giving 0.48795.

1.23 Write $3^{2n}$ as $8m - 7$.

1.25 Use the half-angle formulae of equations (1.32) to (1.34) to relate functions of
$\theta/2^k$ to those of $\theta/2^{k+1}$.

1.26 (a) Consider $(a_1 - a_2)^2 \ge 0$. (b) Write $a_1 + \cdots + a_p = A$ and $a_{p+1} + \cdots + a_{p+p} = B$
and use result (a) to replace the product $AB$ with an expression involving the
sum $A + B$. Note that $2p = 2^{m+1}$.

1.27 Divisible for $k = 1, 2, \ldots, p - 1$. Expand $(n + 1)^p$ as $n^p + \sum_{k=1}^{p-1} {}^pC_k n^k + 1$. Apply
the stated result for $p = 5$. Note that $n^5 - n = n(n - 1)(n + 1)(n^2 + 1)$; the product
of any three consecutive integers must divide by both 2 and 3.

1.28 (a) Suppose $a_N = a_0 + Nd = m^3$ is the largest cube; then consider $(m + d)^3$.

(b) Suppose that $7N + 5 = m^3$. Show that $(m - 7)^3$ differs from this by a multiple
of 7. Deduce that $q^3$ must have the form $7n + 5$ for some $q$ in $0 < q < 7$.
Show explicitly that this is not so. Note. It is not sufficient to carry out the
explicit valuations and rely on the construct from part (a).

1.29 By assuming $x = p/q$ with $q \neq 1$, show that a fraction $-p^n/q$ is equal to an
integer $a_{n-1}p^{n-1} + \cdots + a_1pq^{n-2} + a_0q^{n-1}$. This is a contradiction and is only
resolved if $q = 1$ and the root is an integer.

(a) The only possible candidates are $\pm 1$, $\pm 2$, $\pm 4$. None is a root.

(b) The only possible candidates are $\pm 1$, $\pm 2$, $\pm 3$, $\pm 6$. Only $-3$ is a root.

1.30 (i) Show that the equation can be reformulated as

\[ a\left(x + \frac{b}{2a}\right)^2 = \frac{b^2 - 4ac}{4a}. \]

(ii) If the real distinct solutions are $\alpha$ and $\beta$, show that $b = -(\alpha + \beta)a$, and
$c = \alpha\beta a$. Then consider the inequality $0 < (\alpha - \beta)^2 = (\alpha + \beta)^2 - 4\alpha\beta$.

1.31 $f(x)$ can be written as $x(x + 1)(x + 2) + x(x + 1)(x - 1)$. Each term consists of
the product of three consecutive integers of which one must therefore divide by
2 and (a different) one by 3. Thus each term separately divides by 6 and so
therefore does $f(x)$. Note that if $x$ is the root of $2x^3 + 3x^2 + x - 24 = 0$ that lies
near the non-integer value $x = 1.826$ then $x(x + 1)(2x + 1) = 24$ and therefore
divides by 6.

1.32 (i) If $x \neq 0$, multiply the first equation by $d$ and the second by $b$ and subtract.
If $y \neq 0$, multiply by $c$ and $a$ respectively instead. (ii) Suppose $a \neq 0$ and
$c \neq 0$. Whilst ensuring that no possible division by zero occurs, deduce that
the equations are consistent, with solution $x = -(b/a)y = -(d/c)y$ for arbitrary
non-zero $y$.

1.33 Note that, e.g., the condition for $6a_4 + a_3$ to be divisible by 4 is the same as the
condition for $2a_4 + a_3$ to be divisible by 4.

For the necessary (only if) part of the proof set $n = 1, 2, 3$ and take integer
combinations of the resulting equations.

For the sufficient (if) part of the proof use the stated conditions to prove the
proposition by induction. Note that $n^3 - n$ is divisible by 6 and that $n^2 + 3n$ is
even.

2


Preliminary calculus


This chapter is concerned with the formalism of probably the most widely used 
mathematical technique in the physical sciences, namely the calculus. The chapter 
divides into two sections. The first deals with the process of differentiation and the 
second with its inverse process, integration. The material covered is essential for 
the remainder of the book and serves as a reference. Readers who have previously 
studied these topics should ensure familiarity by looking at the worked examples 
in the main text and by attempting the exercises at the end of the chapter. 


2.1 Differentiation 

Differentiation is the process of determining how quickly or slowly a function 
varies, as the quantity on which it depends, its argument, is changed. More 
specifically it is the procedure for obtaining an expression (numerical or algebraic) 
for the rate of change of the function with respect to its argument. Familiar 
examples of rates of change include acceleration (the rate of change of velocity) 
and chemical reaction rate (the rate of change of chemical composition). Both 
acceleration and reaction rate give a measure of the change of a quantity with 
respect to time. However, differentiation may also be applied to changes with 
respect to other quantities, for example the change in pressure with respect to a 
change in temperature. 

Although it will not be apparent from what we have said so far, differentiation 
is in fact a limiting process, that is, it deals only with the infinitesimal change in 
one quantity resulting from an infinitesimal change in another. 


2.1.1 Differentiation from first principles 

Let us consider a function $f(x)$ that depends on only one variable $x$, together with
numerical constants, for example, $f(x) = 3x^2$ or $f(x) = \sin x$ or $f(x) = 2 + 3/x$.

Figure 2.1 The graph of a function $f(x)$ showing that the gradient of the
function at $P$, given by $\tan\theta$, is approximately equal to $\Delta f/\Delta x$.


Figure 2.1 shows an example of such a function. Near any particular point,
$P$, the value of the function changes by an amount $\Delta f$, say, as $x$ changes
by a small amount $\Delta x$. The slope of the tangent to the graph of $f(x)$ at $P$
is then approximately $\Delta f/\Delta x$, and the change in the value of the function is
$\Delta f = f(x + \Delta x) - f(x)$. In order to calculate the true value of the gradient, or
first derivative, of the function at $P$, we must let $\Delta x$ become infinitesimally small.
We therefore define the first derivative of $f(x)$ as

\[ f'(x) = \frac{df(x)}{dx} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}, \tag{2.1} \]


provided that the limit exists. The limit will depend in almost all cases on the 
value of x. If the limit does exist at a point x = a then the function is said to be 
differentiable at a; otherwise it is said to be non-differentiable at a. The formal 
concept of a limit and its existence or non-existence is discussed in chapter 4; for 
present purposes we will adopt an intuitive approach. 

In the definition (2.1), we allow $\Delta x$ to tend to zero from either positive or
negative values and require the same limit to be obtained in both cases. A
function that is differentiable at $a$ is necessarily continuous at $a$ (there must be
no jump in the value of the function at $a$), though the converse is not necessarily
true. This latter assertion is illustrated in figure 2.1: the function is continuous
at the ‘kink’ $A$ but the two limits of the gradient as $\Delta x$ tends to zero from
positive or negative values are different and so the function is not differentiable
at $A$.

It should be clear from the above discussion that near the point $P$ we may
approximate the change in the value of the function, $\Delta f$, that results from a small
change $\Delta x$ in $x$ by

\[ \Delta f \approx \frac{df(x)}{dx}\,\Delta x. \tag{2.2} \]

As one would expect, the approximation improves as the value of $\Delta x$ is reduced.
In the limit in which the change $\Delta x$ becomes infinitesimally small, we denote it
by the differential $dx$, and (2.2) reads

\[ df = \frac{df(x)}{dx}\,dx. \tag{2.3} \]

This equality relates the infinitesimal change in the function, $df$, to the infinitesimal
change $dx$ that causes it.

So far we have discussed only the first derivative of a function. However, we
can also define the second derivative as the gradient of the gradient of a function.
Again we use the definition (2.1) but now with $f(x)$ replaced by $f'(x)$. Hence the
second derivative is defined by

\[ f''(x) = \lim_{\Delta x \to 0} \frac{f'(x + \Delta x) - f'(x)}{\Delta x}, \tag{2.4} \]

provided that the limit exists. A physical example of a second derivative is the
second derivative of the distance travelled by a particle with respect to time. Since
the first derivative of distance travelled gives the particle’s velocity, the second
derivative gives its acceleration.

We can continue in this manner, the $n$th derivative of the function $f(x)$ being
defined by

\[ f^{(n)}(x) = \lim_{\Delta x \to 0} \frac{f^{(n-1)}(x + \Delta x) - f^{(n-1)}(x)}{\Delta x}. \tag{2.5} \]

It should be noted that with this notation $f'(x) = f^{(1)}(x)$, $f''(x) = f^{(2)}(x)$, etc., and
that formally $f^{(0)}(x) = f(x)$.

All this should be familiar to the reader, though perhaps not with such formal
definitions. The following example shows the differentiation of $f(x) = x^2$ from first
principles. In practice, however, it is desirable simply to remember the derivatives
of standard functions; the techniques given in the remainder of this section can
be applied to find more complicated derivatives.

► Find from first principles the derivative with respect to $x$ of $f(x) = x^2$.

Using the definition (2.1),

\[ \begin{aligned}
f'(x) &= \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x} \\
&= \lim_{\Delta x \to 0} \frac{(x + \Delta x)^2 - x^2}{\Delta x} \\
&= \lim_{\Delta x \to 0} \frac{2x\,\Delta x + (\Delta x)^2}{\Delta x} \\
&= \lim_{\Delta x \to 0} (2x + \Delta x).
\end{aligned} \]

As $\Delta x$ tends to zero, $2x + \Delta x$ tends towards $2x$, hence

\[ f'(x) = 2x. \] ◄
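The limiting process in definition (2.1) can also be watched numerically. The short Python sketch below is an illustration only: the helper name `difference_quotient` and the sample point $x = 3$ are choices made here, not part of the text. It evaluates $[f(x + \Delta x) - f(x)]/\Delta x$ for $f(x) = x^2$ with successively smaller $\Delta x$.

```python
def difference_quotient(f, x, dx):
    """Return [f(x + dx) - f(x)] / dx, the quantity whose limit defines f'(x)."""
    return (f(x + dx) - f(x)) / dx


def f(x):
    return x ** 2


x0 = 3.0  # illustrative sample point
for dx in (1.0, 0.1, 0.01, 0.001):
    # For f(x) = x**2 the quotient equals 2*x + dx, apart from rounding error.
    print(dx, difference_quotient(f, x0, dx))
```

The printed values approach $2x = 6$ as $\Delta x$ shrinks, and a negative $\Delta x$ approaches the same limit from below, as the definition requires.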


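Derivatives of other functions can be obtained in the same way. The derivatives
of some simple functions are listed below (note that $a$ is a constant):

\[ \begin{aligned}
\frac{d}{dx}(x^n) &= nx^{n-1}, & \frac{d}{dx}(e^{ax}) &= ae^{ax}, & \frac{d}{dx}(\ln ax) &= \frac{1}{x}, \\
\frac{d}{dx}(\sin ax) &= a\cos ax, & \frac{d}{dx}(\cos ax) &= -a\sin ax, & \frac{d}{dx}(\sec ax) &= a\sec ax\tan ax, \\
\frac{d}{dx}(\tan ax) &= a\sec^2 ax, & \frac{d}{dx}(\operatorname{cosec} ax) &= -a\operatorname{cosec} ax\cot ax, & \frac{d}{dx}(\cot ax) &= -a\operatorname{cosec}^2 ax, \\
\frac{d}{dx}\left(\sin^{-1}\frac{x}{a}\right) &= \frac{1}{\sqrt{a^2 - x^2}}, & \frac{d}{dx}\left(\cos^{-1}\frac{x}{a}\right) &= \frac{-1}{\sqrt{a^2 - x^2}}, & \frac{d}{dx}\left(\tan^{-1}\frac{x}{a}\right) &= \frac{a}{a^2 + x^2}.
\end{aligned} \]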
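A few entries of this list can be spot-checked numerically. The sketch below is only a sanity check under assumed sample values ($a = 2$, $x = 0.3$); the helper `central_difference` is a name introduced here, and a symmetric difference is used in place of the one-sided quotient of (2.1) simply because it converges faster.

```python
import math


def central_difference(f, x, h=1e-6):
    """Symmetric estimate [f(x + h) - f(x - h)] / (2h) of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


a, x = 2.0, 0.3  # illustrative values, with |x| < a so the inverse sine is defined

# Pairs of (function, tabulated derivative evaluated at x).
table = [
    (lambda t: math.sin(a * t), a * math.cos(a * x)),
    (lambda t: math.log(a * t), 1 / x),
    (lambda t: math.tan(a * t), a / math.cos(a * x) ** 2),
    (lambda t: math.asin(t / a), 1 / math.sqrt(a ** 2 - x ** 2)),
    (lambda t: math.atan(t / a), a / (a ** 2 + x ** 2)),
]

errors = [abs(central_difference(f, x) - exact) for f, exact in table]
print(max(errors))
```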


Differentiation from first principles emphasises the definition of a derivative as 
the gradient of a function. However, for most practical purposes, returning to the 
definition (2.1) is time consuming and does not aid our understanding. Instead, as 
mentioned above, we employ a number of techniques, which use the derivatives 
listed above as ‘building blocks’, to evaluate the derivatives of more complicated 
functions than hitherto encountered. Subsections 2.1.2-2.1.7 develop the methods
required.


2.1.2 Differentiation of products 

As a first example of the differentiation of a more complicated function, we
consider finding the derivative of a function $f(x)$ that can be written as the
product of two other functions of $x$, namely $f(x) = u(x)v(x)$. For example, if
$f(x) = x^3 \sin x$ then we might take $u(x) = x^3$ and $v(x) = \sin x$. Clearly the
separation is not unique. (In the given example, possible alternative break-ups
would be $u(x) = x^2$, $v(x) = x\sin x$, or even $u(x) = x^4\tan x$, $v(x) = x^{-1}\cos x$.)

The purpose of the separation is to split the function into two (or more) parts,
of which we know the derivatives (or at least we can evaluate these derivatives
more easily than that of the whole). We would gain little, however, if we did
not know the relationship between the derivative of $f$ and those of $u$ and $v$.
Fortunately, they are very simply related, as we shall now show.

Since $f(x)$ is written as the product $u(x)v(x)$, it follows that

\[ \begin{aligned}
f(x + \Delta x) - f(x) &= u(x + \Delta x)v(x + \Delta x) - u(x)v(x) \\
&= u(x + \Delta x)[v(x + \Delta x) - v(x)] + [u(x + \Delta x) - u(x)]v(x).
\end{aligned} \]


From the definition of a derivative (2.1),

\[ \frac{df}{dx} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}
= \lim_{\Delta x \to 0} \left\{ u(x + \Delta x)\left[\frac{v(x + \Delta x) - v(x)}{\Delta x}\right] + \left[\frac{u(x + \Delta x) - u(x)}{\Delta x}\right]v(x) \right\}. \]

In the limit $\Delta x \to 0$, the factors in square brackets become $dv/dx$ and $du/dx$
(by the definitions of these quantities) and $u(x + \Delta x)$ simply becomes $u(x)$.
Consequently we obtain

\[ \frac{df}{dx} = \frac{d}{dx}[u(x)v(x)] = u(x)\frac{dv(x)}{dx} + \frac{du(x)}{dx}v(x). \tag{2.6} \]

In primed notation and without writing the argument $x$ explicitly, (2.6) is stated
concisely as

\[ f' = (uv)' = uv' + u'v. \tag{2.7} \]

This is a general result obtained without making any assumptions about the
specific forms $f$, $u$ and $v$, other than that $f(x) = u(x)v(x)$. In words, the result
reads as follows. The derivative of the product of two functions is equal to the
first function times the derivative of the second plus the second function times the
derivative of the first.


► Find the derivative with respect to $x$ of $f(x) = x^3 \sin x$.

Using the product rule, (2.6),

\[ \frac{d}{dx}(x^3 \sin x) = x^3\frac{d}{dx}(\sin x) + \frac{d}{dx}(x^3)\sin x = x^3\cos x + 3x^2\sin x. \] ◄
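Since the separation into $u$ and $v$ is not unique, any valid split must give the same derivative. The following Python sketch (the function name `product_rule` and the sample point are assumptions made for illustration) applies (2.7) to two different splits of $x^3 \sin x$ and compares both with the result just obtained.

```python
import math


def product_rule(u, du, v, dv, x):
    """Evaluate (uv)' = u v' + u' v at the point x."""
    return u(x) * dv(x) + du(x) * v(x)


x = 1.3  # illustrative sample point

# Split 1: u = x^3, v = sin x.
first = product_rule(lambda t: t ** 3, lambda t: 3 * t ** 2, math.sin, math.cos, x)

# Split 2: u = x^2, v = x sin x (whose own derivative needs the product rule).
second = product_rule(
    lambda t: t ** 2,
    lambda t: 2 * t,
    lambda t: t * math.sin(t),
    lambda t: math.sin(t) + t * math.cos(t),
    x,
)

expected = x ** 3 * math.cos(x) + 3 * x ** 2 * math.sin(x)
print(first, second, expected)
```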


The product rule may readily be extended to the product of three or more
functions. Considering the function

\[ f(x) = u(x)v(x)w(x) \tag{2.8} \]

and using (2.6), we obtain, as before omitting the argument,

\[ \frac{df}{dx} = u\frac{d}{dx}(vw) + \frac{du}{dx}vw. \]

Using (2.6) again to expand the first term on the RHS gives the complete result

\[ \frac{d}{dx}(uvw) = uv\frac{dw}{dx} + u\frac{dv}{dx}w + \frac{du}{dx}vw \tag{2.9} \]

or

\[ (uvw)' = uvw' + uv'w + u'vw. \tag{2.10} \]


It is readily apparent that this can be extended to products containing any number 
n of factors; the expression for the derivative will then consist of n terms with 
the prime appearing in successive terms on each of the n factors in turn. This is 
probably the easiest way to recall the product rule. 
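The "prime on each factor in turn" pattern translates directly into a loop. The sketch below is one possible way to write it, with `product_derivative` a hypothetical helper name and the three factors $x^2$, $\sin x$ and $e^x$ chosen purely as an example.

```python
import math


def product_derivative(factors, derivatives, x):
    """Derivative of a product of n factors at x: a sum of n terms, with the
    prime falling on each factor in turn while the others are left alone."""
    values = [f(x) for f in factors]
    total = 0.0
    for i, d in enumerate(derivatives):
        term = d(x)  # the differentiated factor
        for j, value in enumerate(values):
            if j != i:
                term *= value  # the undifferentiated factors
        total += term
    return total


factors = [lambda t: t ** 2, math.sin, math.exp]
derivatives = [lambda t: 2 * t, math.cos, math.exp]
x = 0.9
value = product_derivative(factors, derivatives, x)

# Closed form of (2.10) for u = x^2, v = sin x, w = exp(x).
expected = math.exp(x) * (2 * x * math.sin(x) + x ** 2 * math.cos(x) + x ** 2 * math.sin(x))
print(value, expected)
```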


2.1.3 The chain rule 


Products are just one type of complicated function that we may encounter in
differentiation. Another is the function of a function, e.g. $f(x) = (3 + x^2)^3 = u(x)^3$,
where $u(x) = 3 + x^2$. If $\Delta f$, $\Delta u$ and $\Delta x$ are small finite quantities, it follows that

\[ \frac{\Delta f}{\Delta x} = \frac{\Delta f}{\Delta u}\frac{\Delta u}{\Delta x}. \]

As the quantities become infinitesimally small we obtain

\[ \frac{df}{dx} = \frac{df}{du}\frac{du}{dx}. \tag{2.11} \]

This is the chain rule, which we must apply when differentiating a function of a
function.


► Find the derivative with respect to $x$ of $f(x) = (3 + x^2)^3$.

Rewriting the function as $f(x) = u^3$, where $u(x) = 3 + x^2$, and applying (2.11) we find

\[ \frac{df}{dx} = 3u^2\frac{du}{dx} = 3u^2\frac{d}{dx}(3 + x^2) = 3u^2 \times 2x = 6x(3 + x^2)^2. \] ◄

Similarly, the derivative with respect to $x$ of $f(x) = 1/v(x)$ may be obtained by
rewriting the function as $f(x) = v^{-1}$ and applying (2.11):

\[ \frac{df}{dx} = -v^{-2}\frac{dv}{dx} = -\frac{1}{v^2}\frac{dv}{dx}. \tag{2.12} \]

The chain rule is also useful for calculating the derivative of a function $f$ with
respect to $x$ when both $x$ and $f$ are written in terms of a variable (or parameter),
say $t$.

► Find the derivative with respect to $x$ of $f(t) = 2at$, where $x = at^2$.

We could of course substitute for $t$ and then differentiate $f$ as a function of $x$, but in this
case it is quicker to use

\[ \frac{df}{dx} = \frac{df}{dt}\frac{dt}{dx} = 2a\,\frac{1}{2at} = \frac{1}{t}, \]

where we have used the fact that

\[ \frac{dt}{dx} = \left(\frac{dx}{dt}\right)^{-1}. \] ◄
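The parametric result can be cross-checked against the slower route of eliminating $t$. In the sketch below the values $a = 2$ and $t = 1.7$ are arbitrary choices, $t > 0$ is assumed so that $t = \sqrt{x/a}$, and `f_of_x` is a name introduced only for this illustration.

```python
import math

a = 2.0  # illustrative constant


def f_of_x(x):
    """f written directly in terms of x: t = sqrt(x/a) for t > 0, so f = 2*a*t."""
    return 2 * a * math.sqrt(x / a)


t = 1.7
x = a * t ** 2

# Direct route: numerical derivative of f with respect to x.
h = 1e-6
numerical = (f_of_x(x + h) - f_of_x(x - h)) / (2 * h)

# Chain-rule route: (df/dt) * (dt/dx) with dt/dx = 1 / (dx/dt).
parametric = (2 * a) * (1 / (2 * a * t))

print(numerical, parametric, 1 / t)
```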


2.1.4 Differentiation of quotients 

Applying (2.6) for the derivative of a product to a function $f(x) = u(x)[1/v(x)]$,
we may obtain the derivative of the quotient of two factors. Thus

\[ f' = \left(\frac{u}{v}\right)' = u\left(\frac{1}{v}\right)' + u'\left(\frac{1}{v}\right) = u\left(-\frac{v'}{v^2}\right) + \frac{u'}{v}, \]

where (2.12) has been used to evaluate $(1/v)'$. This can now be rearranged into
the more convenient and memorisable form

\[ f' = \left(\frac{u}{v}\right)' = \frac{vu' - uv'}{v^2}. \tag{2.13} \]

This can be expressed in words as the derivative of a quotient is equal to the bottom
times the derivative of the top minus the top times the derivative of the bottom, all
over the bottom squared.


► Find the derivative with respect to $x$ of $f(x) = \sin x/x$.

Using (2.13) with $u(x) = \sin x$, $v(x) = x$ and hence $u'(x) = \cos x$, $v'(x) = 1$, we find

\[ f'(x) = \frac{x\cos x - \sin x}{x^2} = \frac{\cos x}{x} - \frac{\sin x}{x^2}. \] ◄
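The "bottom times derivative of the top minus top times derivative of the bottom" recipe can be written as a small function. This is a sketch with an assumed helper name `quotient_rule` and an arbitrary sample point; it requires $v(x) \neq 0$.

```python
import math


def quotient_rule(u, du, v, dv, x):
    """Evaluate (u/v)' = (v u' - u v') / v^2 at x, assuming v(x) is non-zero."""
    return (v(x) * du(x) - u(x) * dv(x)) / v(x) ** 2


x = 2.0  # illustrative sample point
value = quotient_rule(math.sin, math.cos, lambda t: t, lambda t: 1.0, x)
expected = math.cos(x) / x - math.sin(x) / x ** 2
print(value, expected)
```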



2.1.5 Implicit differentiation 

So far we have only differentiated functions written in the form y = f(x). 
However, we may not always be presented with a relationship in this simple 
form. As an example consider the relation $x^3 - 3xy + y^3 = 2$. In this case it is
not possible to rearrange the equation to give $y$ as a function of $x$. Nevertheless,
by differentiating term by term with respect to $x$ (implicit differentiation), we can
find the derivative of $y$.

► Find $dy/dx$ if $x^3 - 3xy + y^3 = 2$.

Differentiating each term in the equation with respect to $x$ we obtain

\[ \frac{d}{dx}(x^3) - \frac{d}{dx}(3xy) + \frac{d}{dx}(y^3) = \frac{d}{dx}(2) \]
\[ \Rightarrow\quad 3x^2 - \left(3x\frac{dy}{dx} + 3y\right) + 3y^2\frac{dy}{dx} = 0, \]

where the derivative of $3xy$ has been found using the product rule. Hence, rearranging for
$dy/dx$,

\[ \frac{dy}{dx} = \frac{y - x^2}{y^2 - x}. \]

Note that $dy/dx$ is a function of both $x$ and $y$ and cannot be expressed as a function of $x$
only. ◄
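Although $y$ is not given explicitly, the formula for $dy/dx$ can still be tested numerically by solving the relation for $y$ at nearby values of $x$. The sketch below does this with a Newton iteration; the helper `solve_y`, the starting guess and the point $x = 0.5$ are all assumptions made for illustration, and the iteration is only expected to work near a branch where $y^2 \neq x$.

```python
def solve_y(x, y_guess):
    """Newton iteration in y for x**3 - 3*x*y + y**3 = 2 at fixed x."""
    y = y_guess
    for _ in range(50):
        g = x ** 3 - 3 * x * y + y ** 3 - 2
        dg_dy = -3 * x + 3 * y ** 2
        y -= g / dg_dy
    return y


x0 = 0.5
y0 = solve_y(x0, 1.5)  # a point (x0, y0) on the curve

# Slope from the implicit-differentiation result.
formula = (y0 - x0 ** 2) / (y0 ** 2 - x0)

# Slope from following the same branch of the curve to neighbouring x values.
h = 1e-5
slope = (solve_y(x0 + h, y0) - solve_y(x0 - h, y0)) / (2 * h)

print(y0, formula, slope)
```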


2.1.6 Logarithmic differentiation 

In circumstances in which the variable with respect to which we are differentiating 
is an exponent, taking logarithms and then differentiating implicitly is the simplest 
way to find the derivative. 

► Find the derivative with respect to $x$ of $y = a^x$.

To find the required derivative we first take logarithms and then differentiate implicitly:

\[ \ln y = \ln a^x = x\ln a \quad\Rightarrow\quad \frac{1}{y}\frac{dy}{dx} = \ln a. \]

Now, rearranging and substituting for $y$, we find

\[ \frac{dy}{dx} = y\ln a = a^x\ln a. \] ◄
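As a quick numerical check of this result, the sketch below compares $a^x \ln a$ with a symmetric difference quotient. The name `derivative_of_power` and the values $a = 3$, $x = 1.2$ are illustrative assumptions; $a > 0$ is required for the logarithm.

```python
import math


def derivative_of_power(a, x):
    """d/dx of a**x obtained by logarithmic differentiation: a**x * ln(a), a > 0."""
    return a ** x * math.log(a)


a, x, h = 3.0, 1.2, 1e-6
numerical = (a ** (x + h) - a ** (x - h)) / (2 * h)
print(numerical, derivative_of_power(a, x))
```

For $a = e$ the factor $\ln a$ equals 1, recovering the familiar entry $d(e^x)/dx = e^x$ from the list of standard derivatives.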


2.1.7 Leibniz’ theorem 

We have discussed already how to find the derivative of a product of two or
more functions. We now consider Leibniz’ theorem, which gives the corresponding
results for the higher derivatives of products.

Consider again the function $f(x) = u(x)v(x)$. We know from the product rule
that $f' = uv' + u'v$. Using the rule once more for each of the products, we obtain

\[ \begin{aligned}
f'' &= (uv'' + u'v') + (u'v' + u''v) \\
&= uv'' + 2u'v' + u''v.
\end{aligned} \]

Similarly, differentiating twice more gives

\[ f''' = uv''' + 3u'v'' + 3u''v' + u'''v, \]
\[ f^{(4)} = uv^{(4)} + 4u'v''' + 6u''v'' + 4u'''v' + u^{(4)}v. \]

The pattern emerging is clear and strongly suggests that the results generalise to

\[ f^{(n)} = \sum_{r=0}^{n} \frac{n!}{r!(n - r)!}\,u^{(r)}v^{(n-r)} = \sum_{r=0}^{n} {}^nC_r\,u^{(r)}v^{(n-r)}, \tag{2.14} \]

where the fraction $n!/[r!(n - r)!]$ is identified with the binomial coefficient ${}^nC_r$
(see chapter 1). To prove that this is so, we use the method of induction as follows.
Assume that (2.14) is valid for $n$ equal to some integer $N$. Then

\[ \begin{aligned}
f^{(N+1)} &= \sum_{r=0}^{N} {}^NC_r\,\frac{d}{dx}\left(u^{(r)}v^{(N-r)}\right) \\
&= \sum_{r=0}^{N} {}^NC_r\left[u^{(r)}v^{(N-r+1)} + u^{(r+1)}v^{(N-r)}\right] \\
&= \sum_{s=0}^{N} {}^NC_s\,u^{(s)}v^{(N+1-s)} + \sum_{s=1}^{N+1} {}^NC_{s-1}\,u^{(s)}v^{(N+1-s)},
\end{aligned} \]

where we have substituted summation index $s$ for $r$ in the first summation, and
for $r + 1$ in the second. Now, from our earlier discussion of binomial coefficients,
equation (1.51), we have

\[ {}^NC_s + {}^NC_{s-1} = {}^{N+1}C_s \]

and so, after separating out the first term of the first summation and the last
term of the second, obtain

\[ f^{(N+1)} = {}^NC_0\,u^{(0)}v^{(N+1)} + \sum_{s=1}^{N} {}^{N+1}C_s\,u^{(s)}v^{(N+1-s)} + {}^NC_N\,u^{(N+1)}v^{(0)}. \]

But ${}^NC_0 = 1 = {}^{N+1}C_0$ and ${}^NC_N = 1 = {}^{N+1}C_{N+1}$, and so we may write

\[ \begin{aligned}
f^{(N+1)} &= {}^{N+1}C_0\,u^{(0)}v^{(N+1)} + \sum_{s=1}^{N} {}^{N+1}C_s\,u^{(s)}v^{(N+1-s)} + {}^{N+1}C_{N+1}\,u^{(N+1)}v^{(0)} \\
&= \sum_{s=0}^{N+1} {}^{N+1}C_s\,u^{(s)}v^{(N+1-s)}.
\end{aligned} \]


This is just (2.14) with $n$ set equal to $N + 1$. Thus, assuming the validity of (2.14)
for $n = N$ implies its validity for $n = N + 1$. However, when $n = 1$ equation
(2.14) is simply the product rule, and this we have already proved directly. These
results taken together establish the validity of (2.14) for all $n$ and prove Leibniz’
theorem.

Figure 2.2 A graph of a function, $f(x)$, showing how differentiation corresponds
to finding the gradient of the function at a particular point. Points $B$,
$Q$ and $S$ are stationary points (see text).


► Find the third derivative of the function $f(x) = x^3 \sin x$.

Using (2.14) we immediately find

\[ \begin{aligned}
f'''(x) &= 6\sin x + 3(6x)\cos x + 3(3x^2)(-\sin x) + x^3(-\cos x) \\
&= 3(2 - 3x^2)\sin x + x(18 - x^2)\cos x.
\end{aligned} \] ◄
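The sum in (2.14) is easy to evaluate mechanically once the derivatives of $u$ and $v$ are tabulated. The following sketch (with `leibniz` a hypothetical helper name and $x = 0.8$ an arbitrary sample point) reproduces the third derivative found above; `math.comb` supplies the binomial coefficient ${}^nC_r$.

```python
import math


def leibniz(u_derivs, v_derivs, n):
    """n-th derivative of u*v from lists holding the 0th..n-th derivatives
    of u and of v at the point of interest, following (2.14)."""
    return sum(math.comb(n, r) * u_derivs[r] * v_derivs[n - r] for r in range(n + 1))


x = 0.8
u = [x ** 3, 3 * x ** 2, 6 * x, 6.0]                               # x^3 and its derivatives
v = [math.sin(x), math.cos(x), -math.sin(x), -math.cos(x)]         # sin x and its derivatives

third = leibniz(u, v, 3)
closed_form = 3 * (2 - 3 * x ** 2) * math.sin(x) + x * (18 - x ** 2) * math.cos(x)
print(third, closed_form)
```

Setting `n = 1` in the same function gives back the ordinary product rule, which is the base case used in the induction.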


2.1.8 Special points of a function 

We have interpreted the derivative of a function as the gradient of the function 
at the relevant point (figure 2.1). If the gradient is zero at some point then the 
function is said to have a stationary point there. Clearly, in graphical terms, this 
corresponds to a horizontal tangent to the graph at that point. 

Stationary points may be divided into three categories and an example of each 
is shown in figure 2.2. Point B is said to be a minimum since the function increases 
in value in both directions away from it. Point Q is said to be a maximum since the 
function decreases in both directions away from it. Note that B is not the overall 
minimum value of the function and Q is not the overall maximum; rather, they 
are a local minimum and a local maximum. The third type of stationary point is 
the stationary point of inflection, $S$. In this case the function falls in the positive
$x$-direction and rises in the negative $x$-direction so that $S$ is neither a maximum
nor a minimum. Nevertheless, the gradient of the function is zero at $S$, i.e. the
graph of the function is flat there, and this justifies our calling it a stationary
point. Of course, a point at which the gradient of the function is zero but the
function rises in the positive $x$-direction and falls in the negative $x$-direction is
also a stationary point of inflection.

The above distinction between the three types of stationary point has been
made rather descriptively. However, it is possible to define and distinguish
stationary points mathematically. From their definition as points of zero gradient,
all stationary points must be characterised by $df/dx = 0$. In the case of the
minimum, $B$, the slope, i.e. $df/dx$, changes from negative at $A$ to positive at $C$
through zero at $B$. Thus $df/dx$ is increasing and so the second derivative $d^2f/dx^2$
must be positive. Conversely, at the maximum, $Q$, we must have that $d^2f/dx^2$ is
negative.

It is less obvious, but intuitively reasonable, that at $S$, $d^2f/dx^2$ is zero. This may
be inferred from the following observations. To the left of $S$ the curve is concave
upwards so that $df/dx$ is increasing with $x$ and hence $d^2f/dx^2 > 0$. To the right
of $S$, however, the curve is concave downwards so that $df/dx$ is decreasing with
$x$ and hence $d^2f/dx^2 < 0$.

In summary, at a stationary point $df/dx = 0$ and

(i) for a minimum, $d^2f/dx^2 > 0$,

(ii) for a maximum, $d^2f/dx^2 < 0$,

(iii) for a stationary point of inflection, $d^2f/dx^2 = 0$ and $d^2f/dx^2$ changes sign
through the point.

In case (iii), a stationary point of inflection, in order that $d^2f/dx^2$ changes sign
through the point we normally require $d^3f/dx^3 \neq 0$ at that point. This simple
rule can fail for some functions, however, and in general if the first non-vanishing
derivative of $f(x)$ at the stationary point is $f^{(n)}$ then if $n$ is even the point is a
maximum or minimum and if $n$ is odd the point is a stationary point of inflection.
This may be seen from the Taylor expansion (see equation (4.17)) of the function
about the stationary point, but it is not proved here.


► Find the positions and natures of the stationary points of the function

\[ f(x) = 2x^3 - 3x^2 - 36x + 2. \]

The first criterion for a stationary point is that $df/dx = 0$, and hence we set

\[ \frac{df}{dx} = 6x^2 - 6x - 36 = 0, \]

from which we obtain

\[ (x - 3)(x + 2) = 0. \]

Hence the stationary points are at $x = 3$ and $x = -2$. To determine the nature of the
stationary point we must evaluate $d^2f/dx^2$:

\[ \frac{d^2f}{dx^2} = 12x - 6. \]

Now, we examine each stationary point in turn. For $x = 3$, $d^2f/dx^2 = 30$. Since this is
positive, we conclude that $x = 3$ is a minimum. Similarly, for $x = -2$, $d^2f/dx^2 = -30$ and
so $x = -2$ is a maximum. ◄

Figure 2.3 The graph of a function $f(x)$ that has a general point of inflection
at the point $G$.
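The two-step procedure used in this example (solve $df/dx = 0$, then inspect the sign of $d^2f/dx^2$) can be sketched for a general cubic. The function name `stationary_points_of_cubic` is an assumption made here, the leading coefficient is taken to be non-zero, and the case $d^2f/dx^2 = 0$ is merely flagged rather than resolved by higher derivatives.

```python
import math


def stationary_points_of_cubic(a, b, c):
    """Stationary points of f = a x^3 + b x^2 + c x + d (a non-zero), found as
    the real roots of f' = 3a x^2 + 2b x + c and labelled by the sign of
    f'' = 6a x + 2b. The constant d does not affect the result."""
    A, B, C = 3 * a, 2 * b, c
    disc = B * B - 4 * A * C
    if disc < 0:
        return []  # f' never vanishes: no stationary points
    root = math.sqrt(disc)
    xs = sorted({(-B - root) / (2 * A), (-B + root) / (2 * A)})
    result = []
    for x in xs:
        second = 6 * a * x + 2 * b
        if second > 0:
            kind = "minimum"
        elif second < 0:
            kind = "maximum"
        else:
            kind = "undetermined (second derivative is zero)"
        result.append((x, kind))
    return result


# The worked example: f(x) = 2x^3 - 3x^2 - 36x + 2.
points = stationary_points_of_cubic(2, -3, -36)
print(points)
```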

So far we have concentrated on stationary points, which are defined to have
$df/dx = 0$. We have found that at a stationary point of inflection $d^2f/dx^2$ is
also zero and changes sign. This naturally leads us to consider points at which
$d^2f/dx^2$ is zero and changes sign but at which $df/dx$ is not, in general, zero. Such
points are called general points of inflection or simply points of inflection. Clearly,
a stationary point of inflection is a special case for which $df/dx$ is also zero.
At a general point of inflection the graph of the function changes from being
concave upwards to concave downwards (or vice versa), but the tangent to the
curve at this point need not be horizontal. A typical example of a general point
of inflection is shown in figure 2.3.

The determination of the stationary points of a function, together with the 
identification of its zeroes, infinities and possible asymptotes, is usually sufficient 
to enable a graph of the function showing most of its significant features to be 
sketched. Some examples for the reader to try are included in the exercises at the 
end of this chapter. 


2.1.9 Curvature of a function 

In the previous section we saw that at a point of inflection of the function
$f(x)$, the second derivative $d^2f/dx^2$ changes sign and passes through zero. The
corresponding graph of $f$ shows an inversion of its curvature at the point of
inflection. We now develop a more quantitative measure of the curvature of a
function (or its graph), which is applicable at general points and not just in the
neighbourhood of a point of inflection.

As in figure 2.1, let $\theta$ be the angle made with the $x$-axis by the tangent at a
point $P$ on the curve $f = f(x)$, with $\tan\theta = df/dx$ evaluated at $P$. Now consider
also the tangent at a neighbouring point $Q$ on the curve, and suppose that it
makes an angle $\theta + \Delta\theta$ with the $x$-axis, as illustrated in figure 2.4.

Figure 2.4 Two neighbouring tangents to the curve $f(x)$ whose slopes differ
by $\Delta\theta$. The angular separation of the corresponding radii of the circle of
curvature is also $\Delta\theta$.

It follows that the corresponding normals at $P$ and $Q$, which are perpendicular
to the respective tangents, also intersect at an angle $\Delta\theta$. Furthermore, their point
of intersection, $C$ in the figure, will be the position of the centre of a circle that
approximates the arc $PQ$, at least to the extent of having the same tangents at
the extremities of the arc. This circle is called the circle of curvature.

For a finite arc $PQ$, the lengths of $CP$ and $CQ$ will not, in general, be equal,
as they would be if $f = f(x)$ were in fact the equation of a circle. But, as $Q$
is allowed to tend to $P$, i.e. as $\Delta\theta \to 0$, they do become equal, their common
value being $\rho$, the radius of the circle, known as the radius of curvature. It follows
immediately that the curve and the circle of curvature have a common tangent
at $P$ and lie on the same side of it. The reciprocal of the radius of curvature, $\rho^{-1}$,
defines the curvature of the function $f(x)$ at the point $P$.

The radius of curvature can be defined more mathematically as follows. The
length $\Delta s$ of arc $PQ$ is approximately equal to $\rho\,\Delta\theta$ and, in the limit $\Delta\theta \to 0$, this
relationship defines $\rho$ as

\[ \rho = \lim_{\Delta\theta \to 0} \frac{\Delta s}{\Delta\theta} = \frac{ds}{d\theta}. \tag{2.15} \]

It should be noted that, as $s$ increases, $\theta$ may increase or decrease according to
whether the curve is locally concave upwards (i.e. shaped as if it were near a
minimum in $f(x)$) or concave downwards. This is reflected in the sign of $\rho$, which
therefore also indicates the position of the curve (and of the circle of curvature)
relative to the common tangent, above or below. Thus a negative value of $\rho$
indicates that the curve is locally concave downwards and that the tangent lies
above the curve.

We next obtain an expression for $\rho$, not in terms of $s$ and $\theta$ but in terms
of $x$ and $f(x)$. The expression, though somewhat cumbersome, follows from the
defining equation (2.15), the defining property of $\theta$ that $\tan\theta = df/dx \equiv f'$ and
the fact that the rate of change of arc length with $x$ is given by

$$\frac{ds}{dx} = \left[1 + \left(\frac{df}{dx}\right)^2\right]^{1/2}. \tag{2.16}$$


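This last result, simply quoted here, is proved more formally in subsection 2.2.13.
From the chain rule (2.11) it follows that

$$\rho = \frac{ds}{d\theta} = \frac{ds}{dx}\,\frac{dx}{d\theta}. \tag{2.17}$$

Differentiating both sides of $\tan\theta = df/dx$ with respect to $x$ gives

$$\sec^2\theta\,\frac{d\theta}{dx} = \frac{d^2 f}{dx^2} \equiv f'',$$

from which, using $\sec^2\theta = 1 + \tan^2\theta = 1 + (f')^2$, we can obtain $dx/d\theta$ as

$$\frac{dx}{d\theta} = \frac{1 + \tan^2\theta}{f''} = \frac{1 + (f')^2}{f''}. \tag{2.18}$$

Substituting (2.16) and (2.18) into (2.17) then yields the final expression for $\rho$,

$$\rho = \frac{\left[1 + (f')^2\right]^{3/2}}{f''}. \tag{2.19}$$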


It should be noted that the quantity in brackets is always positive and that
its 3/2 power is also taken as positive. The sign of $\rho$ is thus solely determined by
that of $d^2 f/dx^2$, in line with our previous discussion relating the sign to whether
the curve is concave or convex upwards. If, as happens at a point of inflection,
$d^2 f/dx^2$ is zero then $\rho$ is formally infinite and the curvature of $f(x)$ is zero. As
$d^2 f/dx^2$ changes sign on passing through zero, both the local tangent and the
circle of curvature change from their initial positions to the opposite side of the
curve.
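
As a quick numerical illustration of (2.19), the following Python sketch evaluates the signed radius of curvature from given values of $f'$ and $f''$. The function name `radius_of_curvature` and the two test curves are illustrative choices rather than anything taken from the text, and returning `math.inf` at a point of inflection is an assumed convention for "formally infinite".

```python
import math


def radius_of_curvature(df, d2f):
    """Signed radius of curvature, rho = [1 + (f')^2]^(3/2) / f'', as in (2.19)."""
    if d2f == 0:
        # Point of inflection: the curvature is zero (assumed convention: +inf).
        return math.inf
    return (1.0 + df * df) ** 1.5 / d2f


# Parabola f(x) = x^2 at x = 0, where f' = 0 and f'' = 2.
print(radius_of_curvature(0.0, 2.0))

# Lower half of the unit circle, f(x) = -sqrt(1 - x^2), at x = 0.6.
x = 0.6
y = -math.sqrt(1.0 - x * x)
df = -x / y          # from differentiating x^2 + y^2 = 1 once
d2f = -1.0 / y ** 3  # from differentiating a second time
print(radius_of_curvature(df, d2f))  # should be close to the circle's radius, 1
```

The sign of the returned value follows that of $f''$, so a curve that is concave downwards gives a negative radius.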
► Show that the radius of curvature at the point $(x, y)$ on the ellipse

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$$

has magnitude $(a^4 y^2 + b^4 x^2)^{3/2}/(a^4 b^4)$ and the opposite sign to $y$. Check the special case
$b = a$, for which the ellipse becomes a circle.


Differentiating the equation of the ellipse with respect to $x$ gives

$$\frac{2x}{a^2} + \frac{2y}{b^2}\,\frac{dy}{dx} = 0$$

and so

$$\frac{dy}{dx} = -\frac{b^2 x}{a^2 y}.$$

A second differentiation, using (2.13), then yields

$$\frac{d^2 y}{dx^2} = -\frac{b^2}{a^2}\left(\frac{y - x y'}{y^2}\right) = -\frac{b^4}{a^2 y^3}\left(\frac{y^2}{b^2} + \frac{x^2}{a^2}\right) = -\frac{b^4}{a^2 y^3},$$

where we have used the fact that $(x, y)$ lies on the ellipse. We note that $d^2 y/dx^2$, and
hence $\rho$, has the opposite sign to $y^3$, and hence to $y$. Substituting in (2.19) gives for the
magnitude of the radius of curvature

$$|\rho| = \left|\frac{\left[1 + b^4 x^2/(a^4 y^2)\right]^{3/2}}{-b^4/(a^2 y^3)}\right| = \frac{(a^4 y^2 + b^4 x^2)^{3/2}}{a^4 b^4}.$$

For the special case $b = a$, $|\rho|$ reduces to $a^{-2}(y^2 + x^2)^{3/2}$ and, since $x^2 + y^2 = a^2$, this in
turn gives $|\rho| = a$, as expected. ◄

The discussion in this section has been confined to the behaviour of curves
that lie in one plane; examples of the application of curvature to the bending of
loaded beams and to particle orbits under the influence of central forces can be
found in the exercises at the ends of later chapters. A more general treatment of
curvature in three dimensions is given in section 10.3, where a vector approach is
adopted.


2.1.10 Theorems of differentiation 

Rolle's theorem 

Rolle's theorem (figure 2.5) states that if a function $f(x)$ is continuous in the
range $a \le x \le c$, is differentiable in the range $a < x < c$ and satisfies $f(a) = f(c)$
then for at least one point $x = b$, where $a < b < c$, $f'(b) = 0$. Thus Rolle's
theorem states that for a well-behaved (continuous and differentiable) function 
that has the same value at two points either there is at least one stationary point 
between those points or the function is a constant between them. The validity of 
the theorem is immediately apparent from figure 2.5 and a full analytic proof will 
not be given. The theorem is used in deriving the mean value theorem, which we 
now discuss. 
Figure 2.5 The graph of a function $f(x)$, showing that if $f(a) = f(c)$ then at
one point at least between $x = a$ and $x = c$ the graph has zero gradient.



Figure 2.6 The graph of a function f(x); at some point x = b it has the same 
gradient as the line AC. 


Mean value theorem 

The mean value theorem (figure 2.6) states that if a function $f(x)$ is continuous
in the range $a \le x \le c$ and differentiable in the range $a < x < c$ then

$$f'(b) = \frac{f(c) - f(a)}{c - a} \tag{2.20}$$


for at least one value b where a < b < c. Thus the mean value theorem states 
that for a well-behaved function the gradient of the line joining two points on the 
curve is equal to the slope of the tangent to the curve for at least one intervening 
point. 

The proof of the mean value theorem is found by examination of figure 2.6, as
follows. The equation of the line AC is

$$g(x) = f(a) + (x - a)\,\frac{f(c) - f(a)}{c - a},$$

and hence the difference between the curve and the line is

$$h(x) = f(x) - g(x) = f(x) - f(a) - (x - a)\,\frac{f(c) - f(a)}{c - a}.$$

Since the curve and the line intersect at A and C, $h(x) = 0$ at both of these points.
Hence, by an application of Rolle's theorem, $h'(x) = 0$ for at least one point $b$
between A and C. Differentiating our expression for $h(x)$, we find

$$h'(x) = f'(x) - \frac{f(c) - f(a)}{c - a},$$

and hence at $b$, where $h'(x) = 0$,

$$f'(b) = \frac{f(c) - f(a)}{c - a}.$$

Applications of Rolle’s theorem and the mean value theorem 

Since the validity of Rolle’s theorem is intuitively obvious, given the conditions 
imposed on f(x), it will not be surprising that the problems that can be solved
by applications of the theorem alone are relatively simple ones. Nevertheless we 
will illustrate it with the following example. 


► What semi-quantitative results can be deduced by applying Rolle's theorem to the following
functions $f(x)$, with $a$ and $c$ chosen so that $f(a) = f(c) = 0$? (i) $\sin x$, (ii) $\cos x$, (iii)
$x^2 - 3x + 2$, (iv) $x^2 + 7x + 3$, (v) $2x^3 - 9x^2 - 24x + k$.


(i) If the consecutive values of $x$ that make $\sin x = 0$ are $\alpha_1, \alpha_2, \ldots$ (actually $x = n\pi$, for
any integer $n$) then Rolle's theorem implies that the derivative of $\sin x$, namely $\cos x$, has
at least one zero lying between each pair of values $\alpha_i$ and $\alpha_{i+1}$.

(ii) In an exactly similar way, we conclude that the derivative of $\cos x$, namely $-\sin x$,
has at least one zero lying between consecutive pairs of zeroes of $\cos x$. These two
results taken together (but neither separately) imply that $\sin x$ and $\cos x$ have interleaving
zeroes.

(iii) For $f(x) = x^2 - 3x + 2$, $f(a) = f(c) = 0$ if $a$ and $c$ are taken as 1 and 2 respectively.
Rolle's theorem then implies that $f'(x) = 2x - 3 = 0$ has a solution $x = b$ with $b$ in the
range $1 < b < 2$. This is obviously so, since $b = 3/2$.

(iv) With $f(x) = x^2 + 7x + 3$, the theorem tells us that if there are two roots of
$x^2 + 7x + 3 = 0$ then they have the root of $f'(x) = 2x + 7 = 0$ lying between them. Thus
any (real) roots of $x^2 + 7x + 3 = 0$ lie on either side of $x = -7/2$. The actual roots are
$(-7 \pm \sqrt{37})/2$.

(v) If $f(x) = 2x^3 - 9x^2 - 24x + k$ then $f'(x) = 0$ is the equation $6x^2 - 18x - 24 = 0$,
which has solutions $x = -1$ and $x = 4$. Consequently, if $\alpha_1$ and $\alpha_2$ are two different roots
of $f(x) = 0$ then at least one of $-1$ and 4 must lie in the open interval $\alpha_1$ to $\alpha_2$. If, as is
the case for a certain range of values of $k$, $f(x) = 0$ has three roots, $\alpha_1$, $\alpha_2$ and $\alpha_3$, then
$\alpha_1 < -1 < \alpha_2 < 4 < \alpha_3$.

In each case, as might be expected, the application of Rolle's theorem does no more than
focus attention on particular ranges of values; it does not yield precise answers. ◄

Direct verification of the mean value theorem is straightforward when it is
applied to simple functions. For example, if $f(x) = x^2$, it states that there is a
value $b$ in the interval $a < b < c$ such that

$$c^2 - a^2 = f(c) - f(a) = (c - a) f'(b) = (c - a)\,2b.$$

This is clearly so, since $b = (a + c)/2$ satisfies the relevant criteria.

As a slightly more complicated example we may consider a cubic equation, say
$f(x) = x^3 + 2x^2 + 4x - 6 = 0$, between two specified values of $x$, say 1 and 2. In
this case we need to verify that there is a value of $x$ lying in the range $1 < x < 2$
that satisfies

$$18 - 1 = f(2) - f(1) = (2 - 1) f'(x) = 1(3x^2 + 4x + 4).$$

This is easily done, either by evaluating $3x^2 + 4x + 4 - 17$ at $x = 1$ and at $x = 2$ and
checking that the values have opposite signs or by solving $3x^2 + 4x + 4 - 17 = 0$
and showing that one of the roots lies in the stated interval.
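
The sign-change check just described for the cubic can be sketched in Python as follows. The helper name `mean_value_point` and the use of bisection are illustrative assumptions; bisection works here only because $f'(x) - 17$ happens to change sign between the end-points, which the mean value theorem itself does not guarantee in general.

```python
def f(x):
    return x ** 3 + 2 * x ** 2 + 4 * x - 6


def fprime(x):
    return 3 * x ** 2 + 4 * x + 4


def mean_value_point(a, c, tol=1e-12):
    """Locate b in (a, c) with f'(b) equal to the slope of the chord, by bisection."""
    slope = (f(c) - f(a)) / (c - a)

    def g(x):
        return fprime(x) - slope

    lo, hi = a, c
    if g(lo) * g(hi) > 0:
        raise ValueError("f'(x) - slope does not change sign on the interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


b = mean_value_point(1.0, 2.0)
print(b, fprime(b))  # f'(b) should be close to f(2) - f(1) = 17
```

The value found agrees with the relevant root of $3x^2 + 4x - 13 = 0$, namely $(-4 + \sqrt{172})/6$.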

The following applications of the mean value theorem establish some general 
inequalities for two common functions. 


► Determine inequalities satisfied by $\ln x$ and $\sin x$ for suitable ranges of the real variable $x$.


Since for positive values of its argument the derivative of $\ln x$ is $x^{-1}$, the mean value
theorem gives us

$$\frac{\ln c - \ln a}{c - a} = \frac{1}{b}$$

for some $b$ in $0 < a < b < c$. Further, since $a < b < c$ implies that $c^{-1} < b^{-1} < a^{-1}$, we
have

$$\frac{1}{c} < \frac{\ln c - \ln a}{c - a} < \frac{1}{a},$$

or, multiplying through by $c - a$ and writing $c/a = x$ where $x > 1$,

$$1 - \frac{1}{x} < \ln x < x - 1.$$

Applying the mean value theorem to $\sin x$ shows that

$$\frac{\sin c - \sin a}{c - a} = \cos b$$

for some $b$ lying between $a$ and $c$. If $a$ and $c$ are restricted to lie in the range $0 \le a < c \le \pi$,
in which the cosine function is monotonically decreasing (i.e. there are no turning points),
we can deduce that

$$\cos c < \frac{\sin c - \sin a}{c - a} < \cos a.$$
◄
Figure 2.7 An integral as the area under a curve.


2.2 Integration 

The notion of an integral as the area under a curve will be familiar to the reader. 
In figure 2.7, in which the solid line is a plot of a function f(x), the shaded area 
represents the quantity denoted by 

$$I = \int_a^b f(x)\,dx. \tag{2.21}$$

This expression is known as the definite integral of f(x) between the lower limit 
x = a and the upper limit x = b, and f(x) is called the integrand. 


2.2.1 Integration from first principles 

The definition of an integral as the area under a curve is not a formal definition,
but one that can be readily visualised. The formal definition of $I$ involves
subdividing the finite interval $a \le x \le b$ into a large number of subintervals, by
defining intermediate points $\xi_i$ such that $a = \xi_0 < \xi_1 < \xi_2 < \cdots < \xi_n = b$, and
then forming the sum

$$S = \sum_{i=1}^{n} f(x_i)(\xi_i - \xi_{i-1}), \tag{2.22}$$

where $x_i$ is an arbitrary point that lies in the range $\xi_{i-1} \le x_i \le \xi_i$ (see figure 2.8).
If now $n$ is allowed to tend to infinity in any way whatsoever, subject only to the
restriction that the length of every subinterval $\xi_{i-1}$ to $\xi_i$ tends to zero, then $S$
might, or might not, tend to a unique limit, $I$. If it does then the definite integral
of $f(x)$ between $a$ and $b$ is defined as having the value $I$. If no unique limit exists
the integral is undefined. For continuous functions and a finite interval $a \le x \le b$
the existence of a unique limit is assured and the integral is guaranteed to exist.
Figure 2.8 The evaluation of a definite integral by subdividing the interval
$a \le x \le b$ into subintervals.


► Evaluate from first principles the integral $I = \int_0^b x^2\,dx$.


We first approximate the area under the curve $y = x^2$ between 0 and $b$ by $n$ rectangles of
equal width $h$. If we take the value at the lower end of each subinterval (in the limit of an
infinite number of subintervals we could equally well have chosen the value at the upper
end) to give the height of the corresponding rectangle, then the area of the $k$th rectangle
will be $(kh)^2 h = k^2 h^3$. The total area is thus

$$A = \sum_{k=0}^{n-1} k^2 h^3 = (h^3)\,\tfrac{1}{6}\,n(n - 1)(2n - 1),$$

where we have used the expression for the sum of the squares of the natural numbers
derived in subsection 1.7.1. Now $h = b/n$ and so

$$A = \left(\frac{b^3}{n^3}\right)\frac{n}{6}\,(n - 1)(2n - 1) = \frac{b^3}{6}\left(1 - \frac{1}{n}\right)\left(2 - \frac{1}{n}\right).$$

As $n \to \infty$, $A \to b^3/3$, which is thus the value $I$ of the integral. ◄
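
The limiting process in this example can be watched numerically. The sketch below forms the lower-end rectangle sum for $x^2$ directly and compares it with the closed form $\tfrac{1}{6}h^3 n(n-1)(2n-1)$; the function names and the choice $b = 3$ are illustrative assumptions.

```python
def left_riemann_sum(b, n):
    """Sum of n rectangles of width h = b/n, heights taken at the lower ends."""
    h = b / n
    return sum((k * h) ** 2 * h for k in range(n))


def closed_form(b, n):
    """The same sum via the formula for the sum of squares: h^3 n(n-1)(2n-1)/6."""
    h = b / n
    return h ** 3 * n * (n - 1) * (2 * n - 1) / 6


b = 3.0
for n in (10, 100, 1000, 10000):
    # Both columns should approach b**3 / 3 = 9 as n grows.
    print(n, left_riemann_sum(b, n), closed_form(b, n))
```

Because the heights are taken at the lower ends and $x^2$ is increasing on $0 \le x \le b$, each finite sum lies below the limiting value.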


Some straightforward properties of definite integrals that are almost self-evident
are:

$$\int_a^b 0\,dx = 0, \qquad \int_a^a f(x)\,dx = 0, \tag{2.23}$$

$$\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx, \tag{2.24}$$

$$\int_a^b [f(x) + g(x)]\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx. \tag{2.25}$$

Combining (2.23) and (2.24) with $c$ set equal to $a$ shows that

$$\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx. \tag{2.26}$$


2.2.2 Integration as the inverse of differentiation 


The definite integral has been defined as the area under a curve between two 
fixed limits. Let us now consider the integral 

$$F(x) = \int_a^x f(u)\,du \tag{2.27}$$

in which the lower limit $a$ remains fixed but the upper limit $x$ is now variable. It
will be noticed that this is essentially a restatement of (2.21), but that the variable
$x$ in the integrand has been replaced by a new variable $u$. It is conventional to
rename the dummy variable in the integrand in this way in order that the same
variable does not appear in both the integrand and the integration limits.

It is apparent from (2.27) that $F(x)$ is a continuous function of $x$, but at first
glance the definition of an integral as the area under a curve does not connect with
our assertion that integration is the inverse process to differentiation. However,
by considering the integral (2.27) and using the elementary property (2.24), we
obtain

$$\begin{aligned}
F(x + \Delta x) &= \int_a^{x+\Delta x} f(u)\,du \\
&= \int_a^{x} f(u)\,du + \int_x^{x+\Delta x} f(u)\,du \\
&= F(x) + \int_x^{x+\Delta x} f(u)\,du.
\end{aligned}$$

Rearranging and dividing through by $\Delta x$ yields

$$\frac{F(x + \Delta x) - F(x)}{\Delta x} = \frac{1}{\Delta x}\int_x^{x+\Delta x} f(u)\,du.$$

Letting $\Delta x \to 0$ and using (2.1) we find that the LHS becomes $dF/dx$, whereas
the RHS becomes $f(x)$. The latter conclusion follows because when $\Delta x$ is small
the value of the integral on the RHS is approximately $f(x)\,\Delta x$, and in the limit
$\Delta x \to 0$ no approximation is involved. Thus

$$\frac{dF(x)}{dx} = f(x), \tag{2.28}$$

or, substituting for $F(x)$ from (2.27),

$$\frac{d}{dx}\left[\int_a^x f(u)\,du\right] = f(x).$$
From the last two equations it is clear that integration can be considered as
the inverse of differentiation. However, we see from the above analysis that the
lower limit $a$ is arbitrary and so differentiation does not have a unique inverse.
Any function $F(x)$ obeying (2.28) is called an indefinite integral of $f(x)$, though
any two such functions can differ by at most an arbitrary additive constant. Since
the lower limit is arbitrary, it is usual to write

$$F(x) = \int^x f(u)\,du \tag{2.29}$$

and explicitly include the arbitrary constant only when evaluating $F(x)$. The
evaluation is conventionally written in the form

$$\int f(x)\,dx = F(x) + c \tag{2.30}$$

where $c$ is called the constant of integration. It will be noticed that, in the absence
of any integration limits, we use the same symbol for the arguments of both $f$
and $F$. This can be confusing, but is sufficiently common practice that the reader
needs to become familiar with it.

We also note that the definite integral of $f(x)$ between the fixed limits $x = a$
and $x = b$ can be written in terms of $F(x)$. From (2.27) we have

$$\int_a^b f(x)\,dx = \int_{x_0}^b f(x)\,dx - \int_{x_0}^a f(x)\,dx = F(b) - F(a), \tag{2.31}$$

where $x_0$ is any third fixed point. Using the notation $F'(x) = dF/dx$, we may
rewrite (2.28) as $F'(x) = f(x)$, and so express (2.31) as

$$\int_a^b F'(x)\,dx = F(b) - F(a) \equiv [F]_a^b.$$

In contrast to differentiation, where repeated applications of the product rule
and/or the chain rule will always give the required derivative, it is not always
possible to find the integral of an arbitrary function. Indeed, in most real physical
problems exact integration cannot be performed and we have to revert to
numerical approximations. Despite this cautionary note, it is in fact possible to
integrate many simple functions and the following subsections introduce the most
common types. Many of the techniques will be familiar to the reader and so are
summarised by example.
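
The step leading to (2.28), in which $[F(x + \Delta x) - F(x)]/\Delta x$ is written as $(1/\Delta x)\int_x^{x+\Delta x} f(u)\,du$ and shown to tend to $f(x)$, can be illustrated numerically. In this sketch the midpoint rule stands in for the exact integral, and the function names and the choice $f = \cos$ are assumptions made purely for illustration.

```python
import math


def midpoint_integral(f, lower, upper, n=100):
    """Approximate the integral of f from lower to upper with n midpoint rectangles."""
    h = (upper - lower) / n
    return h * sum(f(lower + (i + 0.5) * h) for i in range(n))


def difference_quotient(f, x, dx):
    """[F(x + dx) - F(x)] / dx, computed as (1/dx) * integral of f over [x, x + dx]."""
    return midpoint_integral(f, x, x + dx) / dx


x = 1.0
for dx in (1e-1, 1e-2, 1e-3):
    # The middle column should approach f(x) = cos(1) as dx shrinks.
    print(dx, difference_quotient(math.cos, x, dx), math.cos(x))
```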



2.2.3 Integration by inspection 

The simplest method of integrating a function is by inspection. Some of the more
elementary functions have well-known integrals that should be remembered. The
reader will notice that these integrals are precisely the inverses of the derivatives
found near the end of subsection 2.1.1. A few are presented below, using the form
given in (2.30).

$$\int a\,dx = ax + c, \qquad \int a x^n\,dx = \frac{a x^{n+1}}{n + 1} + c,$$

$$\int e^{ax}\,dx = \frac{e^{ax}}{a} + c, \qquad \int \frac{a}{x}\,dx = a \ln x + c,$$

$$\int a \cos bx\,dx = \frac{a \sin bx}{b} + c, \qquad \int a \sin bx\,dx = \frac{-a \cos bx}{b} + c,$$

$$\int a \tan bx\,dx = \frac{-a \ln(\cos bx)}{b} + c, \qquad \int a \cos bx \sin^n bx\,dx = \frac{a \sin^{n+1} bx}{b(n + 1)} + c,$$

$$\int \frac{a}{a^2 + x^2}\,dx = \tan^{-1}\left(\frac{x}{a}\right) + c, \qquad \int a \sin bx \cos^n bx\,dx = \frac{-a \cos^{n+1} bx}{b(n + 1)} + c,$$

$$\int \frac{-1}{\sqrt{a^2 - x^2}}\,dx = \cos^{-1}\left(\frac{x}{a}\right) + c, \qquad \int \frac{1}{\sqrt{a^2 - x^2}}\,dx = \sin^{-1}\left(\frac{x}{a}\right) + c,$$

where the integrals that depend on $n$ are valid for all $n \ne -1$ and where $a$ and $b$
are constants. In the two final results $|x| \le a$.


2.2.4 Integration of sinusoidal functions 

Integrals of the type $\int \sin^n x\,dx$ and $\int \cos^n x\,dx$ may be found by using trigonometric
expansions. Two methods are applicable, one for odd $n$ and the other for
even $n$. They are best illustrated by example.


► Evaluate the integral $I = \int \sin^5 x\,dx$.


Rewriting the integral as a product of $\sin x$ and an even power of $\sin x$, and then using
the relation $\sin^2 x = 1 - \cos^2 x$ yields

$$\begin{aligned}
I &= \int \sin^4 x \sin x\,dx \\
&= \int (1 - \cos^2 x)^2 \sin x\,dx \\
&= \int (1 - 2\cos^2 x + \cos^4 x) \sin x\,dx \\
&= \int (\sin x - 2 \sin x \cos^2 x + \sin x \cos^4 x)\,dx \\
&= -\cos x + \tfrac{2}{3}\cos^3 x - \tfrac{1}{5}\cos^5 x + c,
\end{aligned}$$

where the integration has been carried out using the results of subsection 2.2.3. ◄
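► Evaluate the integral $I = \int \cos^4 x\,dx$.

Rewriting the integral as a power of $\cos^2 x$ and then using the double-angle formula
$\cos^2 x = \tfrac{1}{2}(1 + \cos 2x)$ yields

$$I = \int (\cos^2 x)^2\,dx = \int \left(\frac{1 + \cos 2x}{2}\right)^2 dx = \int \tfrac{1}{4}\left(1 + 2\cos 2x + \cos^2 2x\right) dx.$$

Using the double-angle formula again we may write $\cos^2 2x = \tfrac{1}{2}(1 + \cos 4x)$, and hence

$$\begin{aligned}
I &= \int \left[\tfrac{1}{4} + \tfrac{1}{2}\cos 2x + \tfrac{1}{8}(1 + \cos 4x)\right] dx \\
&= \tfrac{1}{4}x + \tfrac{1}{4}\sin 2x + \tfrac{1}{8}x + \tfrac{1}{32}\sin 4x + c \\
&= \tfrac{3}{8}x + \tfrac{1}{4}\sin 2x + \tfrac{1}{32}\sin 4x + c.
\end{aligned}$$
◄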
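
Results such as these are easily checked by differentiating them numerically and comparing with the original integrand. The sketch below does this for the two antiderivatives $-\cos x + \tfrac{2}{3}\cos^3 x - \tfrac{1}{5}\cos^5 x$ and $\tfrac{3}{8}x + \tfrac{1}{4}\sin 2x + \tfrac{1}{32}\sin 4x$; the function names, the central-difference step and the sample points are illustrative assumptions.

```python
import math


def F_sin5(x):
    """Antiderivative of sin^5 x (constant of integration omitted)."""
    return -math.cos(x) + (2 / 3) * math.cos(x) ** 3 - (1 / 5) * math.cos(x) ** 5


def F_cos4(x):
    """Antiderivative of cos^4 x (constant of integration omitted)."""
    return 3 * x / 8 + math.sin(2 * x) / 4 + math.sin(4 * x) / 32


def derivative(F, x, h=1e-5):
    """Central-difference estimate of dF/dx."""
    return (F(x + h) - F(x - h)) / (2 * h)


for x in (0.3, 1.0, 2.5):
    # Each pair of numbers on a line should agree closely.
    print(derivative(F_sin5, x), math.sin(x) ** 5,
          derivative(F_cos4, x), math.cos(x) ** 4)
```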


2.2.5 Logarithmic integration 

Integrals for which the integrand may be written as a fraction in which the
numerator is the derivative of the denominator may be evaluated using

$$\int \frac{f'(x)}{f(x)}\,dx = \ln f(x) + c. \tag{2.32}$$

This follows directly from the differentiation of a logarithm as a function of a
function (see subsection 2.1.3).


► Evaluate the integral

$$I = \int \frac{6x^2 + 2\cos x}{x^3 + \sin x}\,dx.$$

We note first that the numerator can be factorised to give $2(3x^2 + \cos x)$, and then that
the quantity in brackets is the derivative of the denominator. Hence

$$I = 2\int \frac{3x^2 + \cos x}{x^3 + \sin x}\,dx = 2 \ln(x^3 + \sin x) + c.$$
◄


2.2.6 Integration using partial fractions 

The method of partial fractions was discussed at some length in section 1.4, but 
in essence consists of the manipulation of a fraction (here the integrand) in such 
a way that it can be written as the sum of two or more simpler fractions. Again 
we illustrate the method by an example. 
► Evaluate the integral

$$I = \int \frac{1}{x^2 + x}\,dx.$$

We note that the denominator factorises to give $x(x + 1)$. Hence

$$I = \int \frac{1}{x(x + 1)}\,dx.$$

We now separate the fraction into two partial fractions and integrate directly:

$$I = \int \left(\frac{1}{x} - \frac{1}{x + 1}\right) dx = \ln x - \ln(x + 1) + c = \ln\left(\frac{x}{x + 1}\right) + c.$$
◄

2.2.7 Integration by substitution 

Sometimes it is possible to make a substitution of variables that turns a com- 
plicated integral into a simpler one, which can then be integrated by a standard 
method. There are many useful substitutions and knowing which to use is a matter 
of experience. We now present a few examples of particularly useful substitutions. 


► Evaluate the integral

$$I = \int \frac{1}{\sqrt{1 - x^2}}\,dx.$$

Making the substitution $x = \sin u$, we note that $dx = \cos u\,du$, and hence

$$I = \int \frac{1}{\sqrt{1 - \sin^2 u}} \cos u\,du = \int \frac{1}{\sqrt{\cos^2 u}} \cos u\,du = \int du = u + c.$$

Now substituting back for $u$,

$$I = \sin^{-1} x + c.$$

This corresponds to one of the results given in subsection 2.2.3. ◄

Another particular example of integration by substitution is afforded by integrals
of the form

$$I = \int \frac{1}{a + b\cos x}\,dx \qquad\text{or}\qquad I = \int \frac{1}{a + b\sin x}\,dx. \tag{2.33}$$

In these cases making the substitution $t = \tan(x/2)$ yields integrals that can be
solved more easily than the originals. Formulae expressing $\sin x$ and $\cos x$ in
terms of $t$ were derived in equations (1.32) and (1.33) [see p. 14], but before we
can use them we must relate $dx$ to $dt$ as follows.
Since

$$\frac{dt}{dx} = \frac{1}{2}\sec^2\frac{x}{2} = \frac{1}{2}\left(1 + \tan^2\frac{x}{2}\right) = \frac{1 + t^2}{2},$$

the required relationship is

$$dx = \frac{2}{1 + t^2}\,dt. \tag{2.34}$$

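► Evaluate the integral

$$I = \int \frac{2}{1 + 3\cos x}\,dx.$$

Rewriting $\cos x$ in terms of $t$ and using (2.34) yields

$$\begin{aligned}
I &= \int \frac{2}{1 + 3\left[(1 - t^2)(1 + t^2)^{-1}\right]} \left(\frac{2}{1 + t^2}\right) dt \\
&= \int \frac{2(1 + t^2)}{1 + t^2 + 3(1 - t^2)} \left(\frac{2}{1 + t^2}\right) dt \\
&= \int \frac{2}{2 - t^2}\,dt = \int \frac{2}{(\sqrt{2} - t)(\sqrt{2} + t)}\,dt \\
&= \int \frac{1}{\sqrt{2}}\left(\frac{1}{\sqrt{2} - t} + \frac{1}{\sqrt{2} + t}\right) dt \\
&= -\frac{1}{\sqrt{2}}\ln(\sqrt{2} - t) + \frac{1}{\sqrt{2}}\ln(\sqrt{2} + t) + c \\
&= \frac{1}{\sqrt{2}}\ln\left[\frac{\sqrt{2} + \tan(x/2)}{\sqrt{2} - \tan(x/2)}\right] + c.
\end{aligned}$$
◄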
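
A result obtained through a chain of substitutions like this one is worth checking against a direct numerical integration. The sketch below compares the closed form $\tfrac{1}{\sqrt{2}}\ln\left[(\sqrt{2} + \tan(x/2))/(\sqrt{2} - \tan(x/2))\right]$, evaluated between 0 and 1, with a composite Simpson's rule estimate of $\int_0^1 2/(1 + 3\cos x)\,dx$. The limits 0 and 1, the function names and the use of Simpson's rule are illustrative assumptions; the limits are chosen so that $|\tan(x/2)| < \sqrt{2}$ and the logarithm is real.

```python
import math


def integrand(x):
    return 2.0 / (1.0 + 3.0 * math.cos(x))


def antiderivative(x):
    """Closed form from the t = tan(x/2) substitution (constant omitted)."""
    t = math.tan(x / 2.0)
    r2 = math.sqrt(2.0)
    return math.log((r2 + t) / (r2 - t)) / r2


def simpson(f, a, b, n=200):
    """Composite Simpson's rule with an even number n of subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


numeric = simpson(integrand, 0.0, 1.0)
exact = antiderivative(1.0) - antiderivative(0.0)
print(numeric, exact)  # the two values should agree to many decimal places
```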


Integrals of a similar form to (2.33), but involving $\sin 2x$, $\cos 2x$, $\tan 2x$, $\sin^2 x$,
$\cos^2 x$ or $\tan^2 x$ instead of $\cos x$ and $\sin x$, should be evaluated by using the
substitution $t = \tan x$. In this case

$$\sin x = \frac{t}{\sqrt{1 + t^2}}, \qquad \cos x = \frac{1}{\sqrt{1 + t^2}} \qquad\text{and}\qquad dx = \frac{dt}{1 + t^2}. \tag{2.35}$$

A final example of the evaluation of integrals using substitution is the method
of completing the square (cf. subsection 1.7.3).
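► Evaluate the integral

$$I = \int \frac{1}{x^2 + 4x + 7}\,dx.$$

We can write the integral in the form

$$I = \int \frac{1}{(x + 2)^2 + 3}\,dx.$$

Substituting $y = x + 2$, we find $dy = dx$ and hence

$$I = \int \frac{1}{y^2 + 3}\,dy.$$

Hence, by comparison with the table of standard integrals (see subsection 2.2.3),

$$I = \frac{\sqrt{3}}{3}\tan^{-1}\left(\frac{y}{\sqrt{3}}\right) + c = \frac{\sqrt{3}}{3}\tan^{-1}\left(\frac{x + 2}{\sqrt{3}}\right) + c.$$
◄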


2.2.8 Integration by parts 


Integration by parts is the integration analogue of product differentiation. The
principle is to break down a complicated function into two functions, at least one
of which can be integrated by inspection. The method in fact relies on the result
for the differentiation of a product. Recalling from (2.6) that

$$\frac{d}{dx}(uv) = u\frac{dv}{dx} + \frac{du}{dx}v,$$

where $u$ and $v$ are functions of $x$, we now integrate to find

$$uv = \int u\frac{dv}{dx}\,dx + \int \frac{du}{dx}v\,dx.$$

Rearranging into the standard form for integration by parts gives

$$\int u\frac{dv}{dx}\,dx = uv - \int \frac{du}{dx}v\,dx. \tag{2.36}$$

Integration by parts is often remembered for practical purposes in the form
the integral of a product of two functions is equal to {the first times the integral of
the second} minus the integral of {the derivative of the first times the integral of
the second}. Here, $u$ is 'the first' and $dv/dx$ is 'the second'; clearly the integral $v$
of 'the second' must be determinable by inspection.


► Evaluate the integral $I = \int x \sin x\,dx$.


In the notation given above, we identify $x$ with $u$ and $\sin x$ with $dv/dx$. Hence $v = -\cos x$
and $du/dx = 1$ and so using (2.36)

$$I = x(-\cos x) - \int (1)(-\cos x)\,dx = -x\cos x + \sin x + c.$$
◄
The separation of the functions is not always so apparent, as is illustrated by
the following example.


► Evaluate the integral $I = \int x^3 e^{-x^2}\,dx$.

Firstly we rewrite the integral as

$$I = \int x^2\left(x e^{-x^2}\right) dx.$$

Now, using the notation given above, we identify $x^2$ with $u$ and $x e^{-x^2}$ with $dv/dx$. Hence
$v = -\tfrac{1}{2}e^{-x^2}$ and $du/dx = 2x$, so that

$$I = -\tfrac{1}{2}x^2 e^{-x^2} - \int \left(-x e^{-x^2}\right) dx = -\tfrac{1}{2}x^2 e^{-x^2} - \tfrac{1}{2}e^{-x^2} + c.$$
◄

A trick that is sometimes useful is to take '1' as one factor of the product, as
is illustrated by the following example.


► Evaluate the integral $I = \int \ln x\,dx$.


Firstly we rewrite the integral as

$$I = \int (\ln x)\,1\,dx.$$

Now, using the notation above, we identify $\ln x$ with $u$ and 1 with $dv/dx$. Hence we have
$v = x$ and $du/dx = 1/x$, and so

$$I = (\ln x)(x) - \int \left(\frac{1}{x}\right) x\,dx = x \ln x - x + c.$$
◄


It is sometimes necessary to integrate by parts more than once. In doing so, 
we may occasionally re-encounter the original integral I. In such cases we can 
obtain a linear algebraic equation for I that can be solved to obtain its value. 


► Evaluate the integral $I = \int e^{ax}\cos bx\,dx$.


Integrating by parts, taking $e^{ax}$ as the first function, we find

$$I = e^{ax}\left(\frac{\sin bx}{b}\right) - \int a e^{ax}\left(\frac{\sin bx}{b}\right) dx,$$

where, for convenience, we have omitted the constant of integration. Integrating by parts
a second time,

$$I = e^{ax}\left(\frac{\sin bx}{b}\right) - a e^{ax}\left(\frac{-\cos bx}{b^2}\right) + \int a^2 e^{ax}\left(\frac{-\cos bx}{b^2}\right) dx.$$

Notice that the integral on the RHS is just $-a^2/b^2$ times the original integral $I$. Thus

$$I = e^{ax}\left(\frac{1}{b}\sin bx + \frac{a}{b^2}\cos bx\right) - \frac{a^2}{b^2}I.$$

Rearranging this expression to obtain $I$ explicitly and including the constant of integration
we find

$$I = \frac{e^{ax}}{a^2 + b^2}\left(b \sin bx + a \cos bx\right) + c. \tag{2.37}$$

Another method of evaluating this integral, using the exponential of a complex number,
is given in section 3.6. ◄


2.2.9 Reduction formulae 

Integration using reduction formulae is a process that involves first evaluating a 
simple integral and then, in stages, using it to find a more complicated integral. 


► Using integration by parts, find a relationship between $I_n$ and $I_{n-1}$ where

$$I_n = \int_0^1 (1 - x^3)^n\,dx$$

and $n$ is any positive integer. Hence evaluate $I_2 = \int_0^1 (1 - x^3)^2\,dx$.

Writing the integrand as a product and separating the integral into two we find

$$\begin{aligned}
I_n &= \int_0^1 (1 - x^3)(1 - x^3)^{n-1}\,dx \\
&= \int_0^1 (1 - x^3)^{n-1}\,dx - \int_0^1 x^3 (1 - x^3)^{n-1}\,dx.
\end{aligned}$$

The first term on the RHS is clearly $I_{n-1}$ and so, writing the integrand in the second term
on the RHS as a product,

$$I_n = I_{n-1} - \int_0^1 (x)\,x^2 (1 - x^3)^{n-1}\,dx.$$

Integrating by parts we find

$$\begin{aligned}
I_n &= I_{n-1} + \left[\frac{x}{3n}(1 - x^3)^n\right]_0^1 - \int_0^1 \frac{1}{3n}(1 - x^3)^n\,dx \\
&= I_{n-1} + 0 - \frac{1}{3n}I_n,
\end{aligned}$$

which on rearranging gives

$$I_n = \frac{3n}{3n + 1}\,I_{n-1}.$$

We now have a relation connecting successive integrals. Hence, if we can evaluate $I_0$, we
can find $I_1$, $I_2$ etc. Evaluating $I_0$ is trivial:

$$I_0 = \int_0^1 (1 - x^3)^0\,dx = \int_0^1 dx = [x]_0^1 = 1.$$

Hence

$$I_1 = \frac{(3 \times 1)}{(3 \times 1) + 1} \times 1 = \frac{3}{4}, \qquad I_2 = \frac{(3 \times 2)}{(3 \times 2) + 1} \times \frac{3}{4} = \frac{9}{14}.$$

Although the first few $I_n$ could be evaluated by direct multiplication, this becomes tedious
for integrals containing higher values of $n$; these are therefore best evaluated using the
reduction formula. ◄
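
A reduction formula of this kind translates directly into a short loop. The sketch below applies $I_n = \frac{3n}{3n+1}I_{n-1}$ with $I_0 = 1$ using exact rational arithmetic, and compares the result with a crude midpoint-rule estimate of $\int_0^1 (1 - x^3)^n\,dx$. The function names and the number of integration steps are illustrative assumptions.

```python
from fractions import Fraction


def I_reduction(n):
    """I_n from the reduction formula I_k = 3k/(3k + 1) * I_{k-1}, starting at I_0 = 1."""
    value = Fraction(1)
    for k in range(1, n + 1):
        value *= Fraction(3 * k, 3 * k + 1)
    return value


def I_numeric(n, steps=20000):
    """Midpoint-rule estimate of the integral of (1 - x^3)^n over 0 <= x <= 1."""
    h = 1.0 / steps
    return h * sum((1.0 - ((i + 0.5) * h) ** 3) ** n for i in range(steps))


for n in range(0, 6):
    # Exact fraction, its decimal value, and the numerical estimate.
    print(n, I_reduction(n), float(I_reduction(n)), I_numeric(n))
```

Using `Fraction` keeps each $I_n$ exact, so the values for $n = 1$ and $n = 2$ can be compared directly with the fractions obtained by hand.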
2.2.10 Infinite and improper integrals

The definition of an integral given previously does not allow for cases in which
either of the limits of integration is infinite (an infinite integral) or for cases
in which $f(x)$ is infinite in some part of the range (an improper integral), e.g.
$f(x) = (2 - x)^{-1/4}$ near the point $x = 2$. Nevertheless, modification of the
definition of an integral gives infinite and improper integrals each a meaning.

In the case of an integral $I = \int_a^b f(x)\,dx$, the infinite integral, in which $b$ tends
to $\infty$, is defined by

$$I = \int_a^\infty f(x)\,dx = \lim_{b \to \infty}\int_a^b f(x)\,dx = \lim_{b \to \infty} F(b) - F(a).$$

As previously, $F(x)$ is the indefinite integral of $f(x)$ and $\lim_{b \to \infty} F(b)$ means the
limit (or value) that $F(b)$ approaches as $b \to \infty$; it is evaluated after calculating
the integral. The formal concept of a limit will be introduced in chapter 4.



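► Evaluate the integral

$$I = \int_0^\infty \frac{x}{(x^2 + a^2)^2}\,dx.$$

Integrating, we find $F(x) = -\tfrac{1}{2}(x^2 + a^2)^{-1} + c$ and so

$$I = \lim_{b \to \infty}\left[\frac{-1}{2(b^2 + a^2)}\right] - \left(\frac{-1}{2a^2}\right) = \frac{1}{2a^2}.$$
◄

For the case of improper integrals, we adopt the approach of excluding the
unbounded range from the integral. For example, if the integrand $f(x)$ is infinite
at $x = c$ (say), $a \le c \le b$ then

$$\int_a^b f(x)\,dx = \lim_{\delta \to 0}\int_a^{c - \delta} f(x)\,dx + \lim_{\epsilon \to 0}\int_{c + \epsilon}^b f(x)\,dx.$$

► Evaluate the integral $I = \int_0^2 (2 - x)^{-1/4}\,dx$.

Integrating directly,

$$I = \lim_{\epsilon \to 0}\left[-\tfrac{4}{3}(2 - x)^{3/4}\right]_0^{2 - \epsilon} = \lim_{\epsilon \to 0}\left[-\tfrac{4}{3}\epsilon^{3/4}\right] + \tfrac{4}{3}\,2^{3/4} = \left(\tfrac{4}{3}\right)2^{3/4}.$$
◄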

2.2.11 Integration in plane polar coordinates 

In plane polar coordinates $\rho, \phi$, a curve is defined by its distance $\rho$ from the
origin as a function of the angle $\phi$ between the line joining a point on the curve
to the origin and the x-axis, i.e. $\rho = \rho(\phi)$. The area of an element is given by
$dA = \tfrac{1}{2}\rho^2\,d\phi$, as illustrated in figure 2.9, and hence the total area between two
angles $\phi_1$ and $\phi_2$ is given by

$$A = \int_{\phi_1}^{\phi_2} \tfrac{1}{2}\rho^2\,d\phi. \tag{2.38}$$

Figure 2.9 Finding the area of a sector OBC defined by the curve $\rho(\phi)$ and
the radii OB, OC, at angles to the x-axis $\phi_1$, $\phi_2$ respectively.

An immediate observation is that the area of a circle of radius $a$ is given by

$$A = \int_0^{2\pi} \tfrac{1}{2}a^2\,d\phi = \left[\tfrac{1}{2}a^2\phi\right]_0^{2\pi} = \pi a^2.$$


► The equation in polar coordinates of an ellipse with semi-axes $a$ and $b$ is

$$\frac{1}{\rho^2} = \frac{\cos^2\phi}{a^2} + \frac{\sin^2\phi}{b^2}.$$

Find the area $A$ of the ellipse.

Using (2.38) and symmetry, we have

$$A = \frac{1}{2}\int_0^{2\pi} \frac{a^2 b^2}{b^2\cos^2\phi + a^2\sin^2\phi}\,d\phi = 2a^2 b^2\int_0^{\pi/2} \frac{1}{b^2\cos^2\phi + a^2\sin^2\phi}\,d\phi.$$

To evaluate this integral we write $t = \tan\phi$ and use (2.35):

$$A = 2a^2 b^2\int_0^\infty \frac{1}{b^2 + a^2 t^2}\,dt = 2b^2\int_0^\infty \frac{1}{(b/a)^2 + t^2}\,dt.$$

Finally, from the list of standard integrals (see subsection 2.2.3),

$$A = 2b^2\left[\frac{1}{(b/a)}\tan^{-1}\frac{t}{(b/a)}\right]_0^\infty = 2ab\left(\frac{\pi}{2} - 0\right) = \pi ab.$$
◄
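
The polar area formula $A = \int \tfrac{1}{2}\rho^2\,d\phi$ can also be applied numerically to the ellipse, as a check on the result $\pi ab$. The sketch below uses the midpoint rule over $0 \le \phi \le 2\pi$; the function names, the semi-axes 3 and 2, and the number of steps are illustrative assumptions.

```python
import math


def rho_squared(phi, a, b):
    """rho^2 for the ellipse 1/rho^2 = cos^2(phi)/a^2 + sin^2(phi)/b^2."""
    return 1.0 / (math.cos(phi) ** 2 / a ** 2 + math.sin(phi) ** 2 / b ** 2)


def polar_area(a, b, n=10000):
    """Midpoint-rule estimate of the integral of rho^2 / 2 over 0 <= phi <= 2*pi."""
    h = 2.0 * math.pi / n
    return sum(0.5 * rho_squared((i + 0.5) * h, a, b) * h for i in range(n))


a, b = 3.0, 2.0
print(polar_area(a, b), math.pi * a * b)  # the two values should agree closely
```

Setting $b = a$ reduces the integrand to the constant $\tfrac{1}{2}a^2$ and recovers the area of a circle.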
2.2.12 Integral inequalities

Consider the functions $f(x)$, $\phi_1(x)$ and $\phi_2(x)$ such that $\phi_1(x) \le f(x) \le \phi_2(x)$ for
all $x$ in the range $a \le x \le b$. It immediately follows that

$$\int_a^b \phi_1(x)\,dx \le \int_a^b f(x)\,dx \le \int_a^b \phi_2(x)\,dx, \tag{2.39}$$

which gives us a way of estimating an integral that is difficult to evaluate explicitly.

► Show that the value of the integral

$$I = \int_0^1 \frac{1}{(1 + x^2 + x^3)^{1/2}}\,dx$$

lies between 0.810 and 0.882.


We note that for $x$ in the range $0 \le x \le 1$, $0 \le x^3 \le x^2$. Hence

$$(1 + x^2)^{1/2} \le (1 + x^2 + x^3)^{1/2} \le (1 + 2x^2)^{1/2},$$

and so

$$\frac{1}{(1 + x^2)^{1/2}} \ge \frac{1}{(1 + x^2 + x^3)^{1/2}} \ge \frac{1}{(1 + 2x^2)^{1/2}}.$$

Consequently,

$$\int_0^1 \frac{1}{(1 + x^2)^{1/2}}\,dx \ge \int_0^1 \frac{1}{(1 + x^2 + x^3)^{1/2}}\,dx \ge \int_0^1 \frac{1}{(1 + 2x^2)^{1/2}}\,dx,$$

from which we obtain

$$\left[\ln\left(x + \sqrt{1 + x^2}\right)\right]_0^1 \ge I \ge \left[\frac{1}{\sqrt{2}}\ln\left(x + \sqrt{\tfrac{1}{2} + x^2}\right)\right]_0^1,$$

$$0.8814 \ge I \ge 0.8105,$$

$$0.882 \ge I \ge 0.810.$$

In the last line the calculated values have been rounded to three significant figures,
one rounded up and the other rounded down so that the proved inequality cannot be
unknowingly made invalid. ◄
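
The bracketing argument can be confirmed numerically. In the sketch below the two bounding integrals are evaluated in closed form using the inverse hyperbolic sine, since $\ln(x + \sqrt{1 + x^2}) = \sinh^{-1} x$, and the integral $I$ itself is estimated with the midpoint rule. The function and variable names and the number of steps are illustrative assumptions.

```python
import math


def midpoint_rule(g, a, b, n=10000):
    """Approximate the integral of g from a to b with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


# Closed forms of the bounding integrals over 0 <= x <= 1.
upper_bound = math.asinh(1.0)                                # integrand 1/sqrt(1 + x^2)
lower_bound = math.asinh(math.sqrt(2.0)) / math.sqrt(2.0)   # integrand 1/sqrt(1 + 2x^2)

# Numerical estimate of the integral that cannot be done by inspection.
I_estimate = midpoint_rule(lambda x: 1.0 / math.sqrt(1.0 + x ** 2 + x ** 3), 0.0, 1.0)

print(lower_bound, I_estimate, upper_bound)
```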


2.2.13 Applications of integration 

Mean value of a function 

The mean value m of a function between two limits a and b is defined by 

$$m = \frac{1}{b - a}\int_a^b f(x)\,dx. \tag{2.40}$$

The mean value may be thought of as the height of the rectangle that has the 
same area (over the same interval) as the area under the curve f(x). This is 
illustrated in figure 2.10. 
Figure 2.10 The mean value $m$ of a function.

► Find the mean value $m$ of the function $f(x) = x^2$ between the limits $x = 2$ and $x = 4$.


Using (2.40),

$$m = \frac{1}{4 - 2}\int_2^4 x^2\,dx = \frac{1}{2}\left[\frac{x^3}{3}\right]_2^4 = \frac{28}{3}.$$
◄


Finding the length of a curve 

Finding the area between a curve and certain straight lines provides one example
of the use of integration. Another is in finding the length of a curve. If a curve
is defined by $y = f(x)$ then the distance along the curve, $\Delta s$, that corresponds to
small changes $\Delta x$ and $\Delta y$ in $x$ and $y$ is given by

$$\Delta s \approx \sqrt{(\Delta x)^2 + (\Delta y)^2}; \tag{2.41}$$

this follows directly from Pythagoras' theorem (see figure 2.11). Dividing (2.41)
through by $\Delta x$ and letting $\Delta x \to 0$ we obtain‡

$$\frac{ds}{dx} = \sqrt{1 + \left(\frac{dy}{dx}\right)^2}.$$

Clearly the total length $s$ of the curve between the points $x = a$ and $x = b$ is then
given by integrating both sides of the equation:

$$s = \int_a^b \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx. \tag{2.42}$$


‡ Instead of considering small changes $\Delta x$ and $\Delta y$ and letting these tend to zero, we could have
derived (2.41) by considering infinitesimal changes $dx$ and $dy$ from the start. After writing $(ds)^2 =
(dx)^2 + (dy)^2$, (2.41) may be deduced by using the formal device of dividing through by $dx$. Although
not mathematically rigorous, this method is often used and generally leads to the correct result.
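Figure 2.11 The distance moved along a curve, $\Delta s$, corresponding to the
small changes $\Delta x$ and $\Delta y$.


In plane polar coordinates,

$$ds = \sqrt{(dr)^2 + (r\,d\phi)^2} \quad\Rightarrow\quad s = \int_{r_1}^{r_2} \sqrt{1 + r^2\left(\frac{d\phi}{dr}\right)^2}\,dr. \tag{2.43}$$


► Find the length of the curve $y = x^{3/2}$ from $x = 0$ to $x = 2$.


Using (2.42) and noting that $dy/dx = \tfrac{3}{2}\sqrt{x}$, the length $s$ of the curve is given by

$$s = \int_0^2 \sqrt{1 + \tfrac{9}{4}x}\,dx = \left[\tfrac{2}{3}\left(\tfrac{4}{9}\right)\left(1 + \tfrac{9}{4}x\right)^{3/2}\right]_0^2 = \frac{8}{27}\left[\left(\frac{11}{2}\right)^{3/2} - 1\right].$$
◄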
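
The Pythagorean estimate (2.41) suggests a direct numerical check of an arc length: replace the curve by many short chords and add up their lengths. The sketch below does this for $y = x^{3/2}$ on $0 \le x \le 2$ and compares the total with $\frac{8}{27}\left[(11/2)^{3/2} - 1\right]$, the value obtained by evaluating (2.42) analytically. The function name and the number of chords are illustrative assumptions.

```python
import math


def polyline_length(f, a, b, n=100000):
    """Sum of chord lengths sqrt(dx^2 + dy^2) over n equal steps in x, as in (2.41)."""
    h = (b - a) / n
    total = 0.0
    x0, y0 = a, f(a)
    for i in range(1, n + 1):
        x1 = a + i * h
        y1 = f(x1)
        total += math.hypot(x1 - x0, y1 - y0)
        x0, y0 = x1, y1
    return total


numeric = polyline_length(lambda x: x ** 1.5, 0.0, 2.0)
closed_form = (8.0 / 27.0) * ((11.0 / 2.0) ** 1.5 - 1.0)
print(numeric, closed_form)  # the two values should agree closely
```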



Surfaces of revolution 

Consider the surface S formed by rotating the curve y = f(x) about the x-axis 
(see figure 2.12). The surface area of the ‘collar’ formed by rotating an element 
of the curve, ds, about the x-axis is $2\pi y\,ds$, and hence the total surface area is

\[ S = \int_a^b 2\pi y\,ds. \]

Since $(ds)^2 = (dx)^2 + (dy)^2$ from (2.41), the total surface area between the planes
x = a and x = b is

\[ S = \int_a^b 2\pi y \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx. \tag{2.44} \]

Figure 2.12 The surface and volume of revolution for the curve y = f(x).

► Find the surface area of a cone formed by rotating about the x-axis the line y = 2x
between x = 0 and x = h.

Using (2.44), the surface area is given by

\[ \begin{aligned} S &= \int_0^h 2\pi(2x)\sqrt{1 + \left[\frac{d}{dx}(2x)\right]^2}\,dx \\ &= \int_0^h 4\pi x\left(1 + 2^2\right)^{1/2}dx = \int_0^h 4\sqrt{5}\,\pi x\,dx \\ &= \left[2\sqrt{5}\,\pi x^2\right]_0^h = 2\sqrt{5}\,\pi(h^2 - 0) = 2\sqrt{5}\,\pi h^2. \end{aligned} \] ◄
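The 'collar' picture behind (2.44) can be imitated numerically by adding up the areas $2\pi y\,ds$ of many thin collars. The sketch below is an illustration only: the function name `surface_of_revolution` and the value h = 1.5 are arbitrary assumptions.

```python
import math


def surface_of_revolution(f, a, b, n=10_000):
    """Sum collar areas 2*pi*y*ds for rotation of y = f(x) about the x-axis.

    ds is the chord length of each step and y is taken as the mean of the
    collar's two end radii.
    """
    dx = (b - a) / n
    area = 0.0
    for k in range(n):
        x0, x1 = a + k * dx, a + (k + 1) * dx
        y0, y1 = f(x0), f(x1)
        ds = math.hypot(x1 - x0, y1 - y0)
        area += 2 * math.pi * 0.5 * (y0 + y1) * ds
    return area


h = 1.5  # illustrative cone length
S_numeric = surface_of_revolution(lambda x: 2 * x, 0.0, h)
S_exact = 2 * math.sqrt(5) * math.pi * h**2
print(S_numeric, S_exact)
```

For a straight line each collar is an exact conical frustum, so here the sum agrees with $2\sqrt{5}\,\pi h^2$ to rounding error; for a genuinely curved profile it is only an approximation.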


We note that a surface of revolution may also be formed by rotating a line 
about the y-axis. In this case the surface area between y = a and y = b is 


\[ S = \int_a^b 2\pi x \sqrt{1 + \left(\frac{dx}{dy}\right)^2}\,dy. \tag{2.45} \]


Volumes of revolution 

The volume V enclosed by rotating the curve y = f(x) about the x-axis can also
be found (see figure 2.12). The volume of the disc between x and x + dx is given
by $dV = \pi y^2\,dx$. Hence the total volume between x = a and x = b is

\[ V = \int_a^b \pi y^2\,dx. \tag{2.46} \]

► Find the volume of a cone enclosed by the surface formed by rotating about the x-axis
the line y = 2x between x = 0 and x = h.

Using (2.46), the volume is given by

\[ V = \int_0^h \pi(2x)^2\,dx = \int_0^h 4\pi x^2\,dx = \left[\tfrac{4}{3}\pi x^3\right]_0^h = \tfrac{4}{3}\pi(h^3 - 0) = \tfrac{4}{3}\pi h^3. \] ◄
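The disc construction leading to (2.46) translates directly into a sum of thin discs of volume $\pi y^2\,dx$. The following sketch is illustrative only; `volume_of_revolution`, the midpoint sampling and h = 1.5 are assumptions rather than anything prescribed by the text.

```python
import math


def volume_of_revolution(f, a, b, n=10_000):
    """Sum disc volumes pi*y^2*dx, sampling y = f(x) at the midpoint of each slice."""
    dx = (b - a) / n
    return sum(math.pi * f(a + (k + 0.5) * dx) ** 2 * dx for k in range(n))


h = 1.5  # illustrative cone length
V_numeric = volume_of_revolution(lambda x: 2 * x, 0.0, h)
V_exact = 4 * math.pi * h**3 / 3
print(V_numeric, V_exact)
```

With ten thousand slices the numerical value differs from $\tfrac{4}{3}\pi h^3$ only in the seventh decimal place or beyond.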


As before, it is also possible to form a volume of revolution by rotating a curve 
about the y-axis. In this case the volume enclosed between y = a and y = b is 


\[ V = \int_a^b \pi x^2\,dy. \tag{2.47} \]


2.3 Exercises 

2.1 Obtain the following derivatives from first principles:

(a) the first derivative of $3x + 4$;

(b) the first, second and third derivatives of $x^2 + x$;

(c) the first derivative of $\sin x$.

2.2 Find from first principles the first derivative of $(x + 3)^2$ and compare your answer
with that obtained using the chain rule.

2.3 Find the first derivatives of

(a) $x^2 \exp x$, (b) $2\sin x\cos x$, (c) $\sin 2x$, (d) $x\sin ax$,

(e) $(\exp ax)(\sin ax)\tan^{-1} ax$, (f) $\ln(x^a + x^{-a})$,

(g) $\ln(a^x + a^{-x})$, (h) $x^x$.

2.4 Find the first derivatives of

(a) $x/(a + x)^2$, (b) $x/(1 - x)^{1/2}$, (c) $\tan x$, as $\sin x/\cos x$,

(d) $(3x^2 + 2x + 1)/(8x^2 - 4x + 2)$.

2.5 Use result (2.12) to find the first derivatives of

(a) $(2x + 3)^{-3}$, (b) $\sec^2 x$, (c) $\operatorname{cosech}^3 3x$, (d) $1/\ln x$, (e) $1/[\sin^{-1}(x/a)]$.

2.6 Show that the function $y(x) = \exp(-|x|)$ defined by

\[ y(x) = \begin{cases} \exp x & \text{for } x < 0, \\ 1 & \text{for } x = 0, \\ \exp(-x) & \text{for } x > 0, \end{cases} \]

is not differentiable at x = 0. Consider the limiting process for both Δx > 0 and
Δx < 0.

2.7 Find dy/dx if $x = (t - 2)/(t + 2)$ and $y = 2t/(t + 1)$ for $-\infty < t < \infty$. Show that
it is always non-negative, and make use of this result in sketching the curve of y
as a function of x.

2.8 If $2y + \sin y + 5 = x^4 + 4x^3 + 2\pi$, show that dy/dx = 16 when x = 1.

2.9 Find the second derivative of $y(x) = \cos[(\pi/2) - ax]$. Now set a = 1 and verify
that the result is the same as that obtained by first setting a = 1 and simplifying
y(x) before differentiating.

2.10 The function y(x) is defined by $y(x) = (1 + x^m)^n$.

(a) Use the chain rule to show that the first derivative of y is $nmx^{m-1}(1 + x^m)^{n-1}$.

(b) The binomial expansion (see section 1.5) of $(1 + z)^n$ is

\[ (1 + z)^n = 1 + nz + \frac{n(n-1)}{2!}z^2 + \cdots + \frac{n(n-1)\cdots(n-r+1)}{r!}z^r + \cdots. \]

Keeping only the terms of zeroth and first order in dx, apply this result twice
to derive result (a) from first principles.

(c) Expand y in a series of powers of x before differentiating term by term.
Show that the result is the series obtained by expanding the answer given
for dy/dx in (a).

2.11 Show by differentiation and substitution that the differential equation

\[ 4x^2\frac{d^2y}{dx^2} - 4x\frac{dy}{dx} + (4x^2 + 3)y = 0 \]

has a solution of the form $y(x) = x^n \sin x$, and find the value of n.

2.12 Find the positions and natures of the stationary points of the following functions:

(a) $x^3 - 3x + 3$; (b) $x^3 - 3x^2 + 3x$; (c) $x^3 + 3x + 3$;

(d) $\sin ax$ with $a \neq 0$; (e) $x^5 + x^3$; (f) $x^5 - x^3$.

2.13 Show that the lowest value taken by the function $3x^4 + 4x^3 - 12x^2 + 6$ is −26.

2.14 By finding their stationary points and examining their general forms, determine
the range of values that each of the following functions y(x) can take. In each
case make a sketch-graph incorporating the features you have identified.

(a) $y(x) = (x - 1)/(x^2 + 2x + 6)$.

(b) $y(x) = 1/(4 + 3x - x^2)$.

(c) $y(x) = (8\sin x)/(15 + 8\tan^2 x)$.

2.15 Show that $y(x) = x a^{2x} \exp x^2$ has no stationary points other than x = 0, if
$\exp(-\sqrt{2}) < a < \exp(\sqrt{2})$.

2.16 The curve $4y^3 = a^2(x + 3y)$ can be parameterised as $x = a\cos 3\theta$, $y = a\cos\theta$.

(a) Obtain expressions for dy/dx (i) by implicit differentiation and (ii) in
parameterised form. Verify that they are equivalent.

(b) Show that the only point of inflection occurs at the origin. Is it a stationary
point of inflection?

(c) Use the information gained in (a) and (b) to sketch the curve, paying
particular attention to its shape near the points (−a, a/2) and (a, −a/2) and
to its slope at the ‘end points’ (a, a) and (−a, −a).

2.17 The parametric equations for the motion of a charged particle released from rest
in electric and magnetic fields at right angles to each other take the forms

\[ x = a(\theta - \sin\theta), \qquad y = a(1 - \cos\theta). \]

Show that the tangent to the curve has slope $\cot(\theta/2)$. Use this result at a few
calculated values of x and y to sketch the form of the particle’s trajectory.

2.18 Show that the maximum curvature on the catenary $y(x) = a\cosh(x/a)$ is 1/a. You
will need some of the results about hyperbolic functions stated in subsection 3.7.6.

2.19 The curve whose equation is $x^{2/3} + y^{2/3} = a^{2/3}$ for positive x and y and which
is completed by its symmetric reflections in both axes is known as an astroid.
Sketch it and show that its radius of curvature in the first quadrant is $3(axy)^{1/3}$.

2.20 A two-dimensional coordinate system useful for orbit problems is the tangential-polar
coordinate system (figure 2.13). In this system a curve is defined by r, the
distance from a fixed point O to a general point P of the curve, and p, the
perpendicular distance from O to the tangent to the curve at P. By proceeding
as indicated below, show that the radius of curvature at P can be written in the
form $\rho = r\,dr/dp$.

Consider two neighbouring points P and Q on the curve. The normals to the
curve through those points meet at C, with (in the limit Q → P) CP = CQ = ρ.
Apply the cosine rule to triangles OPC and OQC to obtain two expressions for
$c^2$, one in terms of r and p and the other in terms of r + Δr and p + Δp. By
equating them and letting Q → P deduce the stated result.

Figure 2.13 The coordinate system described in exercise 2.20.

2.21 Use Leibniz’ theorem to find

(a) the second derivative of $\cos x \sin 2x$,

(b) the third derivative of $\sin x \ln x$,

(c) the fourth derivative of $(2x^3 + 3x^2 + x + 2)\exp 2x$.

2.22 If $y = \exp(-x^2)$, show that $dy/dx = -2xy$ and hence, by applying Leibniz’
theorem, prove that for n > 1

\[ y^{(n+1)} + 2xy^{(n)} + 2ny^{(n-1)} = 0. \]

2.23 (a) By considering its properties near x = 1, show that
$f(x) = 5x^4 - 11x^3 + 26x^2 - 44x + 24$ takes negative values for some range of x.

(b) Show that $f(x) = \tan x - x$ cannot be negative for $0 < x < \pi/2$, and deduce
that $g(x) = x^{-1}\sin x$ decreases monotonically in the same range.

2.24 Determine what can be learned from applying Rolle’s theorem to the following
functions f(x): (a) $e^x$; (b) $x^2 + 6x$; (c) $2x^2 + 3x + 1$; (d) $2x^2 + 3x + 2$; (e)
$2x^3 - 21x^2 + 60x + k$. (f) If k = −45 in (e), show that x = 3 is one root of
f(x) = 0, find the other roots, and verify that the conclusions from (e) are
satisfied.

2.25 By applying Rolle’s theorem to $x^n \sin nx$, where n is an arbitrary positive integer,
show that $\tan nx + x = 0$ has a solution $\alpha_1$ with $0 < \alpha_1 < \pi/n$. Apply the
theorem a second time to obtain the nonsensical result that there is a real $\alpha_2$ in
$0 < \alpha_2 < \pi/n$, such that $\cos^2(n\alpha_2) = -n$. Explain why this incorrect result arises.

2.26 Use the mean value theorem to establish bounds

(a) for $-\ln(1 - y)$, by considering $\ln x$ in the range $0 < 1 - y < x < 1$,

(b) for $e^y - 1$, by considering $e^x - 1$ in the range $0 < x < y$.

2.27 For the function $y(x) = x^2\exp(-x)$ obtain a simple relationship between y and
dy/dx and then, by applying Leibniz’ theorem, prove that

\[ xy^{(n+1)} + (n + x - 2)y^{(n)} + ny^{(n-1)} = 0. \]

2.28 Use Rolle’s theorem to deduce that if the equation f(x) = 0 has a repeated root
$x_1$ then $x_1$ is also a root of the equation f′(x) = 0.

(a) Apply this result to the ‘standard’ quadratic equation $ax^2 + bx + c = 0$, to
show that the condition for equal roots is $b^2 = 4ac$.

(b) Find all the roots of $f(x) = x^3 + 4x^2 - 3x - 18 = 0$, given that one of them
is a repeated root.

(c) The equation $f(x) = x^4 + 4x^3 + 7x^2 + 6x + 2 = 0$ has a repeated integer root.
How many real roots does it have altogether?

2.29 Show that the curve $x^3 + y^3 - 12x - 8y - 16 = 0$ touches the x-axis.

2.30 Find the following indefinite integrals:

(a) $\int (4 + x^2)^{-1}\,dx$; (b) $\int (8 + 2x - x^2)^{-1/2}\,dx$ for $2 < x < 4$;

(c) $\int (1 + \sin\theta)^{-1}\,d\theta$; (d) $\int (x\sqrt{1 - x})^{-1}\,dx$ for $0 < x < 1$.

2.31 Find the indefinite integrals J of the following ratios of polynomials:

(a) $(x + 3)/(x^2 + x - 2)$;

(b) $(x^3 + 5x^2 + 8x + 12)/(2x^2 + 10x + 12)$;

(c) $(3x^2 + 20x + 28)/(x^2 + 6x + 9)$;

(d) $x^3/(a^8 + x^8)$.

2.32 Express $x^2(ax + b)^{-1}$ as the sum of powers of x and another integrable term, and
hence evaluate

\[ \int_0^{b/a} \frac{x^2}{ax + b}\,dx. \]

2.33 Find the integral J of $(ax^2 + bx + c)^{-1}$, with $a \neq 0$, distinguishing between the
cases (i) $b^2 > 4ac$, (ii) $b^2 < 4ac$, and (iii) $b^2 = 4ac$.

2.34 Use logarithmic integration to find the indefinite integrals J of the following:

(a) $\sin 2x/(1 + 4\sin^2 x)$;

(b) $e^x/(e^x - e^{-x})$;

(c) $(1 + x\ln x)/(x\ln x)$;

(d) $[x(x^n + a^n)]^{-1}$.

2.35 Find the derivative of $f(x) = (1 + \sin x)/\cos x$ and hence determine the indefinite
integral J of $\sec x$.

2.36 Find the indefinite integrals J of the following functions involving sinusoids:

(a) $\cos^5 x - \cos^3 x$;

(b) $(1 - \cos x)/(1 + \cos x)$;

(c) $\cos x \sin x/(1 + \cos x)$;

(d) $\sec^2 x/(1 - \tan^2 x)$.

2.37 By making the substitution $x = a\cos^2\theta + b\sin^2\theta$, evaluate the definite integrals
J between limits a and b (> a) of the following functions:

(a) $[(x - a)(b - x)]^{-1/2}$;

(b) $[(x - a)(b - x)]^{1/2}$;

(c) $[(x - a)/(b - x)]^{1/2}$.

2.38 Determine whether the following integrals exist and, where they do, evaluate
them:

(a) $\int_0^\infty \exp(-\lambda x)\,dx$; (b) $\int_{-\infty}^{\infty} \frac{x}{(x^2 + a^2)^2}\,dx$;

(c) $\int_1^\infty \frac{1}{x + 1}\,dx$; (d) $\int_0^1 \frac{1}{x^2}\,dx$;

(e) $\int_0^{\pi/2} \cot\theta\,d\theta$; (f) $\int_0^1 \frac{x}{(1 - x^2)^{1/2}}\,dx$.

2.39 Use integration by parts to evaluate the following:

(a) $\int_0^y x^2 \sin x\,dx$; (b) $\int_1^y x\ln x\,dx$;

(c) $\int_0^y \sin^{-1} x\,dx$; (d) $\int_1^y \ln(a^2 + x^2)/x^2\,dx$.

2.40 Show, by each of the following methods, that the indefinite integral J of
$x^3/(x + 1)^{1/2}$ is

\[ J = \tfrac{2}{35}(5x^3 - 6x^2 + 8x - 16)(x + 1)^{1/2} + c. \]

(a) by using repeated integration by parts.

(b) by setting $x + 1 = u^2$ and determining dJ/du as (dJ/dx)(dx/du).

2.41 The gamma function Γ(n) is defined for all n > −1 by

\[ \Gamma(n + 1) = \int_0^\infty x^n e^{-x}\,dx. \]

Find a recurrence relation connecting Γ(n + 1) and Γ(n).

(a) Deduce (i) the value of Γ(n + 1) when n is a non-negative integer and (ii) the
value of $\Gamma\left(\tfrac{7}{2}\right)$, given that $\Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi}$.

(b) Now, taking factorial m for any m to be defined by m! = Γ(m + 1), evaluate
$\left(-\tfrac{3}{2}\right)!$.

2.42 Define J(m, n), for non-negative integers m and n, by the integral

\[ J(m, n) = \int_0^{\pi/2} \cos^m\theta\,\sin^n\theta\,d\theta. \]

(a) Evaluate J(0,0), J(0,1), J(1,0), J(1,1), J(m,1), J(1,n).

(b) Using integration by parts prove that, for m and n both > 1,

\[ J(m, n) = \frac{m - 1}{m + n}J(m - 2, n) \quad\text{and}\quad J(m, n) = \frac{n - 1}{m + n}J(m, n - 2). \]

(c) Evaluate (i) J(5,3), (ii) J(6,5), (iii) J(4,8).

2.43 By integrating by parts twice, prove that $I_n$ as defined in the first equality below
for positive integers n has the value given in the second equality.

\[ I_n = \int_0^{\pi/2} \sin n\theta\,\cos\theta\,d\theta = \frac{n - \sin(n\pi/2)}{n^2 - 1}. \]

2.44 Evaluate the following definite integrals:

(a) $\int_0^\infty xe^{-x}\,dx$; (b) $\int_0^1 [(x^3 + 1)/(x^4 + 4x + 1)]\,dx$;

(c) $\int_0^{\pi/2} [a + (a - 1)\cos\theta]^{-1}\,d\theta$ with $a > \tfrac{1}{2}$; (d) $\int_{-\infty}^{\infty} (x^2 + 6x + 18)^{-1}\,dx$.

2.45 If $J_r$ is the integral

\[ \int_0^\infty x^r \exp(-x^2)\,dx \]

show that

(a) $J_{2r+1} = (r!)/2$,

(b) $J_{2r} = 2^{-r}(2r - 1)(2r - 3)\cdots(5)(3)(1)\,J_0$.

2.46 (a) Find positive constants a, b such that $ax < \sin x < bx$ for $0 < x < \pi/2$. Use
this inequality to find (to two significant figures) upper and lower bounds
for the integral

\[ I = \int_0^{\pi/2} (1 + \sin x)^{1/2}\,dx. \]

(b) Use the substitution $t = \tan(x/2)$ to evaluate I exactly.

2.47 By noting that for $0 < \eta < 1$, $\eta^{1/2} > \eta^{3/4} > \eta$, prove that

\[ \frac{2}{3} < \frac{1}{a^{5/2}}\int_0^a (a^2 - x^2)^{3/4}\,dx < \frac{\pi}{4}. \]

2.48 Show that the total length of the astroid $x^{2/3} + y^{2/3} = a^{2/3}$, which can be
parameterised as $x = a\cos^3\theta$, $y = a\sin^3\theta$, is 6a.

2.49 By noting that $\sinh x < \tfrac{1}{2}e^x < \cosh x$, and that $1 + z^2 < (1 + z)^2$ for z > 0, show
that for x > 0, the length L of the curve $y = \tfrac{1}{2}e^x$ measured from the origin
satisfies the inequalities $\sinh x < L < x + \sinh x$.

2.50 The equation of a cardioid in plane polar coordinates is

\[ \rho = a(1 - \sin\phi). \]

Sketch the curve and find (i) its area, (ii) its total length, (iii) the surface area of
the solid formed by rotating the cardioid about its axis of symmetry and (iv) the
volume of the same solid.


2.4 Hints and answers 

2.1 (a) 3; (b) 2x + 1, 2, 0; (c) $\cos x$.

2.2 2x + 6.

2.3 (a) $(x^2 + 2x)\exp x$; (b) $2(\cos^2 x - \sin^2 x) = 2\cos 2x$; (c) $2\cos 2x$; (d) $\sin ax + ax\cos ax$;

(e) $(a\exp ax)[(\sin ax + \cos ax)\tan^{-1} ax + (\sin ax)(1 + a^2x^2)^{-1}]$;

(f) $[a(x^a - x^{-a})]/[x(x^a + x^{-a})]$; (g) $[(a^x - a^{-x})\ln a]/(a^x + a^{-x})$; (h) $(1 + \ln x)x^x$.

2.4 (a) $(a - x)(a + x)^{-3}$; (b) $(1 - x/2)(1 - x)^{-3/2}$; (c) $\sec^2 x$;

(d) $(-7x^2 - x + 2)(4x^2 - 2x + 1)^{-2}$.

2.5 (a) $-6(2x + 3)^{-4}$; (b) $2\sec^2 x\tan x$; (c) $-9\operatorname{cosech}^3 3x\,\coth 3x$;

(d) $-x^{-1}(\ln x)^{-2}$; (e) $-(a^2 - x^2)^{-1/2}[\sin^{-1}(x/a)]^{-2}$.

2.6 The two limits are −1 (for Δx > 0) and +1 (for Δx < 0) and are not equal.

2.7 $(t + 2)^2/[2(t + 1)^2]$.

2.8 y = π at x = 1.

2.9 $-\sin x$ in both cases.

2.10 (b) Write $1 + (x + \Delta x)^m$ as $1 + x^m(1 + \Delta x/x)^m$; (c) in the general terms of the two
series, the indices r and s are related by r = s + 1.

2.11 The required conditions are 8n − 4 = 0 and $4n^2 - 8n + 3 = 0$; both are satisfied
by $n = \tfrac{1}{2}$.

Figure 2.14 The solutions to exercise 2.14.


2.12 (a) Minimum at x = 1, maximum at x = −1; (b) inflection at x = 1; (c) no
stationary points; (d) $x = (n + \tfrac{1}{2})\pi/a$, maximum for n even, minimum for n odd;
(e) inflection at x = 0; (f) inflection at x = 0, maximum at $x = -(\tfrac{3}{5})^{1/2}$, minimum
at $(\tfrac{3}{5})^{1/2}$.

2.13 −26 at x = −2; other stationary values are 6 at x = 0 and 1 at x = 1.

2.14 See figure 2.14(a)-(c).

(a) y(1) = 0; no infinities; minimum $y(-2) = -\tfrac{1}{2}$; maximum $y(4) = \tfrac{1}{10}$; $-\tfrac{1}{2} \le y \le \tfrac{1}{10}$.

(b) No zeroes; y(−1) = ±∞, y(4) = ±∞; minimum $y(\tfrac{3}{2}) = \tfrac{4}{25}$; y < 0 or $y \ge \tfrac{4}{25}$.

(c) Periodic with period 2π. Within 0 < x < π, symmetry about x = π/2.
Within 0 < x < 2π, antisymmetry about x = π; zeroes at x = nπ and
x = (2m + 1)π/2; no infinities; other stationary points at $x = \cos^{-1}(\pm 2/\sqrt{7})$;
$|y| \le 8/(7\sqrt{21})$.

2.15 Use logarithmic differentiation. Set dy/dx = 0, obtaining $2x^2 + 2x\ln a + 1 = 0$.

2.16 (a) (i) $a^2/(12y^2 - 3a^2)$, (ii) $(12\cos^2\theta - 3)^{-1}$. (b) No, dy/dx = −1/3. (c) Vertical
tangents when y = ±a/2; dy/dx = 1/9 at y = ±a.

2.17 See figure 2.15.

2.18 First show that $\rho = y^2/a$.

2.19 $\dfrac{dy}{dx} = -\left(\dfrac{y}{x}\right)^{1/3}$; $\dfrac{d^2y}{dx^2} = \dfrac{a^{2/3}}{3x^{4/3}y^{1/3}}$.

2.20 For example, $OC^2 = \rho^2 + r^2 - 2\rho p$, where use has been made of the fact that
$r\cos OPC = p$.

2.21 (a) $2(2 - 9\cos^2 x)\sin x$; (b) $(2x^{-3} - 3x^{-1})\sin x - (3x^{-2} + \ln x)\cos x$;
(c) $8(4x^3 + 30x^2 + 62x + 38)\exp 2x$.

Figure 2.15 The solution to exercise 2.17.


2.23 (a) f(1) = 0 whilst f′(1) ≠ 0 and so f(x) must be negative in some region with
x = 1 as an endpoint.

(b) $f'(x) = \tan^2 x > 0$ and f(0) = 0; $g'(x) = (-\cos x)(\tan x - x)/x^2$, which is
never positive in the range.

2.24 (a) Any two consecutive roots of $e^x = 0$ have another root of $e^x = 0$ lying between
them; thus there is at most one root of $e^x = 0$ (formally −∞). (b) The root of
2x + 6 = 0 lies in the range −6 < x < 0. (c) Any roots of f(x) = 0 (actually −1
and $-\tfrac{1}{2}$) lie on either side of $x = -\tfrac{3}{4}$. (d) As in (c), but there are no real roots.
More generally, if there are two values of x that give $2x^2 + 3x + k$ equal values
then they lie one on each side of $x = -\tfrac{3}{4}$. (e) $f'(x) = 6x^2 - 42x + 60 = 0$ has roots
2 and 5. Therefore, if f(x) = 0 has three real roots $\alpha_i$, then $\alpha_1 < 2 < \alpha_2 < 5 < \alpha_3$.
(f) The other roots are $\tfrac{1}{4}(15 \pm \sqrt{105})$.

2.25 The false result arises because $\tan nx$ is not differentiable at x = π/(2n), which
lies in the range 0 < x < π/n, and so the conditions for applying Rolle’s theorem
are not satisfied.

2.26 (a) $y < -\ln(1 - y) < y/(1 - y)$; (b) $y < e^y - 1 < ye^y$.

2.27 $x\,dy/dx = (2 - x)y$.

2.28 (a) Show that x = −b/(2a).

(b) Possible repeated roots are −3 and $\tfrac{1}{3}$; only −3 satisfies f(x) = 0. Factorise
f(x) as $(x + 3)^2(x - b)$, giving b = 2 and x = 2 as the third root.

(c) f′(x) = 0 has the integer solution x = −1 (by inspection); f(x) factorises as
the product $(x + 1)^2(x^2 + 2x + 2)$ and hence f(x) = 0 has only two (coincident)
real roots.

2.29 By implicit differentiation, $y'(x) = (3x^2 - 12)/(8 - 3y^2)$, giving y′(±2) = 0. Since
y(2) = 4 and y(−2) = 0, the curve touches the x-axis at the point (−2, 0).

2.30 (a) $[\tan^{-1}(x/2)]/2$; (b) $\sin^{-1}[(x - 1)/3]$; (c) $-2[1 + \tan(\theta/2)]^{-1}$; (d) put
$y = (1 - x)^{1/2}$, $\ln\{[1 - (1 - x)^{1/2}]/[1 + (1 - x)^{1/2}]\}$.

2.31 (a) Express in partial fractions; $J = \tfrac{1}{3}\ln[(x - 1)^4/(x + 2)] + c$.

(b) Divide the numerator by the denominator and express the remainder in
partial fractions; $J = x^2/4 + 4\ln(x + 2) - 3\ln(x + 3) + c$.

(c) After division of the numerator by the denominator the remainder can be
expressed as $2(x + 3)^{-1} - 5(x + 3)^{-2}$; $J = 3x + 2\ln(x + 3) + 5(x + 3)^{-1} + c$.

(d) Set $x^4 = u$; $J = (4a^4)^{-1}\tan^{-1}(x^4/a^4) + c$.

2.32 Express as $(x/a) - (b/a^2) + (b/a)^2(ax + b)^{-1}$; $(b^2/a^3)(\ln 2 - \tfrac{1}{2})$.

2.33 Writing $b^2 - 4ac$ as $\Delta^2 > 0$, or $4ac - b^2$ as $\Delta'^2 > 0$:

(i) $\Delta^{-1}\ln[(2ax + b - \Delta)/(2ax + b + \Delta)] + k$;

(ii) $2\Delta'^{-1}\tan^{-1}[(2ax + b)/\Delta'] + k$;

(iii) $-2(2ax + b)^{-1} + k$.

2.34 (a) $J = \tfrac{1}{4}\ln(1 + 4\sin^2 x) + c$.

(b) Multiply numerator and denominator by $e^x$; $J = \tfrac{1}{2}\ln(e^{2x} - 1) + c$.

(c) First divide the numerator by the denominator. $J = x + \ln(\ln x) + c$.

(d) Multiply numerator and denominator by $x^{n-1}$, and then set $x^n = u$.
$J = (na^n)^{-1}\ln[x^n/(x^n + a^n)] + c$.

2.35 $f'(x) = (1 + \sin x)/\cos^2 x = f(x)\sec x$; $J = \ln(f(x)) + c = \ln(\sec x + \tan x) + c$.

2.36 (a) Show $\cos^4 x - \cos^2 x = \sin^4 x - \sin^2 x$; $J = \tfrac{1}{5}\sin^5 x - \tfrac{1}{3}\sin^3 x + c$.

(b) Either write the numerator and denominator in terms of sinusoidal functions
of x/2 or make the substitution $t = \tan(x/2)$; $J = 2\tan(x/2) - x + c$.

(c) Substitute $t = \tan(x/2)$; $J = 2\ln(\cos(x/2)) - 2\cos^2(x/2) + c$.

(d) Either set $\tan x = u$ or show that the integrand is $\sec 2x$ and use the result of
exercise 2.35. $J = \tfrac{1}{2}\ln(\sec 2x + \tan 2x) + c = \tfrac{1}{2}\ln[(1 + \tan x)/(1 - \tan x)] + c$.

2.37 (a) π; (b) $\pi(b - a)^2/8$; (c) $\pi(b - a)/2$.

2.38 (a) Yes, for λ > 0, value $\lambda^{-1}$; (b) yes, value 0; (c) no, $\ln(1 + R) \to \infty$ as $R \to \infty$;

(d) no, $\epsilon^{-1} \to \infty$ as $\epsilon \to 0$; (e) no, $\ln(\sin\theta) \to -\infty$ as $\theta \to 0$; (f) yes, value 1.

2.39 (a) $(2 - y^2)\cos y + 2y\sin y - 2$; (b) $[(y^2\ln y)/2] + [(1 - y^2)/4]$;

(c) $y\sin^{-1} y + (1 - y^2)^{1/2} - 1$;

(d) $\ln(a^2 + 1) - (1/y)\ln(a^2 + y^2) + (2/a)[\tan^{-1}(y/a) - \tan^{-1}(1/a)]$.

2.40 (b) $dJ/du = 2(u^2 - 1)^3$.

2.41 Γ(n + 1) = nΓ(n); (a) (i) n!, (ii) $15\sqrt{\pi}/8$; (b) $-2\sqrt{\pi}$.

2.42 (a) π/2, 1, 1, 1/2, 1/(m + 1), 1/(n + 1).

(b) Write the initial integrand as $\cos^{m-1}\theta\,\sin^n\theta\,\cos\theta$, and later rewrite $\sin^{n+2}\theta$
as $\sin^n\theta(1 - \cos^2\theta)$.

(c) (i) 1/24, (ii) 8/693, (iii) 7π/2048.

2.44 (a) 1; (b) (ln 6)/4; (c) $\{2\tan^{-1}[(2a - 1)^{-1/2}]\}/(2a - 1)^{1/2}$; (d) π/3.

2.46 (a) a = 2/π, b = 1; $\tfrac{2}{3}[(1 + \tfrac{\pi}{2})^{3/2} - 1] > I > \tfrac{\pi}{3}(2^{3/2} - 1)$, 2.08 > I > 1.91; (b) I = 2.

2.47 Set $\eta = 1 - (x/a)^2$.

2.49 $L = \int_0^x (1 + \tfrac{1}{4}\exp 2x)^{1/2}\,dx$.

2.50 Note that to avoid any possible double counting, integrals should be taken from
π/2 to 3π/2 and symmetry used for scaling up. The integrands (and infinitesimals)
should be as indicated, with ρ′ denoting dρ/dφ:

(i) $(\rho^2/2)\,d\phi$, $3\pi a^2/2$; (ii) $2(\rho'^2 + \rho^2)^{1/2}\,d\phi$, 8a;

(iii) $2\pi\rho\cos\phi\,(\rho'^2 + \rho^2)^{1/2}\,d\phi$, $32\pi a^2/5$;

(iv) $\pi\rho^2\cos^2\phi\;d(\rho\sin\phi)$, $8\pi a^3/3$.
3


Complex numbers and 
hyperbolic functions 


This chapter is concerned with the representation and manipulation of complex 
numbers. Complex numbers pervade this book, underscoring their wide appli- 
cation in the mathematics of the physical sciences. The application of complex 
numbers to the description of physical systems is left until later chapters and 
only the basic tools are presented here. 


3.1 The need for complex numbers 

Although complex numbers occur in many branches of mathematics, they arise 
most directly out of solving polynomial equations. We examine a specific quadratic 
equation as an example. 

Consider the quadratic equation 

\[ z^2 - 4z + 5 = 0. \tag{3.1} \]

Equation (3.1) has two solutions, $z_1$ and $z_2$, such that

\[ (z - z_1)(z - z_2) = 0. \tag{3.2} \]

Using the familiar formula for the roots of a quadratic equation, (1.4), the
solutions $z_1$ and $z_2$, written in brief as $z_{1,2}$, are

\[ z_{1,2} = \frac{4 \pm \sqrt{(-4)^2 - 4(1 \times 5)}}{2} = 2 \pm \frac{\sqrt{-4}}{2}. \tag{3.3} \]

Both solutions contain the square root of a negative number. However, it is not
true to say that there are no solutions to the quadratic equation. The fundamental
theorem of algebra states that a quadratic equation will always have two solutions
and these are in fact given by (3.3). The second term on the RHS of (3.3) is
called an imaginary term since it contains the square root of a negative number;
the first term is called a real term. The full solution is the sum of a real term
and an imaginary term and is called a complex number. A plot of the function
$f(z) = z^2 - 4z + 5$ is shown in figure 3.1. It will be seen that the plot does not
intersect the z-axis, corresponding to the fact that the equation f(z) = 0 has no
purely real solutions.

Figure 3.1 The function $f(z) = z^2 - 4z + 5$.

The choice of the symbol z for the quadratic variable was not arbitrary; the 
conventional representation of a complex number is z, where z is the sum of a 
real part x and i times an imaginary part y, i.e. 


z = x + iy, 


where i is used to denote the square root of —1. The real part x and the imaginary 
part y are usually denoted by Re z and Im z respectively. We note at this point
that some physical scientists, engineers in particular, use j instead of i. However, 
for consistency, we will use i throughout this book. 

In our particular example, $\sqrt{-4} = 2\sqrt{-1} = 2i$, and hence the two solutions of
(3.1) are

\[ z_{1,2} = 2 \pm \frac{2i}{2} = 2 \pm i. \]

Thus here x = 2 and y = ±1.
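The roots of (3.1) can be confirmed with Python's standard `cmath` module, whose square root accepts negative arguments. This is a small illustrative sketch; the helper name `quadratic_roots` is an assumption, and Python writes the imaginary unit as `j` rather than i.

```python
import cmath


def quadratic_roots(a, b, c):
    """Return both roots of a*z^2 + b*z + c = 0, allowing complex results."""
    root_of_discriminant = cmath.sqrt(b * b - 4 * a * c)
    return (-b + root_of_discriminant) / (2 * a), (-b - root_of_discriminant) / (2 * a)


z1, z2 = quadratic_roots(1, -4, 5)
print(z1, z2)                      # the two roots 2 + i and 2 - i
print(z1 * z1 - 4 * z1 + 5)        # substituting back into (3.1) gives zero
```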

For compactness a complex number is sometimes written in the form 


z = (x, y), 

where the components of z may be thought of as coordinates in an xy-plot. Such 
a plot is called an Argand diagram and is a common representation of complex 
numbers; an example is shown in figure 3.2.

Figure 3.2 The Argand diagram.


Our particular example of a quadratic equation may be generalised readily to 
polynomials whose highest power (degree) is greater than 2, e.g. cubic equations 
(degree 3), quartic equations (degree 4) and so on. For a general polynomial f(z), 
of degree n, the fundamental theorem of algebra states that the equation f(z) = 0 
will have exactly n solutions. We will examine cases of higher-degree equations 
in subsection 3.4.3. 

The remainder of this chapter deals with: the algebra and manipulation of 
complex numbers; their polar representation, which has advantages in many 
circumstances; complex exponentials and logarithms; the use of complex numbers
in finding the roots of polynomial equations; and hyperbolic functions. 


3.2 Manipulation of complex numbers 

This section considers basic complex number manipulation. Some analogy may 
be drawn with vector manipulation (see chapter 7) but this section stands alone 
as an introduction. 


3.2.1 Addition and subtraction 

The addition of two complex numbers, $z_1$ and $z_2$, in general gives another
complex number. The real components and the imaginary components are added
separately and in a like manner to the familiar addition of real numbers:

\[ z_1 + z_2 = (x_1 + iy_1) + (x_2 + iy_2) = (x_1 + x_2) + i(y_1 + y_2), \]

or in component notation

\[ z_1 + z_2 = (x_1, y_1) + (x_2, y_2) = (x_1 + x_2,\, y_1 + y_2). \]

The Argand representation of the addition of two complex numbers is shown in
figure 3.3.

Figure 3.3 The addition of two complex numbers.

By straightforward application of the commutativity and associativity of the
real and imaginary parts separately, we can show that the addition of complex
numbers is itself commutative and associative, i.e.

\[ z_1 + z_2 = z_2 + z_1, \]

\[ z_1 + (z_2 + z_3) = (z_1 + z_2) + z_3. \]

Thus it is immaterial in what order complex numbers are added.

► Sum the complex numbers 1 + 2i, 3 − 4i, −2 + i.

Summing the real terms we obtain

\[ 1 + 3 - 2 = 2, \]

and summing the imaginary terms we obtain

\[ 2i - 4i + i = -i. \]

Hence

\[ (1 + 2i) + (3 - 4i) + (-2 + i) = 2 - i. \] ◄


The subtraction of complex numbers is very similar to their addition. As in the 
case of real numbers, if two identical complex numbers are subtracted then the 
result is zero.

Figure 3.4 The modulus and argument of a complex number.


3.2.2 Modulus and argument 

The modulus of the complex number z is denoted by |z| and is defined as

\[ |z| = \sqrt{x^2 + y^2}. \tag{3.4} \]

Hence the modulus of the complex number is the distance of the corresponding
point from the origin in the Argand diagram, as may be seen in figure 3.4.

The argument of the complex number z is denoted by arg z and is defined as

\[ \arg z = \tan^{-1}\left(\frac{y}{x}\right). \tag{3.5} \]

Thus arg z is the angle that the line joining the origin to z on the Argand diagram
makes with the positive x-axis. The anticlockwise direction is taken to be positive
by convention. The angle arg z is shown in figure 3.4. Account must be taken
of the signs of x and y individually in determining in which quadrant arg z lies.
Thus, for example, if x and y are both negative then arg z lies in the range
−π < arg z < −π/2 rather than in the first quadrant (0 < arg z < π/2), though
both cases give the same value for the ratio of y to x.

► Find the modulus and the argument of the complex number z = 2 − 3i.

Using (3.4), the modulus is given by

\[ |z| = \sqrt{2^2 + (-3)^2} = \sqrt{13}. \]

Using (3.5), the argument is given by

\[ \arg z = \tan^{-1}\left(-\tfrac{3}{2}\right). \]

The two angles whose tangents equal −1.5 are −0.9828 rad and 2.1588 rad. Since x = 2 and
y = −3, z clearly lies in the fourth quadrant; therefore arg z = −0.9828 is the appropriate
answer. ◄
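The quadrant bookkeeping described above is what a two-argument arctangent does automatically. As a hedged illustration (the variable names and the second test point w = −1 − i are our own choices), Python's `cmath.phase` takes the signs of x and y into account and returns a value between −π and π, whereas a bare `math.atan(y / x)` sees only the ratio:

```python
import cmath
import math

z = complex(2, -3)
modulus = abs(z)              # sqrt(x^2 + y^2), as in (3.4)
argument = cmath.phase(z)     # quadrant-aware version of (3.5)
print(modulus, argument)

# A third-quadrant point: x and y are both negative.
w = complex(-1, -1)
naive_w = math.atan(w.imag / w.real)   # uses only the ratio y/x
print(cmath.phase(w), naive_w)         # these differ by pi
```

For z = 2 − 3i the two approaches agree because z lies in the fourth quadrant; for w they differ, which is exactly the ambiguity the text warns about.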
3.2.3 Multiplication

Complex numbers may be multiplied together and in general give a complex
number as the result. The product of two complex numbers $z_1$ and $z_2$ is found
by multiplying them out in full and remembering that $i^2 = -1$, i.e.

\[ \begin{aligned} z_1z_2 &= (x_1 + iy_1)(x_2 + iy_2) \\ &= x_1x_2 + ix_1y_2 + iy_1x_2 + i^2y_1y_2 \\ &= (x_1x_2 - y_1y_2) + i(x_1y_2 + y_1x_2). \end{aligned} \tag{3.6} \]

► Multiply the complex numbers $z_1 = 3 + 2i$ and $z_2 = -1 - 4i$.

By direct multiplication we find

\[ \begin{aligned} z_1z_2 &= (3 + 2i)(-1 - 4i) \\ &= -3 - 2i - 12i - 8i^2 \\ &= 5 - 14i. \end{aligned} \tag{3.7} \] ◄

The multiplication of complex numbers is both commutative and associative,
i.e.

\[ z_1z_2 = z_2z_1, \tag{3.8} \]

\[ (z_1z_2)z_3 = z_1(z_2z_3). \tag{3.9} \]

The product of two complex numbers also has the simple properties

\[ |z_1z_2| = |z_1||z_2|, \tag{3.10} \]

\[ \arg(z_1z_2) = \arg z_1 + \arg z_2. \tag{3.11} \]

These relations are derived in subsection 3.3.1.

► Verify that (3.10) holds for the product of $z_1 = 3 + 2i$ and $z_2 = -1 - 4i$.

From (3.7)

\[ |z_1z_2| = |5 - 14i| = \sqrt{5^2 + (-14)^2} = \sqrt{221}. \]

We also find

\[ |z_1| = \sqrt{3^2 + 2^2} = \sqrt{13}, \]

\[ |z_2| = \sqrt{(-1)^2 + (-4)^2} = \sqrt{17}, \]

and hence

\[ |z_1||z_2| = \sqrt{13}\sqrt{17} = \sqrt{221} = |z_1z_2|. \] ◄
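Relations (3.6), (3.10) and (3.11) can all be checked for the same pair of numbers in a few lines. The sketch below is illustrative only: the helper `multiply` simply spells out (3.6) component by component, and the difference of arguments is reduced modulo 2π because (3.11) can only be expected to hold up to a multiple of 2π when principal values are used.

```python
import cmath
import math


def multiply(z1, z2):
    """Product of two complex numbers written out as in (3.6)."""
    x1, y1 = z1.real, z1.imag
    x2, y2 = z2.real, z2.imag
    return complex(x1 * x2 - y1 * y2, x1 * y2 + y1 * x2)


z1, z2 = complex(3, 2), complex(-1, -4)
product = multiply(z1, z2)
print(product)                               # 5 - 14i, as in (3.7)

# (3.10): the modulus of the product equals the product of the moduli.
print(abs(product), abs(z1) * abs(z2))

# (3.11): arguments add, up to a multiple of 2*pi.
arg_mismatch = math.remainder(
    cmath.phase(product) - (cmath.phase(z1) + cmath.phase(z2)), 2 * math.pi
)
print(arg_mismatch)
```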

We now examine the effect on a complex number z of multiplying it by ±1
and ±i. These four multipliers have modulus unity and we can see immediately
from (3.10) that multiplying z by another complex number of unit modulus gives
a product with the same modulus as z. We can also see from (3.11) that if
we multiply z by a complex number, the argument of the product is the sum
of the argument of z and the argument of the multiplier. Hence multiplying
z by unity (which has argument zero) leaves z unchanged in both modulus
and argument, i.e. z is completely unaltered by the operation. Multiplying by
−1 (which has argument π) leads to rotation, through an angle π, of the line
joining the origin to z in the Argand diagram. Similarly, multiplication by i or
−i leads to corresponding rotations of π/2 or −π/2 respectively. This geometrical
interpretation of multiplication is shown in figure 3.5.

Figure 3.5 Multiplication of a complex number by ±1 and ±i.

► Using the geometrical interpretation of multiplication by i, find the product i(1 − i).

The complex number 1 − i has argument −π/4 and modulus $\sqrt{2}$. Thus, using (3.10) and
(3.11), its product with i has argument +π/4 and unchanged modulus $\sqrt{2}$. The complex
number with modulus $\sqrt{2}$ and argument +π/4 is 1 + i and so

\[ i(1 - i) = 1 + i, \]

as is easily verified by direct multiplication. ◄
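The rotations produced by the four unit multipliers can be seen by applying each of them to the same number and printing the modulus and argument of the result. This is a small sketch for illustration; the choice z = 1 − i and the dictionary `products` are our own.

```python
import cmath

z = complex(1, -1)

# Multiply z by each of the four unit-modulus numbers +1, +i, -1, -i.
products = {
    name: multiplier * z
    for name, multiplier in (("+1", 1), ("+i", 1j), ("-1", -1), ("-i", -1j))
}

for name, w in products.items():
    # The modulus is unchanged; only the argument (direction) moves.
    print(name, w, abs(w), cmath.phase(w))
```

In each case the modulus stays at $\sqrt{2}$, while the argument shifts by 0, π/2, π or −π/2 (modulo 2π) from its starting value of −π/4.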

The division of two complex numbers is similar to their multiplication but 
requires the notion of the complex conjugate (see the following subsection) and 
so discussion is postponed until subsection 3.2.5. 


3.2.4 Complex conjugate 

If z has the convenient form x + iy then the complex conjugate, denoted by z*, 
may be found simply by changing the sign of the imaginary part, i.e. if z = x + iy 
then z* = x — iy. More generally, we may define the complex conjugate of z as 
the (complex) number having the same magnitude as z that when multiplied by 
z leaves a real result, i.e. there is no imaginary component in the product. 
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Figure 3.6 The complex conjugate as a mirror image in the real axis. 


In the case where z can be written in the form x + iy it is easily verified, by 
direct multiplication of the components, that the product zz* gives a real result: 

zz = (x + iy)(x — iy) = x — ixy + ixy — i y = x + y = |z[ . 

Complex conjugation corresponds to a reflection of z in the real axis of the 
Argand diagram, as may be seen in figure 3.6. 


► Find the complex conjugate of $z = a + 2i + 3ib$.

The complex number is written in the standard form

    $z = a + i(2 + 3b)$;

then, replacing $i$ by $-i$, we obtain

    $z^* = a - i(2 + 3b)$. ◄

In some cases, however, it may not be simple to rearrange the expression for 
$z$ into the standard form $x + iy$. Nevertheless, given two complex numbers, $z_1$
and $z_2$, it is straightforward to show that the complex conjugate of their sum
(or difference) is equal to the sum (or difference) of their complex conjugates, i.e.
$(z_1 \pm z_2)^* = z_1^* \pm z_2^*$. Similarly, it may be shown that the complex conjugate of the
product (or quotient) of $z_1$ and $z_2$ is equal to the product (or quotient) of their
complex conjugates, i.e. $(z_1z_2)^* = z_1^*z_2^*$ and $(z_1/z_2)^* = z_1^*/z_2^*$.
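
These conjugation rules are easy to spot-check numerically. The sketch below uses Python's built-in `complex` type; the sample values `z1` and `z2` and the helper `close` are illustrative assumptions, and a numerical check of this kind is of course not a proof.

```python
def close(a: complex, b: complex, tol: float = 1e-12) -> bool:
    """Compare two complex numbers up to floating-point rounding."""
    return abs(a - b) <= tol

# Arbitrary sample values, chosen only for illustration.
z1 = 2 + 3j
z2 = -1 + 4j

sum_ok = close((z1 + z2).conjugate(), z1.conjugate() + z2.conjugate())
diff_ok = close((z1 - z2).conjugate(), z1.conjugate() - z2.conjugate())
prod_ok = close((z1 * z2).conjugate(), z1.conjugate() * z2.conjugate())
quot_ok = close((z1 / z2).conjugate(), z1.conjugate() / z2.conjugate())

# z z* has no imaginary part and equals |z|^2.
zz_star = z1 * z1.conjugate()
print(sum_ok, diff_ok, prod_ok, quot_ok, zz_star)
```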

Using these results, it can be deduced that, no matter how complicated the 
expression, its complex conjugate may always be found by replacing every i by 
—i. To apply this rule, however, we must always ensure that all complex parts are 
first written out in full, so that no $i$'s are hidden.

► Find the complex conjugate of the complex number $z = w^{(3y+2ix)}$ where $w = x + 5i$.

Although we do not discuss complex powers until section 3.5, the simple rule given above
still enables us to find the complex conjugate of $z$.

In this case $w$ itself contains real and imaginary components and so must be written
out in full, i.e.

    $z = w^{3y+2ix} = (x + 5i)^{3y+2ix}$.

Now we can replace each $i$ by $-i$ to obtain

    $z^* = (x - 5i)^{(3y-2ix)}$.

It can be shown that the product $zz^*$ is real, as required. ◄


The following properties of the complex conjugate are easily proved and others 
may be derived from them. If z = x + iy then 


    $(z^*)^* = z$,    (3.12)

    $z + z^* = 2\,\mathrm{Re}\,z = 2x$,    (3.13)

    $z - z^* = 2i\,\mathrm{Im}\,z = 2iy$,    (3.14)

    $\dfrac{z}{z^*} = \left(\dfrac{x^2 - y^2}{x^2 + y^2}\right) + i\left(\dfrac{2xy}{x^2 + y^2}\right)$.    (3.15)


The derivation of this last relation relies on the results of the following subsection. 


3.2.5 Division 


The division of two complex numbers $z_1$ and $z_2$ bears some similarity to their
multiplication. Writing the quotient in component form we obtain

    $\dfrac{z_1}{z_2} = \dfrac{x_1 + iy_1}{x_2 + iy_2}$.    (3.16)


In order to separate the real and imaginary components of the quotient, we 
multiply both numerator and denominator by the complex conjugate of the 
denominator. By definition, this process will leave the denominator as a real 
quantity. Equation (3.16) gives 

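    $\dfrac{z_1}{z_2} = \dfrac{(x_1 + iy_1)(x_2 - iy_2)}{(x_2 + iy_2)(x_2 - iy_2)} = \dfrac{(x_1x_2 + y_1y_2) + i(x_2y_1 - x_1y_2)}{x_2^2 + y_2^2}$

    $\phantom{\dfrac{z_1}{z_2}} = \dfrac{x_1x_2 + y_1y_2}{x_2^2 + y_2^2} + i\,\dfrac{x_2y_1 - x_1y_2}{x_2^2 + y_2^2}$.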


Hence we have separated the quotient into real and imaginary components, as 
required. 

In the special case where $z_2 = z_1^*$, so that $x_2 = x_1$ and $y_2 = -y_1$, the general
result reduces to (3.15).
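
The component formula above translates directly into code. The following sketch (the function name `divide` is an assumption made for illustration) multiplies through by the conjugate of the denominator and can be compared with Python's built-in complex division.

```python
def divide(z1: complex, z2: complex) -> complex:
    """Divide z1 by z2 by multiplying top and bottom by the conjugate of z2."""
    x1, y1 = z1.real, z1.imag
    x2, y2 = z2.real, z2.imag
    denom = x2 * x2 + y2 * y2          # z2 z2* = |z2|^2, which is real
    if denom == 0:
        raise ZeroDivisionError("division by the complex number zero")
    real = (x1 * x2 + y1 * y2) / denom
    imag = (x2 * y1 - x1 * y2) / denom
    return complex(real, imag)

q = divide(3 - 2j, -1 + 4j)            # sample quotient
z = 1 + 2j
special = divide(z, z.conjugate())     # the special case z2 = z1*, cf. (3.15)
print(q, special)
```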

► Express $z$ in the form $x + iy$, when

    $z = \dfrac{3 - 2i}{-1 + 4i}$.

Multiplying numerator and denominator by the complex conjugate of the denominator
we obtain

    $z = \dfrac{(3 - 2i)(-1 - 4i)}{(-1 + 4i)(-1 - 4i)} = \dfrac{-11 - 10i}{17} = -\dfrac{11}{17} - \dfrac{10}{17}i$. ◄


In analogy to (3.10) and (3.11), which describe the multiplication of two 
complex numbers, the following relations apply to division: 


    $\left|\dfrac{z_1}{z_2}\right| = \dfrac{|z_1|}{|z_2|}$,    (3.17)

    $\arg\left(\dfrac{z_1}{z_2}\right) = \arg z_1 - \arg z_2$.    (3.18)

The proof of these relations is left until subsection 3.3.1.


3.3 Polar representation of complex numbers 

Although considering a complex number as the sum of a real and an imaginary 
part is often useful, sometimes the polar representation proves easier to manipulate. 
This makes use of the complex exponential function, which is defined by 

    $e^z = \exp z \equiv 1 + z + \dfrac{z^2}{2!} + \dfrac{z^3}{3!} + \cdots$.    (3.19)

Strictly speaking it is the function $\exp z$ that is defined by (3.19). The number $e$
is the value of $\exp(1)$, i.e. it is just a number. However, it may be shown that $e^z$
and $\exp z$ are equivalent when $z$ is real and rational and mathematicians then
define their equivalence for irrational and complex $z$. For the purposes of this
book we will not concern ourselves further with this mathematical nicety but,
rather, assume that (3.19) is valid for all $z$. We also note that, using (3.19), by
multiplying together the appropriate series we may show that (see chapter 20)

    $e^{z_1}e^{z_2} = e^{z_1 + z_2}$,    (3.20)

which is analogous to the familiar result for exponentials of real numbers. 
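
As a rough numerical illustration of the series definition and of the product rule for exponentials, the sketch below sums a truncated series. The function name `exp_series`, the choice of 40 terms and the sample values are assumptions made here; the truncation is adequate only for arguments of modest size.

```python
import cmath

def exp_series(z: complex, terms: int = 40) -> complex:
    """Partial sum 1 + z + z^2/2! + ... of the exponential series."""
    total = 0j
    term = 1 + 0j                # the n = 0 term, z^0 / 0!
    for n in range(terms):
        total += term
        term *= z / (n + 1)      # turn z^n/n! into z^(n+1)/(n+1)!
    return total

# Arbitrary sample values.
z1 = 0.5 + 1.2j
z2 = -0.3 + 0.7j

product_gap = abs(exp_series(z1) * exp_series(z2) - exp_series(z1 + z2))
library_gap = abs(exp_series(z1) - cmath.exp(z1))
print(product_gap, library_gap)
```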
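
Figure 3.7 The polar representation of a complex number.


From (3.19), it immediately follows that for $z = i\theta$, $\theta$ real,

    $e^{i\theta} = 1 + i\theta - \dfrac{\theta^2}{2!} - \dfrac{i\theta^3}{3!} + \cdots$    (3.21)

    $\phantom{e^{i\theta}} = 1 - \dfrac{\theta^2}{2!} + \dfrac{\theta^4}{4!} - \cdots + i\left(\theta - \dfrac{\theta^3}{3!} + \dfrac{\theta^5}{5!} - \cdots\right)$    (3.22)

and hence that

    $e^{i\theta} = \cos\theta + i\sin\theta$,    (3.23)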


where the last equality follows from the series expansions of trigonometric functions
(see subsection 4.6.3). This last relationship is called Euler's equation. It also
follows from (3.23) that

    $e^{in\theta} = \cos n\theta + i\sin n\theta$

for all $n$. From Euler's equation (3.23) and using figure 3.7 we deduce that

    $re^{i\theta} = r(\cos\theta + i\sin\theta) = x + iy$.

Thus a complex number may be represented in the polar form

    $z = re^{i\theta}$.    (3.24)

Referring again to figure 3.7, we can identify $r$ with $|z|$ and $\theta$ with $\arg z$. The
simplicity of the representation of the modulus and argument is one of the main
reasons for using the polar representation. The angle $\theta$ lies conventionally in the
range $-\pi < \theta \le \pi$, but, since rotation by $\theta$ is the same as rotation by $2n\pi + \theta$,
where $n$ is any integer,

    $re^{i\theta} \equiv re^{i(\theta + 2n\pi)}$.

Figure 3.8 The multiplication of two complex numbers. In this case $r_1$ and
$r_2$ are both greater than unity.


The algebra of the polar representation is different from that of the real and 
imaginary component representation, though, of course, the results are identical. 
Some operations prove much easier in the polar representation, others much more 
complicated. The best representation for a particular problem must be determined 
by the manipulation required. 


3.3.1 Multiplication and division in polar form 

Multiplication and division in polar form are particularly simple. The product of
$z_1 = r_1e^{i\theta_1}$ and $z_2 = r_2e^{i\theta_2}$ is given by

    $z_1z_2 = r_1e^{i\theta_1}\,r_2e^{i\theta_2} = r_1r_2e^{i(\theta_1 + \theta_2)}$.    (3.25)

The relations $|z_1z_2| = |z_1||z_2|$ and $\arg(z_1z_2) = \arg z_1 + \arg z_2$ follow immediately.
An example of the multiplication of two complex numbers is shown in figure 3.8.


Division is equally simple in polar form; the quotient of $z_1$ and $z_2$ is given by

    $\dfrac{z_1}{z_2} = \dfrac{r_1e^{i\theta_1}}{r_2e^{i\theta_2}} = \dfrac{r_1}{r_2}e^{i(\theta_1 - \theta_2)}$.    (3.26)


The relations $|z_1/z_2| = |z_1|/|z_2|$ and $\arg(z_1/z_2) = \arg z_1 - \arg z_2$ are again
immediately apparent. The division of two complex numbers in polar form is shown
in figure 3.9.

Figure 3.9 The division of two complex numbers. As in the previous figure,
$r_1$ and $r_2$ are both greater than unity.
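
The polar rules for products and quotients can be illustrated with the standard `cmath.rect` and `cmath.polar` functions. This is a small sketch with arbitrarily chosen moduli and arguments; note that `cmath.polar` returns the principal argument, so for other inputs the computed argument may differ from $\theta_1 \pm \theta_2$ by a multiple of $2\pi$.

```python
import cmath

# Arbitrary moduli and arguments (radians) for illustration.
z1 = cmath.rect(2.0, 0.7)        # r1 = 2.0, theta1 = 0.7
z2 = cmath.rect(1.5, 0.4)        # r2 = 1.5, theta2 = 0.4

r_prod, arg_prod = cmath.polar(z1 * z2)   # expect r1*r2 and theta1 + theta2
r_quot, arg_quot = cmath.polar(z1 / z2)   # expect r1/r2 and theta1 - theta2
print(r_prod, arg_prod, r_quot, arg_quot)
```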


3.4 de Moivre’s theorem 

We now derive an extremely important theorem. Since $(e^{i\theta})^n = e^{in\theta}$, we have

    $(\cos\theta + i\sin\theta)^n = \cos n\theta + i\sin n\theta$,    (3.27)

where the identity $e^{in\theta} = \cos n\theta + i\sin n\theta$ follows from the series definition of
$e^{in\theta}$ (see (3.21)). This result is called de Moivre's theorem and is often used in the
manipulation of complex numbers. The theorem is valid for all $n$ whether real,
imaginary or complex.

There are numerous applications of de Moivre’s theorem but this section 
examines just three: proofs of trigonometric identities; finding the nth roots of 
unity; and solving complex equations. 


3.4.1 Trigonometric identities 

The use of de Moivre's theorem in finding trigonometric identities is best illustrated
by example. We consider the expression of a multiple-angle function in
terms of a polynomial in the single-angle function, and its converse.

► Express $\sin 3\theta$ and $\cos 3\theta$ in terms of powers of $\cos\theta$ and $\sin\theta$.

Using de Moivre's theorem,

    $\cos 3\theta + i\sin 3\theta = (\cos\theta + i\sin\theta)^3$
    $= (\cos^3\theta - 3\cos\theta\sin^2\theta) + i(3\sin\theta\cos^2\theta - \sin^3\theta)$.

We can equate the real and imaginary coefficients separately, i.e.

    $\cos 3\theta = \cos^3\theta - 3\cos\theta\sin^2\theta = 4\cos^3\theta - 3\cos\theta$    (3.28)

and

    $\sin 3\theta = 3\sin\theta\cos^2\theta - \sin^3\theta = 3\sin\theta - 4\sin^3\theta$. ◄    (3.29)


This method can clearly be applied to finding power expansions of $\cos n\theta$ and
$\sin n\theta$ for any positive integer $n$.
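
A quick numerical check of de Moivre's theorem and of the triple-angle results (3.28) and (3.29) can be sketched as follows. The helper name `de_moivre_sides` and the test angle are assumptions made for illustration only.

```python
import math

def de_moivre_sides(theta: float, n: int):
    """Return ((cos t + i sin t)**n, cos nt + i sin nt) for comparison."""
    lhs = complex(math.cos(theta), math.sin(theta)) ** n
    rhs = complex(math.cos(n * theta), math.sin(n * theta))
    return lhs, rhs

theta = 0.7                        # arbitrary test angle in radians
c, s = math.cos(theta), math.sin(theta)

lhs, rhs = de_moivre_sides(theta, 3)
cos3_poly = 4 * c**3 - 3 * c       # right-hand side of (3.28)
sin3_poly = 3 * s - 4 * s**3       # right-hand side of (3.29)
print(abs(lhs - rhs), cos3_poly - math.cos(3 * theta), sin3_poly - math.sin(3 * theta))
```

Changing the second argument of `de_moivre_sides` gives the same comparison for other positive integers $n$.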

The converse process uses the following properties of $z = e^{i\theta}$,

    $z^n + \dfrac{1}{z^n} = 2\cos n\theta$,    (3.30)

    $z^n - \dfrac{1}{z^n} = 2i\sin n\theta$.    (3.31)

These equalities follow from simple applications of de Moivre's theorem, i.e.

    $z^n + \dfrac{1}{z^n} = (\cos\theta + i\sin\theta)^n + (\cos\theta + i\sin\theta)^{-n}$
    $= \cos n\theta + i\sin n\theta + \cos(-n\theta) + i\sin(-n\theta)$
    $= \cos n\theta + i\sin n\theta + \cos n\theta - i\sin n\theta$
    $= 2\cos n\theta$

and

    $z^n - \dfrac{1}{z^n} = (\cos\theta + i\sin\theta)^n - (\cos\theta + i\sin\theta)^{-n}$
    $= \cos n\theta + i\sin n\theta - \cos n\theta + i\sin n\theta$
    $= 2i\sin n\theta$.

In the particular case where $n = 1$,

    $z + \dfrac{1}{z} = e^{i\theta} + e^{-i\theta} = 2\cos\theta$,    (3.32)

    $z - \dfrac{1}{z} = e^{i\theta} - e^{-i\theta} = 2i\sin\theta$.    (3.33)
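
► Find an expression for $\cos^3\theta$ in terms of $\cos 3\theta$ and $\cos\theta$.

Using (3.32),

    $\cos^3\theta = \dfrac{1}{2^3}\left(z + \dfrac{1}{z}\right)^3 = \dfrac{1}{8}\left(z^3 + 3z + \dfrac{3}{z} + \dfrac{1}{z^3}\right) = \dfrac{1}{8}\left(z^3 + \dfrac{1}{z^3}\right) + \dfrac{3}{8}\left(z + \dfrac{1}{z}\right)$.

Now using (3.30) and (3.32), we find

    $\cos^3\theta = \tfrac{1}{4}\cos 3\theta + \tfrac{3}{4}\cos\theta$. ◄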

This result happens to be a simple rearrangement of (3.28), but cases involving
larger values of n are better handled using this direct method than by rearranging 
polynomial expansions of multiple-angle functions. 
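
As a sketch of how the direct method extends to a larger power, the same steps can be applied to $\cos^4\theta$ (an illustrative case chosen here), again with $z = e^{i\theta}$ and using (3.30) and (3.32):

```latex
\cos^4\theta
  = \frac{1}{2^4}\left(z + \frac{1}{z}\right)^4
  = \frac{1}{16}\left(z^4 + 4z^2 + 6 + \frac{4}{z^2} + \frac{1}{z^4}\right)
  = \frac{1}{16}\left(z^4 + \frac{1}{z^4}\right)
    + \frac{1}{4}\left(z^2 + \frac{1}{z^2}\right) + \frac{3}{8}
  = \tfrac{1}{8}\cos 4\theta + \tfrac{1}{2}\cos 2\theta + \tfrac{3}{8}.
```

Setting $\theta = 0$ gives $\tfrac{1}{8} + \tfrac{1}{2} + \tfrac{3}{8} = 1$, as it should.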



3.4.2 Finding the nth roots of unity 

The equation $z^2 = 1$ has the familiar solutions $z = \pm 1$. However, now that
we have introduced the concept of complex numbers we can solve the general
equation $z^n = 1$. Recalling the fundamental theorem of algebra, we know that
the equation has $n$ solutions. In order to proceed we rewrite the equation as

    $z^n = e^{2ik\pi}$,

where $k$ is any integer. Now taking the $n$th root of each side of the equation we
find

    $z = e^{2ik\pi/n}$.

Hence, the solutions of $z^n = 1$ are

    $z_{1,2,\ldots,n} = 1,\ e^{2i\pi/n},\ \ldots,\ e^{2i(n-1)\pi/n}$,

corresponding to the values $0, 1, 2, \ldots, n - 1$ for $k$. Larger integer values of $k$ do
not give new solutions, since the roots already listed are simply cyclically repeated
for $k = n$, $n + 1$, $n + 2$, etc.


► Find the solutions to the equation $z^3 = 1$.

By applying the above method we find

    $z = e^{2ik\pi/3}$.

Hence the three solutions are $z_1 = e^{0i} = 1$, $z_2 = e^{2i\pi/3}$, $z_3 = e^{4i\pi/3}$. We note that, as expected,
the next solution, for which $k = 3$, gives $z_4 = e^{6i\pi/3} = 1 = z_1$, so that there are only three
separate solutions. ◄

Figure 3.10 The solutions of $z^3 = 1$.


Not surprisingly, given that $|z^3| = |z|^3$ from (3.10), all the roots of unity have
unit modulus, i.e. they all lie on a circle in the Argand diagram of unit radius.
The three roots are shown in figure 3.10.

The cube roots of unity are often written $1$, $\omega$ and $\omega^2$. The properties $\omega^3 = 1$
and $1 + \omega + \omega^2 = 0$ are easily proved.
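
The construction of the roots can be sketched in a few lines of Python. The function name `roots_of_unity` is an assumption made for illustration; the code simply evaluates $e^{2ik\pi/n}$ for $k = 0, 1, \ldots, n - 1$ and checks the stated properties of the cube roots numerically.

```python
import cmath

def roots_of_unity(n: int):
    """Return the n distinct solutions of z**n = 1, for k = 0, 1, ..., n-1."""
    return [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

cube_roots = roots_of_unity(3)
omega = cube_roots[1]                  # e^{2 i pi / 3}

moduli = [abs(r) for r in cube_roots]  # all roots lie on the unit circle
sum_check = 1 + omega + omega**2       # should vanish up to rounding
print(cube_roots, moduli, abs(sum_check))
```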


3.4.3 Solving polynomial equations 

A third application of de Moivre’s theorem is to the solution of polynomial 
equations. Complex equations in the form of a polynomial relationship must first 
be solved for z in a similar fashion to the method for finding the roots of real 
polynomial equations. Then the complex roots of z may be found. 


► Solve the equation $z^6 - z^5 + 4z^4 - 6z^3 + 2z^2 - 8z + 8 = 0$.

We first factorise to give

    $(z^3 - 2)(z^2 + 4)(z - 1) = 0$.

Hence $z^3 = 2$ or $z^2 = -4$ or $z = 1$. The solutions to the quadratic equation are $z = \pm 2i$;
to find the complex cube roots, we first write the equation in the form

    $z^3 = 2 = 2e^{2ik\pi}$,

where $k$ is any integer. If we now take the cube root, we get

    $z = 2^{1/3}e^{2ik\pi/3}$.

To avoid the duplication of solutions, we use the fact that $-\pi < \arg z \le \pi$ and find

    $z_1 = 2^{1/3}$,
    $z_2 = 2^{1/3}e^{2\pi i/3} = 2^{1/3}\left(-\dfrac{1}{2} + \dfrac{\sqrt{3}}{2}i\right)$,
    $z_3 = 2^{1/3}e^{-2\pi i/3} = 2^{1/3}\left(-\dfrac{1}{2} - \dfrac{\sqrt{3}}{2}i\right)$.

The complex numbers $z_1$, $z_2$ and $z_3$, together with $z_4 = 2i$, $z_5 = -2i$ and $z_6 = 1$ are the
solutions to the original polynomial equation.

As expected from the fundamental theorem of algebra, we find that the total number
of complex roots (six, in this case) is equal to the largest power of $z$ in the polynomial. ◄

A useful result is that the roots of a polynomial with real coefficients occur in
conjugate pairs (i.e. if $z_1$ is a root, then $z_1^*$ is a second distinct root, unless $z_1$ is
real). This may be proved as follows. Let the polynomial equation of which $z$ is
a root be

    $a_nz^n + a_{n-1}z^{n-1} + \cdots + a_1z + a_0 = 0$.

Taking the complex conjugate of this equation,

    $a_n^*(z^*)^n + a_{n-1}^*(z^*)^{n-1} + \cdots + a_1^*z^* + a_0^* = 0$.

But the $a_n$ are real, and so $z^*$ satisfies

    $a_n(z^*)^n + a_{n-1}(z^*)^{n-1} + \cdots + a_1z^* + a_0 = 0$,

and is also a root of the original equation. 
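
Both the six roots found in the worked example and the conjugate-pair property can be checked numerically. The sketch below evaluates the sextic by Horner's rule; the names `COEFFS` and `poly` are illustrative assumptions, and the tolerance allows only for floating-point rounding.

```python
import cmath

# Coefficients of z^6 - z^5 + 4z^4 - 6z^3 + 2z^2 - 8z + 8, highest power first.
COEFFS = [1, -1, 4, -6, 2, -8, 8]

def poly(z: complex, coeffs=COEFFS) -> complex:
    """Evaluate the polynomial at z using Horner's rule."""
    result = 0
    for c in coeffs:
        result = result * z + c
    return result

cube = 2 ** (1 / 3)
roots = [cube * cmath.exp(2j * cmath.pi * k / 3) for k in (0, 1, -1)] + [2j, -2j, 1]

residuals = [abs(poly(r)) for r in roots]
# Real coefficients: the conjugate of every root should appear in the list too.
conjugate_gaps = [min(abs(r.conjugate() - s) for s in roots) for r in roots]
print(residuals, conjugate_gaps)
```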

3.5 Complex logarithms and complex powers 

The concept of a complex exponential has already been introduced in section 3.3, 
where it was assumed that the definition of an exponential as a series was valid 
for complex numbers as well as for real numbers. Similarly we can define the 
logarithm of a complex number and we can use complex numbers as exponents. 

Let us denote the natural logarithm of a complex number $z$ by $w = \mathrm{Ln}\,z$, where
the notation Ln will be explained shortly. Thus, $w$ must satisfy

    $z = e^w$.

Using (3.20), we see that

    $z_1z_2 = e^{w_1}e^{w_2} = e^{w_1 + w_2}$,

and taking logarithms of both sides we find

    $\mathrm{Ln}(z_1z_2) = w_1 + w_2 = \mathrm{Ln}\,z_1 + \mathrm{Ln}\,z_2$,    (3.34)


which shows that the familiar rule for the logarithm of the product of two real 
numbers also holds for complex numbers. 

We may use (3.34) to investigate further the properties of $\mathrm{Ln}\,z$. We have already
noted that the argument of a complex number is multivalued, i.e. $\arg z = \theta + 2n\pi$,
where $n$ is any integer. Thus, in polar form, the complex number $z$ should strictly
be written as

    $z = re^{i(\theta + 2n\pi)}$.

Taking the logarithm of both sides, and using (3.34), we find

    $\mathrm{Ln}\,z = \ln r + i(\theta + 2n\pi)$,    (3.35)

where $\ln r$ is the natural logarithm of the real positive quantity $r$ and so is
written normally. Thus from (3.35) we see that $\mathrm{Ln}\,z$ is itself multivalued. To avoid
this multivalued behaviour it is conventional to define another function $\ln z$, the
principal value of $\mathrm{Ln}\,z$, which is obtained from $\mathrm{Ln}\,z$ by restricting the argument
of $z$ to lie in the range $-\pi < \theta \le \pi$.

► Evaluate $\mathrm{Ln}(-i)$.

By rewriting $-i$ as a complex exponential, we find

    $\mathrm{Ln}(-i) = \mathrm{Ln}\left[e^{i(-\pi/2 + 2n\pi)}\right] = i(-\pi/2 + 2n\pi)$,

where $n$ is any integer. Hence $\mathrm{Ln}(-i) = -i\pi/2,\ 3i\pi/2,\ \ldots$. We note that $\ln(-i)$, the
principal value of $\mathrm{Ln}(-i)$, is given by $\ln(-i) = -i\pi/2$. ◄
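
The distinction between the multivalued $\mathrm{Ln}\,z$ and its principal value can be illustrated in code. In the sketch below the function `Ln` and its branch index `n` are hypothetical names introduced here; Python's `cmath.log` returns the principal value only.

```python
import cmath
import math

def Ln(z: complex, n: int) -> complex:
    """Branch n of the logarithm: ln|z| + i(arg z + 2 n pi)."""
    r, theta = cmath.polar(z)          # theta is the principal argument
    return complex(math.log(r), theta + 2 * math.pi * n)

principal = Ln(-1j, 0)                 # expected -i pi/2
next_branch = Ln(-1j, 1)               # expected 3 i pi/2

# Every branch exponentiates back to the same number, -i.
round_trip = [cmath.exp(Ln(-1j, n)) for n in range(-2, 3)]
print(principal, next_branch, cmath.log(-1j))
```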

If $z$ and $t$ are both complex numbers then the $z$th power of $t$ is defined by

    $t^z = e^{z\,\mathrm{Ln}\,t}$.

Since Ln t is multivalued, so too is this definition. 

► Simplify the expression $z = i^{-2i}$.

Firstly we take the logarithm of both sides of the equation to give

    $\mathrm{Ln}\,z = -2i\,\mathrm{Ln}\,i$.

Now inverting the process we find

    $e^{\mathrm{Ln}\,z} = z = e^{-2i\,\mathrm{Ln}\,i}$.

We can write $i = e^{i(\pi/2 + 2n\pi)}$, where $n$ is any integer, and hence

    $\mathrm{Ln}\,i = \mathrm{Ln}\left[e^{i(\pi/2 + 2n\pi)}\right] = i(\pi/2 + 2n\pi)$.

We can now simplify $z$ to give

    $i^{-2i} = e^{-2i \times i(\pi/2 + 2n\pi)} = e^{(\pi + 4n\pi)}$,

which, perhaps surprisingly, is a real quantity rather than a complex one. ◄ 
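
The multivalued power can be evaluated branch by branch. The following sketch assumes a hypothetical helper `power_branch(t, z, n)` that uses branch $n$ of $\mathrm{Ln}\,t$; Python's own `**` operator corresponds to the principal branch, $n = 0$.

```python
import cmath
import math

def power_branch(t: complex, z: complex, n: int) -> complex:
    """t**z on branch n, i.e. exp(z * Ln t) with Ln t = ln|t| + i(arg t + 2 n pi)."""
    r, theta = cmath.polar(t)
    log_t = complex(math.log(r), theta + 2 * math.pi * n)
    return cmath.exp(z * log_t)

principal = power_branch(1j, -2j, 0)    # expected e^{pi}
branch_m1 = power_branch(1j, -2j, -1)   # expected e^{pi - 4 pi} = e^{-3 pi}
builtin = (1j) ** (-2j)                 # Python's principal-value power
print(principal, branch_m1, builtin)
```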

Complex powers and the logarithms of complex numbers are discussed further 
in chapter 20. 


3.6 Applications to differentiation and integration

We can use the exponential form of a complex number together with de Moivre’s 
theorem (see section 3.4) to simplify the differentiation of trigonometric functions. 

► Find the derivative with respect to $x$ of $e^{3x}\cos 4x$.

We could differentiate this function straightforwardly using the product rule (see
subsection 2.1.2). However, an alternative method in this case is to use a complex exponential.
Let us consider the complex number

    $z = e^{3x}(\cos 4x + i\sin 4x) = e^{3x}e^{4ix} = e^{(3+4i)x}$,

where we have used de Moivre's theorem to rewrite the trigonometric functions as a
complex exponential. This complex number has $e^{3x}\cos 4x$ as its real part. Now, differentiating
$z$ with respect to $x$ we obtain

    $\dfrac{dz}{dx} = (3 + 4i)e^{(3+4i)x} = (3 + 4i)e^{3x}(\cos 4x + i\sin 4x)$,    (3.36)

where we have again used de Moivre's theorem. Equating real parts we then find

    $\dfrac{d}{dx}\left(e^{3x}\cos 4x\right) = e^{3x}(3\cos 4x - 4\sin 4x)$.

By equating the imaginary parts of (3.36), we also obtain, as a bonus,

    $\dfrac{d}{dx}\left(e^{3x}\sin 4x\right) = e^{3x}(4\cos 4x + 3\sin 4x)$. ◄
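
The two derivatives obtained above can be compared with a finite-difference estimate. This is only a sketch: the step size `h`, the evaluation point and the function names are assumptions made for illustration, and a central difference is accurate only to a few significant figures.

```python
import cmath
import math

def z(x: float) -> complex:
    """The complex function e^{(3+4i)x} whose real part is e^{3x} cos 4x."""
    return cmath.exp((3 + 4j) * x)

def central_difference(f, x: float, h: float = 1e-6) -> complex:
    """Symmetric finite-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 0.3                                   # arbitrary evaluation point
numeric = central_difference(z, x0)

real_exact = math.exp(3 * x0) * (3 * math.cos(4 * x0) - 4 * math.sin(4 * x0))
imag_exact = math.exp(3 * x0) * (4 * math.cos(4 * x0) + 3 * math.sin(4 * x0))
print(numeric.real - real_exact, numeric.imag - imag_exact)
```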


In a similar way the complex exponential can be used to evaluate integrals 
containing trigonometric and exponential functions. 


► Evaluate the integral $I = \int e^{ax}\cos bx\,dx$.

Let us consider the integrand as the real part of the complex number

    $e^{ax}(\cos bx + i\sin bx) = e^{ax}e^{ibx} = e^{(a+ib)x}$,

where we use de Moivre's theorem to rewrite the trigonometric functions as a complex
exponential. Integrating we find

    $\int e^{(a+ib)x}\,dx = \dfrac{e^{(a+ib)x}}{a + ib} + c$
    $= \dfrac{(a - ib)e^{(a+ib)x}}{(a - ib)(a + ib)} + c$
    $= \dfrac{e^{ax}}{a^2 + b^2}\left(ae^{ibx} - ibe^{ibx}\right) + c$,    (3.37)

where the constant of integration $c$ is in general complex. Denoting this constant by
$c = c_1 + ic_2$ and equating real parts in (3.37) we obtain

    $I = \int e^{ax}\cos bx\,dx = \dfrac{e^{ax}}{a^2 + b^2}(a\cos bx + b\sin bx) + c_1$,

which agrees with result (2.37) found using integration by parts. Equating imaginary parts
in (3.37) we obtain, as a bonus,

    $J = \int e^{ax}\sin bx\,dx = \dfrac{e^{ax}}{a^2 + b^2}(a\sin bx - b\cos bx) + c_2$. ◄
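
As a numerical cross-check of the two antiderivatives, one can compare them with a simple quadrature over a finite interval. The sketch below uses a composite Simpson rule; the values $a = 0.5$, $b = 2$, the interval $[0, 1]$ and all function names are arbitrary choices made for illustration.

```python
import math

def antiderivative_cos(x: float, a: float, b: float) -> float:
    """e^{ax}(a cos bx + b sin bx)/(a^2 + b^2), an antiderivative of e^{ax} cos bx."""
    return math.exp(a * x) * (a * math.cos(b * x) + b * math.sin(b * x)) / (a * a + b * b)

def antiderivative_sin(x: float, a: float, b: float) -> float:
    """e^{ax}(a sin bx - b cos bx)/(a^2 + b^2), an antiderivative of e^{ax} sin bx."""
    return math.exp(a * x) * (a * math.sin(b * x) - b * math.cos(b * x)) / (a * a + b * b)

def simpson(f, lo: float, hi: float, n: int = 1000) -> float:
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3

a, b = 0.5, 2.0
I_numeric = simpson(lambda x: math.exp(a * x) * math.cos(b * x), 0.0, 1.0)
J_numeric = simpson(lambda x: math.exp(a * x) * math.sin(b * x), 0.0, 1.0)
I_exact = antiderivative_cos(1.0, a, b) - antiderivative_cos(0.0, a, b)
J_exact = antiderivative_sin(1.0, a, b) - antiderivative_sin(0.0, a, b)
print(I_numeric - I_exact, J_numeric - J_exact)
```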


3.7 Hyperbolic functions

The hyperbolic functions are the complex analogues of the trigonometric functions. 
The analogy may not be immediately apparent and their definitions may appear 
at first to be somewhat arbitrary. However, careful examination of their properties 
reveals the purpose of the definitions. For instance, their close relationship with 
the trigonometric functions, both in their identities and in their calculus, means 
that many of the familiar properties of trigonometric functions can also be applied 
to the hyperbolic functions. Further, hyperbolic functions occur regularly, and so 
giving them special names is a notational convenience. 


3.7.1 Definitions 

The two fundamental hyperbolic functions are cosh x and sinh x, which, as their 
names suggest, are the hyperbolic equivalents of cos x and sin x. They are defined 
by the following relations: 

    $\cosh x = \tfrac{1}{2}(e^x + e^{-x})$,    (3.38)

    $\sinh x = \tfrac{1}{2}(e^x - e^{-x})$.    (3.39)

Note that coshx is an even function and sinhx is an odd function. By analogy 
with the trigonometric functions, the remaining hyperbolic functions are 


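    $\tanh x = \dfrac{\sinh x}{\cosh x} = \dfrac{e^x - e^{-x}}{e^x + e^{-x}}$,    (3.40)

    $\operatorname{sech} x = \dfrac{1}{\cosh x} = \dfrac{2}{e^x + e^{-x}}$,    (3.41)

    $\operatorname{cosech} x = \dfrac{1}{\sinh x} = \dfrac{2}{e^x - e^{-x}}$,    (3.42)

    $\coth x = \dfrac{1}{\tanh x} = \dfrac{e^x + e^{-x}}{e^x - e^{-x}}$.    (3.43)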


All the hyperbolic functions above have been defined in terms of the real variable 
x. However, this was simply so that they may be plotted (see figures 3.11-3.13); 
the definitions are equally valid for any complex number z. 


3.7.2 Hyperbolic-trigonometric analogies 

In the previous subsections we have alluded to the analogy between trigonometric 
and hyperbolic functions. Here, we discuss the close relationship between the two 
groups of functions. 

Figure 3.11 Graphs of $\cosh x$ and $\operatorname{sech} x$.

Figure 3.12 Graphs of $\sinh x$ and $\operatorname{cosech} x$.

Figure 3.13 Graphs of $\tanh x$ and $\coth x$.


Recalling (3.32) and (3.33) we find

    $\cos ix = \tfrac{1}{2}(e^x + e^{-x})$,
    $\sin ix = \tfrac{1}{2}i(e^x - e^{-x})$.

Hence, by the definitions given in the previous subsection,

    $\cosh x = \cos ix$,    (3.44)
    $i\sinh x = \sin ix$,    (3.45)
    $\cos x = \cosh ix$,    (3.46)
    $i\sin x = \sinh ix$.    (3.47)

These useful equations make the relationship between hyperbolic and trigonometric
functions transparent. The similarity in their calculus is discussed further
in subsection 3.7.6.
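
The four relations (3.44) to (3.47), together with the exponential definitions (3.38) and (3.39), can be spot-checked with Python's `math` and `cmath` modules. The test value of `x` and the dictionary layout are arbitrary choices made for this sketch.

```python
import cmath
import math

x = 0.8   # arbitrary real test value

relations = {
    "(3.44) cosh x = cos ix": (math.cosh(x), cmath.cos(1j * x)),
    "(3.45) i sinh x = sin ix": (1j * math.sinh(x), cmath.sin(1j * x)),
    "(3.46) cos x = cosh ix": (math.cos(x), cmath.cosh(1j * x)),
    "(3.47) i sin x = sinh ix": (1j * math.sin(x), cmath.sinh(1j * x)),
}
max_error = max(abs(lhs - rhs) for lhs, rhs in relations.values())

# The exponential definitions (3.38) and (3.39) against the library functions.
cosh_gap = abs(0.5 * (math.exp(x) + math.exp(-x)) - math.cosh(x))
sinh_gap = abs(0.5 * (math.exp(x) - math.exp(-x)) - math.sinh(x))
print(max_error, cosh_gap, sinh_gap)
```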


3.7.3 Identities of hyperbolic functions 

The analogies between trigonometric functions and hyperbolic functions having 
been established, we should not be surprised that all the trigonometric identities 
also hold for hyperbolic functions, with the following modification. Wherever
$\sin^2 x$ occurs it must be replaced by $-\sinh^2 x$, and vice versa. Note that this
replacement is necessary even if the $\sin^2 x$ is hidden, e.g. $\tan^2 x = \sin^2 x/\cos^2 x$
and so must be replaced by $(-\sinh^2 x/\cosh^2 x) = -\tanh^2 x$.


► Find the hyperbolic identity analogous to $\cos^2 x + \sin^2 x = 1$.

Using the rules stated above $\cos^2 x$ must be replaced by $\cosh^2 x$, and $\sin^2 x$ must be replaced
by $-\sinh^2 x$, and so the identity becomes

    $\cosh^2 x - \sinh^2 x = 1$.

This can be verified by direct substitution, using the definitions of coshx and sinhx; see 
(3.38) and (3.39). ◄ 

Some other identities that can be proved in a similar way are 


    $\operatorname{sech}^2 x = 1 - \tanh^2 x$,    (3.48)

    $\operatorname{cosech}^2 x = \coth^2 x - 1$,    (3.49)

    $\sinh 2x = 2\sinh x\cosh x$,    (3.50)

    $\cosh 2x = \cosh^2 x + \sinh^2 x$.    (3.51)


3.7.4 Solving hyperbolic equations

When we are presented with a hyperbolic equation to solve, we may proceed 
by analogy with the solution of trigonometric equations. However, it is almost 
always easier to express the equation directly in terms of exponentials. 


► Solve the hyperbolic equation $\cosh x - 5\sinh x - 5 = 0$.

Substituting the definitions of the hyperbolic functions we obtain

    $\tfrac{1}{2}(e^x + e^{-x}) - \tfrac{5}{2}(e^x - e^{-x}) - 5 = 0$.

Rearranging, and then multiplying through by $-e^x$, gives in turn

    $-2e^x + 3e^{-x} - 5 = 0$

and

    $2e^{2x} + 5e^x - 3 = 0$.

Now we can factorise and solve:

    $(2e^x - 1)(e^x + 3) = 0$.

Thus $e^x = 1/2$ or $e^x = -3$. Hence $x = -\ln 2$ or $x = \ln(-3)$. The interpretation of the
logarithm of a negative number has been discussed in section 3.5. ◄
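
The exponential substitution used in this example can be sketched in code by treating $u = e^x$ as the unknown of the quadratic $2u^2 + 5u - 3 = 0$. The variable names are illustrative assumptions, and `cmath.log` returns only the principal value of the logarithm of the negative root.

```python
import cmath
import math

# Quadratic 2u^2 + 5u - 3 = 0 in u = e^x.
a, b, c = 2.0, 5.0, -3.0
disc = cmath.sqrt(b * b - 4 * a * c)
u_roots = [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]   # 1/2 and -3

# x = Ln u; cmath.log gives the principal value for each root.
x_solutions = [cmath.log(u) for u in u_roots]

def residual(x: complex) -> complex:
    """Left-hand side cosh x - 5 sinh x - 5 of the original equation."""
    return cmath.cosh(x) - 5 * cmath.sinh(x) - 5

print(x_solutions, [abs(residual(x)) for x in x_solutions])
```

The second solution is complex, $\ln 3 + i\pi$ on the principal branch, which reflects the remark that $\ln(-3)$ must be interpreted as a complex logarithm.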


3.7.5 Inverses of hyperbolic functions 

Just like trigonometric functions, hyperbolic functions have inverses. If $y = \cosh x$
then $x = \cosh^{-1} y$, which serves as a definition of the inverse. By using
the fundamental definitions of hyperbolic functions, we can find closed-form 
expressions for their inverses. This is best illustrated by example. 


► Find a closed-form expression for the inverse hyperbolic function $y = \sinh^{-1} x$.

First we write $x$ as a function of $y$, i.e.

    $y = \sinh^{-1} x \;\Rightarrow\; x = \sinh y$.

Now, since $\cosh y = \tfrac{1}{2}(e^y + e^{-y})$ and $\sinh y = \tfrac{1}{2}(e^y - e^{-y})$,

    $e^y = \cosh y + \sinh y = \sqrt{1 + \sinh^2 y} + \sinh y$

and hence

    $e^y = \sqrt{1 + x^2} + x$,
    $y = \ln\left(\sqrt{1 + x^2} + x\right)$. ◄

In a similar fashion it can be shown that

    $\cosh^{-1} x = \ln\left(\sqrt{x^2 - 1} + x\right)$.

Figure 3.14 Graphs of $\cosh^{-1} x$ and $\operatorname{sech}^{-1} x$.

► Find a closed-form expression for the inverse hyperbolic function $y = \tanh^{-1} x$.

First we write $x$ as a function of $y$, i.e.

    $y = \tanh^{-1} x \;\Rightarrow\; x = \tanh y$.

Now, using the definition of $\tanh y$ and rearranging, we find

    $x = \dfrac{e^y - e^{-y}}{e^y + e^{-y}} \;\Rightarrow\; (x + 1)e^{-y} = (1 - x)e^y$.

Thus, it follows that

    $e^{2y} = \dfrac{1 + x}{1 - x} \;\Rightarrow\; e^y = \sqrt{\dfrac{1 + x}{1 - x}}$,

    $y = \ln\sqrt{\dfrac{1 + x}{1 - x}}$,

    $\tanh^{-1} x = \dfrac{1}{2}\ln\left(\dfrac{1 + x}{1 - x}\right)$. ◄
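
The closed-form logarithmic expressions for the inverse functions can be compared with the library implementations. In this sketch the function names are assumptions, the formulas are used on the real line only, and the direct logarithmic form may lose accuracy for large negative arguments of $\sinh^{-1}$.

```python
import math

def asinh_closed(x: float) -> float:
    """sinh^{-1} x = ln(sqrt(1 + x^2) + x)."""
    return math.log(math.sqrt(1 + x * x) + x)

def acosh_closed(x: float) -> float:
    """cosh^{-1} x = ln(sqrt(x^2 - 1) + x), taken here for real x >= 1."""
    return math.log(math.sqrt(x * x - 1) + x)

def atanh_closed(x: float) -> float:
    """tanh^{-1} x = (1/2) ln((1 + x)/(1 - x)), taken here for real |x| < 1."""
    return 0.5 * math.log((1 + x) / (1 - x))

asinh_gaps = [abs(asinh_closed(x) - math.asinh(x)) for x in (0.3, 2.0, -1.5)]
acosh_gaps = [abs(acosh_closed(x) - math.acosh(x)) for x in (1.0, 2.0, 10.0)]
atanh_gaps = [abs(atanh_closed(x) - math.atanh(x)) for x in (-0.9, 0.0, 0.5)]
print(asinh_gaps, acosh_gaps, atanh_gaps)
```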


Graphs of the inverse hyperbolic functions are given in figures 3.14-3.16. 


3.7.6 Calculus of hyperbolic functions 

Figure 3.15 Graphs of $\sinh^{-1} x$ and $\operatorname{cosech}^{-1} x$.

Figure 3.16 Graphs of $\tanh^{-1} x$ and $\coth^{-1} x$.


Just as the identities of hyperbolic functions closely follow those of their trigonometric
counterparts, so their calculus is similar. The derivatives of the two basic
hyperbolic functions are

    $\dfrac{d}{dx}(\cosh x) = \sinh x$,    (3.52)

    $\dfrac{d}{dx}(\sinh x) = \cosh x$.    (3.53)

These may be deduced by considering the definitions. 

► Verify the relation $(d/dx)\cosh x = \sinh x$.

Using the definition of $\cosh x$,

    $\cosh x = \tfrac{1}{2}(e^x + e^{-x})$,

and differentiating directly, we find

    $\dfrac{d}{dx}(\cosh x) = \tfrac{1}{2}(e^x - e^{-x}) = \sinh x$. ◄


Clearly the integrals of the fundamental hyperbolic functions are also defined
by these relations. The derivatives of the remaining hyperbolic functions can be
derived by product differentiation and are presented below only for completeness.

    $\dfrac{d}{dx}(\tanh x) = \operatorname{sech}^2 x$,    (3.54)

    $\dfrac{d}{dx}(\operatorname{sech} x) = -\operatorname{sech} x\tanh x$,    (3.55)

    $\dfrac{d}{dx}(\operatorname{cosech} x) = -\operatorname{cosech} x\coth x$,    (3.56)

    $\dfrac{d}{dx}(\coth x) = -\operatorname{cosech}^2 x$.    (3.57)


The inverse hyperbolic functions also have derivatives, which are given by the
following:

    $\dfrac{d}{dx}\left(\cosh^{-1}\dfrac{x}{a}\right) = \dfrac{1}{\sqrt{x^2 - a^2}}$,    (3.58)

    $\dfrac{d}{dx}\left(\sinh^{-1}\dfrac{x}{a}\right) = \dfrac{1}{\sqrt{x^2 + a^2}}$,    (3.59)

    $\dfrac{d}{dx}\left(\tanh^{-1}\dfrac{x}{a}\right) = \dfrac{a}{a^2 - x^2}$, for $x^2 < a^2$,    (3.60)

    $\dfrac{d}{dx}\left(\coth^{-1}\dfrac{x}{a}\right) = \dfrac{-a}{x^2 - a^2}$, for $x^2 > a^2$.    (3.61)

These may be derived from the logarithmic form of the inverse (see subsection 3.7.5).

► Evaluate $(d/dx)\sinh^{-1} x$ using the logarithmic form of the inverse.

From the results of section 3.7.5,

    $\dfrac{d}{dx}\left(\sinh^{-1} x\right) = \dfrac{d}{dx}\left[\ln\left(x + \sqrt{x^2 + 1}\right)\right]$
    $= \dfrac{1}{x + \sqrt{x^2 + 1}}\left(1 + \dfrac{x}{\sqrt{x^2 + 1}}\right)$
    $= \dfrac{1}{x + \sqrt{x^2 + 1}}\left(\dfrac{\sqrt{x^2 + 1} + x}{\sqrt{x^2 + 1}}\right)$
    $= \dfrac{1}{\sqrt{x^2 + 1}}$. ◄


3.8 Exercises

3.1 Two complex numbers $z$ and $w$ are given by $z = 3 + 4i$ and $w = 2 - i$. On an
Argand diagram plot

(a) $z + w$, (b) $w - z$, (c) $wz$, (d) $z/w$,
(e) $z^*w + w^*z$, (f) $w^2$, (g) $\ln z$, (h) $(1 + z + w)^{1/2}$.

3.2 By considering the real and imaginary parts of the product $e^{i\theta}e^{i\phi}$ prove the
standard formulae for $\cos(\theta + \phi)$ and $\sin(\theta + \phi)$.

3.3 By writing $\pi/12 = (\pi/3) - (\pi/4)$ and considering $e^{i\pi/12}$, evaluate $\cot(\pi/12)$.

3.4 Find the locus in the complex $z$-plane of points that satisfy the following equations.

(a) $z - c = p\left(\dfrac{1 + it}{1 - it}\right)$, where $c$ is complex, $p$ is real and $t$ is a real parameter
that varies in the range $-\infty < t < \infty$.

(b) $z = a + bt + ct^2$, in which $t$ is a real parameter and $a$, $b$, and $c$ are complex
numbers with $b/c$ real.

3.5 Evaluate

(a) $\mathrm{Re}(\exp 2iz)$, (b) $\mathrm{Im}(\cosh^2 z)$, (c) $(-1 + \sqrt{3}\,i)^{1/2}$,
(d) $|\exp(i^{1/2})|$, (e) $\exp(i^3)$, (f) $\mathrm{Im}(2^{i+3})$, (g) $i^i$, (h) $\ln[(\sqrt{3} + i)^3]$.

3.6 Find the equations in terms of $x$ and $y$ of the sets of points in the Argand
diagram that satisfy the following:

(a) $\mathrm{Re}\,z^2 = \mathrm{Im}\,z^2$;
(b) $(\mathrm{Im}\,z^2)/z^2 = -i$;
(c) $\arg[z/(z - 1)] = \pi/2$.

3.7 Show that the locus of all points $z = x + iy$ in the complex plane that satisfy

    $|z - ia| = \lambda|z + ia|$, $\lambda > 0$,

is a circle of radius $|2\lambda/(1 - \lambda^2)|a$ centred on the point $z = ia[(1 + \lambda^2)/(1 - \lambda^2)]$.
Sketch the circles for a few typical values of $\lambda$, including $\lambda < 1$, $\lambda > 1$ and $\lambda = 1$.

3.8 The two sets of points $z = a$, $z = b$, $z = c$, and $z = A$, $z = B$, $z = C$ are
the corners of two similar triangles in the Argand diagram. Express in terms of
$a, b, \ldots, C$

(a) the equalities of corresponding angles, and
(b) the constant ratio of corresponding sides,

in the two triangles.

By noting that any complex quantity can be expressed as

    $z = |z|\exp(i\arg z)$,

deduce that

    $a(B - C) + b(C - A) + c(A - B) = 0$.


3.9 For the real constant $a$ find the loci of all points $z = x + iy$ in the complex plane
that satisfy

(a) $\mathrm{Re}\left\{\ln\left(\dfrac{z - ia}{z + ia}\right)\right\} = c$, $c > 0$,

(b) $\mathrm{Im}\left\{\ln\left(\dfrac{z - ia}{z + ia}\right)\right\} = k$, $0 \le k \le \pi/2$.

Identify the two families of curves and verify that in case (b) all curves pass
through the two points $\pm ia$.

3.10 The most general type of transformation between one Argand diagram, in the
$z$-plane, and another, in the $Z$-plane, that gives one and only one value of $Z$ for
each value of $z$ (and conversely) is known as the general bilinear transformation
and takes the form

    $z = \dfrac{aZ + b}{cZ + d}$.

(a) Confirm that the transformation from the $Z$-plane to the $z$-plane is also a
general bilinear transformation.

(b) Recalling that the equation of a circle can be written in the form

    $\left|\dfrac{z - z_1}{z - z_2}\right| = \lambda$, $\lambda \ne 1$,

show that the general bilinear transformation transforms circles into circles
(or straight lines). What is the condition that $z_1$, $z_2$ and $\lambda$ must satisfy if the
transformed circle is to be a straight line?

3.11 Sketch the parts of the Argand diagram in which

(a) $\mathrm{Re}\,z^2 < 0$, $|z^{1/2}| \le 2$,
(b) $0 \le \arg z^* \le \pi/2$,
(c) $|\exp z^3| \to 0$ as $|z| \to \infty$.

What is the area of the region in which all three conditions are satisfied?

3.12 Denote the $n$th roots of unity by $1, \omega_n, \omega_n^2, \ldots, \omega_n^{n-1}$.

(a) Prove that

    (i) $\displaystyle\sum_{r=0}^{n-1}\omega_n^r = 0$,    (ii) $\displaystyle\prod_{r=0}^{n-1}\omega_n^r = (-1)^{n+1}$.

(b) Express $x^2 + y^2 + z^2 - yz - zx - xy$ as the product of two factors, each linear
in $x$, $y$ and $z$, with coefficients dependent on the third roots of unity (and
those of the $x$ terms arbitrarily taken as real).

3.13 Prove that $x^{2m+1} - a^{2m+1}$, where $m$ is an integer $\ge 1$, can be written as

    $x^{2m+1} - a^{2m+1} = (x - a)\displaystyle\prod_{r=1}^{m}\left[x^2 - 2ax\cos\left(\dfrac{2\pi r}{2m + 1}\right) + a^2\right]$.

3.14 The complex position vectors of two parallel interacting equal fluid vortices
moving with their axes of rotation always perpendicular to the $z$-plane are $z_1$
and $z_2$. The equations governing their motions are

    $\dfrac{dz_1^*}{dt} = -\dfrac{i}{z_1 - z_2}$,    $\dfrac{dz_2^*}{dt} = -\dfrac{i}{z_2 - z_1}$.

Deduce that (a) $z_1 + z_2$, (b) $|z_1 - z_2|$ and (c) $|z_1|^2 + |z_2|^2$ are all constant in time,
and hence describe the motion geometrically.

3.15 Solve the equation

    $z^7 - 4z^6 + 6z^5 - 6z^4 + 6z^3 - 12z^2 + 8z + 4 = 0$,

(a) by examining the effect of setting $z^3$ equal to 2, and
(b) by factorising and using the binomial expansion of $(z + a)^4$.

Plot the seven roots of the equation on an Argand plot, exemplifying that complex
roots of a polynomial equation always occur in conjugate pairs if the polynomial
has real coefficients.

3.16 The polynomial $f(z)$ is defined by

    $f(z) = z^5 - 6z^4 + 15z^3 - 34z^2 + 36z - 48$.

(a) Show that the equation $f(z) = 0$ has roots of the form $z = \lambda i$ where $\lambda$ is real,
and hence factorize $f(z)$.

(b) Show further that the cubic factor of $f(z)$ can be written in the form
$(z + a)^3 + b$, where $a$ and $b$ are real, and hence solve the equation $f(z) = 0$
completely.

3.17 The binomial expansion of $(1 + x)^n$, discussed in chapter 1, can be written for a
positive integer $n$ as

    $(1 + x)^n = \displaystyle\sum_{r=0}^{n}{}^nC_r\,x^r$,

where ${}^nC_r = n!/[r!(n - r)!]$.

(a) Use de Moivre's theorem to show that the sum

    $S_1(n) = {}^nC_0 - {}^nC_2 + {}^nC_4 - \cdots + (-1)^m\,{}^nC_{2m}$,  $n - 1 \le 2m \le n$,

has the value $2^{n/2}\cos(n\pi/4)$.

(b) Derive a similar result for the sum

    $S_2(n) = {}^nC_1 - {}^nC_3 + {}^nC_5 - \cdots + (-1)^m\,{}^nC_{2m+1}$,  $n - 1 \le 2m + 1 \le n$,

and verify it for the cases $n = 6$, 7 and 8.

3.18 By considering $(1 + \exp i\theta)^n$, prove that

    $\displaystyle\sum_{r=0}^{n}{}^nC_r\cos r\theta = 2^n\cos^n(\theta/2)\cos(n\theta/2)$,

    $\displaystyle\sum_{r=0}^{n}{}^nC_r\sin r\theta = 2^n\cos^n(\theta/2)\sin(n\theta/2)$,

where ${}^nC_r = n!/[r!(n - r)!]$.

3.19 Use de Moivre's theorem with $n = 4$ to prove that

    $\cos 4\theta = 8\cos^4\theta - 8\cos^2\theta + 1$,

and deduce that

    $\cos\dfrac{\pi}{8} = \left(\dfrac{2 + \sqrt{2}}{4}\right)^{1/2}$.

3.20 Express $\sin^4\theta$ entirely in terms of the trigonometric functions of multiple angles
and deduce that its average value over a complete cycle is $\tfrac{3}{8}$.

3.21 Use de Moivre's theorem to prove that

    $\tan 5\theta = \dfrac{t^5 - 10t^3 + 5t}{5t^4 - 10t^2 + 1}$,

where $t = \tan\theta$. Deduce the values of $\tan(n\pi/10)$ for $n = 1, 2, 3, 4$.

3.22 (a) Prove that

    $\cosh x - \cosh y = 2\sinh\left(\dfrac{x + y}{2}\right)\sinh\left(\dfrac{x - y}{2}\right)$.

(b) Prove that, if $y = \sinh^{-1} x$,

    $(x^2 + 1)\dfrac{d^2y}{dx^2} + x\dfrac{dy}{dx} = 0$.

3.23 Determine the conditions under which the equation

    $a\cosh x + b\sinh x = c$, $c > 0$,

has zero, one, or two real solutions for $x$. What is the solution if $a^2 = c^2 + b^2$?

3.24 (a) Solve $\cosh x = \sinh x + 2\operatorname{sech} x$.

(b) Show that the real solution $x$ of $\tanh x = \operatorname{cosech} x$ can be written in the
form $x = \ln(u + \sqrt{u})$. Find an explicit value for $u$.

(c) Evaluate $\tanh x$ when $x$ is the real solution of $\cosh 2x = 2\cosh x$.

3.25 Express $\sinh^4 x$ in terms of hyperbolic cosines of multiples of $x$, and hence solve

    $2\cosh 4x - 8\cosh 2x + 5 = 0$.

3.26 In the theory of special relativity, the relationship between the position and time
coordinates of an event as measured in two frames of reference that have parallel
x-axes can be expressed in terms of hyperbolic functions. If the coordinates are $x$
and $t$ in one frame and $x'$ and $t'$ in the other then the relationship takes the form

\[ x' = x\cosh\phi - ct\sinh\phi, \]
\[ ct' = -x\sinh\phi + ct\cosh\phi. \]

Express $x$ and $ct$ in terms of $x'$, $ct'$ and $\phi$ and show that

\[ x^2 - (ct)^2 = (x')^2 - (ct')^2. \]

3.27 A closed barrel has as its curved surface that obtained by rotating about the
x-axis the part of the curve

\[ y = a[2 - \cosh(x/a)] \]

lying in the range $-b \le x \le b$. Show that the total surface area, $A$, of the barrel
is given by

\[ A = \pi a[9a - 8a\exp(-b/a) + a\exp(-2b/a) - 2b]. \]

3.28 The principal value of the logarithmic function of a complex variable is defined
to have its argument in the range $-\pi < \arg z \le \pi$. By writing $z = \tan w$ in terms
of exponentials show that

\[ \tan^{-1} z = \frac{1}{2i}\ln\left(\frac{1 + iz}{1 - iz}\right). \]

Use this result to evaluate

\[ \tan^{-1}\left(\frac{2\sqrt{3} - 3i}{7}\right). \]


3.9 Hints and answers 

3.1 (a) $5 + 3i$; (b) $-1 - 5i$; (c) $10 + 5i$; (d) $2/5 + 11i/5$; (e) 4; (f) $3 - 4i$;
(g) $\ln 5 + i[\tan^{-1}(4/3) + 2n\pi]$; (h) $\pm(2.521 + 0.595i)$.

3.3 $2 + \sqrt{3}$.

3.4 (a) Set $t = \tan\theta$ with $-\pi/2 < \theta < \pi/2$. The equation becomes $z - c = \rho e^{2i\theta}$.
The locus is a circle, centre $c$, radius $\rho$.

(b) Eliminate the $t^2$ term between $x$ and $y$. Note that the coefficient of $t$ is
proportional to $\mathrm{Im}(b/c)$. The locus is a straight line $(\mathrm{Im}\,k)[x - \mathrm{Re}(a)] =
(\mathrm{Re}\,k)[y - \mathrm{Im}(a)]$, where $k = b$ or $c$.

3.5 (a) $\exp(-2y)\cos 2x$; (b) $(\sin 2y \sinh 2x)/2$; (c) $\sqrt{2}\exp(\pi i/3)$ or $\sqrt{2}\exp(4\pi i/3)$;
(d) $\exp(1/\sqrt{2})$ or $\exp(-1/\sqrt{2})$; (e) $0.540 - 0.841i$; (f) $8\sin(\ln 2) = 5.11$;
(g) $\exp(-\pi/2 - 2\pi n)$; (h) $\ln 8 + i(2n + 1/2)\pi$.

3.6 (a) $y = (\pm\sqrt{2} - 1)x$; (b) $x = \pm y$; (c) the half of the circle $(x - \frac{1}{2})^2 + y^2 = \frac{1}{4}$ that
lies in $y < 0$.

3.7 Starting from $|x + iy - ia| = \lambda|x + iy + ia|$, show that the coefficients of $x^2$ and $y^2$
are equal, and write the equation in the form $x^2 + (y - \alpha)^2 = r^2$.

3.8 (a) $\arg[(b - a)/(c - a)] = \arg[(B - A)/(C - A)]$.

(b) $|b - a|/|c - a| = |B - A|/|C - A|$.

3.9 (a) Circles enclosing $z = -ia$, with $\lambda = \exp c > 1$.

(b) The condition is that $\arg[(z - ia)/(z + ia)] = k$. This can be rearranged to
give $a(z + z^*) = k(a^2 - |z|^2)$, which becomes in x, y coordinates the equation
of a circle with centre $(-a/k, 0)$ and radius $a(1 + k^{-2})^{1/2}$.

3.10 (a) $Z = (-dz + b)/(cz - a)$.

(b) $|(Z - Z_1)/(Z - Z_2)| = \Lambda$, with $Z_{1,2}$ given by setting $z = z_{1,2}$ in the result in
(a); $|a - cz_1| = \lambda|a - cz_2|$.

3.11 All three conditions are satisfied in $3\pi/2 \le \theta \le 7\pi/4$, $|z| \le 4$; area $= 2\pi$.

3.12 (a) Express $\omega^n - 1$ as a product of factors like $(\omega - \omega_n^r)$ and examine the
coefficients of (i) $\omega^{n-1}$ and (ii) $\omega^0$.

(b) $(x + \omega_3 y + \omega_3^2 z)(x + \omega_3^2 y + \omega_3 z)$.

3.13 Denoting $\exp[2\pi i/(2m + 1)]$ by $\Omega$, express $x^{2m+1} - a^{2m+1}$ as a product of factors
like $(x - a\Omega^r)$ and then combine those containing $\Omega^r$ and $\Omega^{2m+1-r}$. Use the fact
that $\Omega^{2m+1} = 1$.

3.14 (b) Differentiate $(z_1 - z_2)(z_1^* - z_2^*)$. (c) Write $2|z_1|^2 + 2|z_2|^2$ as $|z_1 + z_2|^2 + |z_1 - z_2|^2$.
Circular motion about a fixed point with the vortices at the opposite ends of a
diameter.

3.15 The roots are $2^{1/3}\exp(2\pi n i/3)$ for n = 0, 1, 2; $1 \pm 3^{1/4}$; $1 \pm 3^{1/4} i$.

3.16 (a) The vanishing of the real and imaginary parts of $f(\lambda i)$ requires ($\lambda^2 = 3$ or $\frac{8}{3}$)
and ($\lambda^2 = 0$ or 3 or 12); hence $\lambda^2 = 3$ and $f(z) = (z^2 + 3)(z^3 - 6z^2 + 12z - 16)$.
(b) $a = -2$, $b = -8$. The roots are $\pm i\sqrt{3}$, 4, $1 \pm i\sqrt{3}$.

3.17 (b) $S_2(n) = 2^{n/2}\sin(n\pi/4)$. $S_2(6) = -8$, $S_2(7) = -8$, $S_2(8) = 0$.

3.18 Write $1 + \cos\theta$ and $\sin\theta$ in terms of $\theta/2$.

3.20 $(\cos 4\theta)/8 - (\cos 2\theta)/2 + 3/8$.

3.21 Show that $\cos 5\theta = 16c^5 - 20c^3 + 5c$, where $c = \cos\theta$, and correspondingly for
$\sin 5\theta$. Use $\cos^{-2}\theta = 1 + \tan^2\theta$. The four required values are
$[(5 - \sqrt{20})/5]^{1/2}$, $(5 - \sqrt{20})^{1/2}$, $[(5 + \sqrt{20})/5]^{1/2}$, $(5 + \sqrt{20})^{1/2}$.

3.23 Reality of the root(s) requires $c^2 + b^2 \ge a^2$ and $a + b > 0$. With these conditions,
there are two roots if $a^2 > b^2$, but only one if $b^2 > a^2$.
For $a^2 = c^2 + b^2$, $x = \frac{1}{2}\ln[(a - b)/(a + b)]$.

3.24 (a) $\ln(1/\sqrt{3})$; (b) $(1 + \sqrt{5})/2$; (c) $\pm(12)^{1/4}/(\sqrt{3} + 1)$.

3.25 Reduce the equation to $16\sinh^4 x = 1$, yielding $x = \pm 0.481$.

3.26 The same expressions but with $\phi$ replaced by $-\phi$ are obtained.

3.27 Show that $ds = \cosh(x/a)\,dx$;
curved surface area $= \pi a^2[8\sinh(b/a) - \sinh(2b/a)] - 2\pi ab$.

3.28 $\pi/6 - i\ln\sqrt{2}$.


Series and limits


4.1 Series 

Many examples exist in the physical sciences of situations where we are presented 
with a sum of terms to evaluate. For example, we may wish to add the contributions 
from successive slits in a diffraction grating to find the total light intensity at a 
particular point behind the grating. 

A series may have either a finite or infinite number of terms. In either case, the 
sum of the first N terms of a series (often called a partial sum) is written 


\[ S_N = u_1 + u_2 + u_3 + \cdots + u_N, \]

where the terms of the series $u_n$, $n = 1, 2, 3, \ldots, N$, are numbers that may in
general be complex. If the terms are complex then $S_N$ will in general be complex
also, and we can write $S_N = X_N + iY_N$, where $X_N$ and $Y_N$ are the partial sums of
the real and imaginary parts of each term separately and are therefore real. If a
series has only N terms then the partial sum $S_N$ is of course the sum of the series.
Sometimes we may encounter series where each term depends on some variable,
x, say. In this case the partial sum of the series will depend on the value assumed
by x. For example, consider the infinite series

\[ S(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots. \]

This is an example of a power series; these are discussed in more detail in 
section 4.5. It is in fact the Maclaurin expansion of expx (see subsection 4.6.3). 
Therefore S(x) = expx and, of course, varies according to the value of the 
variable x. A series might just as easily depend on a complex variable z. 

A general, random sequence of numbers can be described as a series and a sum
of the terms found. However, for cases of practical interest, there will usually be
some sort of relationship between successive terms. For example, if the nth term
of a series is given by

\[ u_n = \frac{1}{2^n}, \]

for n = 1, 2, 3, . . . , N then the sum of the first N terms will be

\[ S_N = \sum_{n=1}^{N} u_n = \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots + \frac{1}{2^N}. \tag{4.1} \]


It is clear that the sum of a finite number of terms is always finite, provided 
that each term is itself finite. It is often of practical interest, however, to consider 
the sum of a series with an infinite number of finite terms. The sum of an 
infinite number of terms is best defined by first considering the partial sum 
of the first N terms, $S_N$. If the value of the partial sum $S_N$ tends to a finite
limit, S, as N tends to infinity, then the series is said to converge and its sum
is given by the limit S. In other words, the sum of an infinite series is given
by

\[ S = \lim_{N\to\infty} S_N, \]

provided the limit exists. For complex infinite series, if $S_N$ approaches a limit
$S = X + iY$ as $N \to \infty$, this means that $X_N \to X$ and $Y_N \to Y$ separately, i.e.
the real and imaginary parts of the series are each convergent series with sums
X and Y respectively.

However, not all infinite series have finite sums. As $N \to \infty$, the value of the
partial sum $S_N$ may diverge: it may approach $+\infty$ or $-\infty$, or oscillate finitely
or infinitely. Moreover, for a series where each term depends on some variable,
its convergence can depend on the value assumed by the variable. Whether an 
infinite series converges, diverges or oscillates has important implications when 
describing physical systems. Methods for determining whether a series converges 
are discussed in section 4.3. 
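
The idea of a partial sum tending to a limit can be checked numerically. The following is a minimal sketch, assuming the series (4.1) with terms 1/2^n; the function name `partial_sum` is illustrative only and exact fractions are used so that no rounding obscures the result.

```python
from fractions import Fraction

def partial_sum(N):
    """Return S_N = sum of 1/2**n for n = 1, ..., N as an exact fraction."""
    return sum(Fraction(1, 2**n) for n in range(1, N + 1))

for N in (1, 2, 5, 10, 20):
    S_N = partial_sum(N)
    # The gap between S_N and 1 is exactly 1/2**N, so it shrinks as N grows.
    print(N, float(S_N), float(1 - S_N))
```

The printed gap halves with each extra term, which is what is meant by saying that the partial sums tend to a finite limit (here S = 1).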


4.2 Summation of series 

It is often necessary to find the sum of a finite series or a convergent infinite 
series. We now describe arithmetic, geometric and arithmetico-geometric series, 
which are particularly common and for which the sums are easily found. Other 
methods that can sometimes be used to sum more complicated series are discussed 
below.


4.2.1 Arithmetic series

An arithmetic series has the characteristic that the difference between successive 
terms is constant. The sum of a general arithmetic series is written 

\[ S_N = a + (a + d) + (a + 2d) + \cdots + [a + (N - 1)d] = \sum_{n=0}^{N-1} (a + nd). \]

Rewriting the series in the opposite order and adding this term by term to the
original expression for $S_N$, we find

\[ S_N = \frac{N}{2}[a + a + (N - 1)d] = \frac{N}{2}(\text{first term} + \text{last term}). \tag{4.2} \]

If an infinite number of such terms are added the series will increase (or decrease) 
indefinitely; that is to say, it diverges. 


►Sum the integers between 1 and 1000 inclusive.

This is an arithmetic series with a = 1, d = 1 and N = 1000. Therefore, using (4.2) we find

\[ S_N = \frac{1000}{2}(1 + 1000) = 500500, \]

which can be checked directly only with considerable effort. ◄
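
With a computer the direct check is no longer laborious. A short sketch comparing (4.2) with a term-by-term sum is given here; the helper name `arithmetic_sum` is an assumption made for illustration.

```python
def arithmetic_sum(a, d, N):
    """Sum of the N terms a, a + d, ..., a + (N - 1)d using (N/2)(first + last)."""
    last = a + (N - 1) * d
    return N * (a + last) / 2

# Compare the closed form with a direct summation of the integers 1..1000.
closed = arithmetic_sum(1, 1, 1000)
direct = sum(range(1, 1001))
print(closed, direct)
```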


4.2.2 Geometric series 

Equation (4.1) is a particular example of a geometric series, which has the 
characteristic that the ratio of successive terms is a constant (one-half in this 
case). The sum of a geometric series is in general written 

\[ S_N = a + ar + ar^2 + \cdots + ar^{N-1} = \sum_{n=0}^{N-1} ar^n, \]

where a is a constant and r is the ratio of successive terms, the common ratio. The
sum may be evaluated by considering $S_N$ and $rS_N$:

\[ S_N = a + ar + ar^2 + ar^3 + \cdots + ar^{N-1}, \]
\[ rS_N = ar + ar^2 + ar^3 + ar^4 + \cdots + ar^N. \]

If we now subtract the second equation from the first we obtain

\[ (1 - r)S_N = a - ar^N, \]

and hence

\[ S_N = \frac{a(1 - r^N)}{1 - r}. \tag{4.3} \]

For a series with an infinite number of terms and $|r| < 1$, we have $\lim_{N\to\infty} r^N = 0$,
and the sum tends to the limit

\[ S = \frac{a}{1 - r}. \tag{4.4} \]

In (4.1), $r = \frac{1}{2}$, $a = \frac{1}{2}$, and so S = 1. For $|r| \ge 1$, however, the series either diverges
or oscillates.


►Consider a ball that drops from a height of 27 m and on each bounce retains only a third
of its kinetic energy; thus after one bounce it will return to a height of 9 m, after two
bounces to 3 m, and so on. Find the total distance travelled between the first bounce and
the Mth bounce.

The total distance travelled between the first bounce and the Mth bounce is given by the
sum of M − 1 terms:

\[ S_{M-1} = 2(9 + 3 + 1 + \cdots) = 2\sum_{m=0}^{M-2} \frac{9}{3^m} \]

for M > 1, where the factor of 2 is included to allow for both the upward and the
downward journey. Inside the parentheses we clearly have a geometric series with first
term 9 and common ratio 1/3 and hence the distance is given by (4.3), i.e.

\[ S_{M-1} = 2 \times \frac{9\left[1 - \left(\frac{1}{3}\right)^{M-1}\right]}{1 - \frac{1}{3}} = 27\left[1 - \left(\tfrac{1}{3}\right)^{M-1}\right], \]

where the number of terms N in (4.3) has been replaced by M − 1. ◄
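
As a numerical illustration of (4.3) and (4.4) applied to the bouncing ball, the following sketch may help. It assumes the first rebound height of 9 m and the ratio 1/3 stated in the example; the function names `geometric_sum` and `bounce_distance` are hypothetical.

```python
def geometric_sum(a, r, N):
    """Closed form (4.3) for a + a*r + ... + a*r**(N - 1), valid for r != 1."""
    return a * (1 - r**N) / (1 - r)

def bounce_distance(M):
    """Distance travelled between the first and the M-th bounce, for M > 1."""
    return 2 * geometric_sum(9, 1 / 3, M - 1)

# Term-by-term check for M = 4: twice (9 + 3 + 1).
direct = 2 * sum(9 / 3**m for m in range(3))
print(bounce_distance(4), direct)

# For large M the distance approaches 2 * 9 / (1 - 1/3) = 27 m, as (4.4) predicts.
print(bounce_distance(60))
```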


4.2.3 Arithmetico-geometric series 


An arithmetico-geometric series, as its name suggests, is a combined arithmetic 
and geometric series. It has the general form 

\[ S_N = a + (a + d)r + (a + 2d)r^2 + \cdots + [a + (N - 1)d]\,r^{N-1} = \sum_{n=0}^{N-1} (a + nd)r^n, \]

and can be summed, in a similar way to a pure geometric series, by multiplying
by r and subtracting the result from the original series to obtain

\[ (1 - r)S_N = a + rd + r^2 d + \cdots + r^{N-1} d - [a + (N - 1)d]\,r^N. \]

Using the expression for the sum of a geometric series (4.3) and rearranging, we
find

\[ S_N = \frac{a - [a + (N - 1)d]\,r^N}{1 - r} + \frac{rd(1 - r^{N-1})}{(1 - r)^2}. \]

For an infinite series with $|r| < 1$, $\lim_{N\to\infty} r^N = 0$ as in the previous subsection,
and the sum tends to the limit

\[ S = \frac{a}{1 - r} + \frac{rd}{(1 - r)^2}. \tag{4.5} \]

As for a geometric series, if $|r| \ge 1$ then the series either diverges or oscillates.


►Sum the series

\[ S = 2 + \frac{5}{2} + \frac{8}{2^2} + \frac{11}{2^3} + \cdots. \]

This is an infinite arithmetico-geometric series with a = 2, d = 3 and r = 1/2. Therefore,
from (4.5), we obtain S = 10. ◄
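
The finite and infinite sums of an arithmetico-geometric series can be compared with a direct summation. The sketch below uses a = 2, d = 3 and r = 1/2 from the example; the three function names are assumptions introduced only for this check.

```python
def ag_partial_sum(a, d, r, N):
    """Direct sum of (a + n*d) * r**n for n = 0, ..., N - 1."""
    return sum((a + n * d) * r**n for n in range(N))

def ag_closed_form(a, d, r, N):
    """Finite-N closed form obtained from (1 - r) S_N, valid for r != 1."""
    return ((a - (a + (N - 1) * d) * r**N) / (1 - r)
            + r * d * (1 - r**(N - 1)) / (1 - r)**2)

def ag_infinite_sum(a, d, r):
    """Limit (4.5), valid only for |r| < 1."""
    return a / (1 - r) + r * d / (1 - r)**2

print(ag_partial_sum(2, 3, 0.5, 12), ag_closed_form(2, 3, 0.5, 12))
print(ag_partial_sum(2, 3, 0.5, 200), ag_infinite_sum(2, 3, 0.5))
```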


4.2.4 The difference method 

The difference method is sometimes useful in summing series that are more 
complicated than the examples discussed above. Let us consider the general series 

\[ \sum_{n=1}^{N} u_n = u_1 + u_2 + \cdots + u_N. \]

If the terms of the series, $u_n$, can be expressed in the form

\[ u_n = f(n) - f(n - 1) \]

for some function f(n) then its (partial) sum is given by

\[ S_N = \sum_{n=1}^{N} u_n = f(N) - f(0). \]

This can be shown as follows. The sum is given by

\[ S_N = u_1 + u_2 + \cdots + u_N \]

and since $u_n = f(n) - f(n - 1)$, it may be rewritten

\[ S_N = [f(1) - f(0)] + [f(2) - f(1)] + \cdots + [f(N) - f(N - 1)]. \]

By cancelling terms we see that

\[ S_N = f(N) - f(0). \]


►Evaluate the sum

\[ \sum_{n=1}^{N} \frac{1}{n(n + 1)}. \]

Using partial fractions we find

\[ u_n = -\left(\frac{1}{n + 1} - \frac{1}{n}\right). \]

Hence $u_n = f(n) - f(n - 1)$ with $f(n) = -1/(n + 1)$, and so the sum is given by

\[ S_N = f(N) - f(0) = -\frac{1}{N + 1} + 1 = \frac{N}{N + 1}. \]
◄

The difference method may be easily extended to evaluate sums in which each
term can be expressed in the form

\[ u_n = f(n) - f(n - m), \tag{4.6} \]

where m is an integer. By writing out the sum to N terms with each term expressed
in this form, and cancelling terms in pairs as before, we find

\[ S_N = \sum_{k=1}^{m} f(N - k + 1) - \sum_{k=1}^{m} f(1 - k). \]


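►Evaluate the sum

\[ \sum_{n=1}^{N} \frac{1}{n(n + 2)}. \]

Using partial fractions we find

\[ u_n = -\left[\frac{1}{2(n + 2)} - \frac{1}{2n}\right]. \]

Hence $u_n = f(n) - f(n - 2)$ with $f(n) = -1/[2(n + 2)]$, and so the sum is given by

\[ S_N = f(N) + f(N - 1) - f(0) - f(-1) = \frac{3}{4} - \frac{1}{2}\left(\frac{1}{N + 2} + \frac{1}{N + 1}\right). \]
◄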
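
The cancellation in the difference method is easy to confirm with exact arithmetic. This sketch treats the sum of 1/[n(n + 2)] with f(n) = −1/[2(n + 2)], as in the example; `direct_sum` and `telescoped_sum` are illustrative names, not notation from the text.

```python
from fractions import Fraction

def f(n):
    """f(n) = -1/[2(n + 2)], chosen so that u_n = f(n) - f(n - 2)."""
    return Fraction(-1, 2 * (n + 2))

def direct_sum(N):
    """Add the terms 1/[n(n + 2)] one by one for n = 1, ..., N."""
    return sum(Fraction(1, n * (n + 2)) for n in range(1, N + 1))

def telescoped_sum(N):
    """Only two terms at each end of the sum survive the cancellation."""
    return f(N) + f(N - 1) - f(0) - f(-1)

for N in (1, 2, 5, 50):
    print(N, direct_sum(N), telescoped_sum(N))
```

As N grows both expressions approach 3/4, the value left when the two N-dependent terms die away.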


In fact the difference method is quite flexible and may be used to evaluate 
sums even when each term cannot be expressed as in (4.6). The method still relies, 
however, on being able to write $u_n$ in terms of a single function such that most
terms in the sum cancel, leaving only a few terms at the beginning and the end. 
This is best illustrated by an example. 


►Evaluate the sum

\[ \sum_{n=1}^{N} \frac{1}{n(n + 1)(n + 2)}. \]

Using partial fractions we find

\[ u_n = \frac{1}{2(n + 2)} - \frac{1}{n + 1} + \frac{1}{2n}. \]

Hence $u_n = f(n) - 2f(n - 1) + f(n - 2)$ with $f(n) = 1/[2(n + 2)]$. If we write out the sum,
expressing each term $u_n$ in this form, we find that most terms cancel and the sum is given
by

\[ S_N = f(N) - f(N - 1) - f(0) + f(-1) = \frac{1}{4} + \frac{1}{2}\left(\frac{1}{N + 2} - \frac{1}{N + 1}\right). \]
◄


4.2.5 Series involving natural numbers

Series consisting of the natural numbers 1, 2, 3, . . . , or the square or cube of these 
numbers, occur frequently and deserve a special mention. Let us first consider 
the sum of the first N natural numbers, 

\[ S_N = 1 + 2 + 3 + \cdots + N = \sum_{n=1}^{N} n. \]

This is clearly an arithmetic series with first term a = 1 and common difference
d = 1. Therefore, from (4.2), $S_N = \frac{1}{2}N(N + 1)$.

Next, we consider the sum of the squares of the first N natural numbers: 

\[ S_N = 1^2 + 2^2 + 3^2 + \cdots + N^2 = \sum_{n=1}^{N} n^2, \]

which may be evaluated using the difference method. The nth term in the series
is $u_n = n^2$, which we need to express in the form $f(n) - f(n - 1)$ for some function
f(n). Consider the function

\[ f(n) = n(n + 1)(2n + 1) \quad\Rightarrow\quad f(n - 1) = (n - 1)n(2n - 1). \]

For this function $f(n) - f(n - 1) = 6n^2$, and so we can write

\[ u_n = \tfrac{1}{6}[f(n) - f(n - 1)]. \]

Therefore, by the difference method,

\[ S_N = \tfrac{1}{6}[f(N) - f(0)] = \tfrac{1}{6}N(N + 1)(2N + 1). \]

Finally, we calculate the sum of the cubes of the first N natural numbers, 

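\[ S_N = 1^3 + 2^3 + 3^3 + \cdots + N^3 = \sum_{n=1}^{N} n^3, \]

again using the difference method. Consider the function

\[ f(n) = [n(n + 1)]^2 \quad\Rightarrow\quad f(n - 1) = [(n - 1)n]^2, \]

for which $f(n) - f(n - 1) = 4n^3$. Therefore we can write the general nth term of
the series as

\[ u_n = \tfrac{1}{4}[f(n) - f(n - 1)], \]

and using the difference method we find

\[ S_N = \tfrac{1}{4}[f(N) - f(0)] = \tfrac{1}{4}N^2(N + 1)^2. \]

Note that this is the square of the sum of the natural numbers, i.e.

\[ \sum_{n=1}^{N} n^3 = \left(\sum_{n=1}^{N} n\right)^2. \]


►Sum the series

\[ \sum_{n=1}^{N} (n + 1)(n + 3). \]

The nth term in this series is

\[ u_n = (n + 1)(n + 3) = n^2 + 4n + 3, \]

and therefore we can write

\[ \sum_{n=1}^{N} (n + 1)(n + 3) = \sum_{n=1}^{N} n^2 + 4\sum_{n=1}^{N} n + \sum_{n=1}^{N} 3 \]
\[ = \tfrac{1}{6}N(N + 1)(2N + 1) + 4 \times \tfrac{1}{2}N(N + 1) + 3N \]
\[ = \tfrac{1}{6}N(2N^2 + 15N + 31). \]
◄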
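
The closed forms for the sums of the natural numbers, their squares and their cubes, and the worked example built from them, can be verified by brute force. The sketch below uses integer arithmetic throughout; the function names are assumptions chosen for readability.

```python
def sum_n(N):
    """1 + 2 + ... + N."""
    return N * (N + 1) // 2

def sum_n2(N):
    """1**2 + 2**2 + ... + N**2."""
    return N * (N + 1) * (2 * N + 1) // 6

def sum_n3(N):
    """1**3 + 2**3 + ... + N**3, the square of sum_n(N)."""
    return (N * (N + 1)) ** 2 // 4

def example_sum(N):
    """Closed form for the sum of (n + 1)(n + 3) over n = 1, ..., N."""
    return N * (2 * N**2 + 15 * N + 31) // 6

for N in (1, 2, 10, 100):
    ns = range(1, N + 1)
    print(N,
          sum_n(N) == sum(ns),
          sum_n2(N) == sum(n**2 for n in ns),
          sum_n3(N) == sum(n**3 for n in ns) == sum_n(N) ** 2,
          example_sum(N) == sum((n + 1) * (n + 3) for n in ns))
```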


4.2.6 Transformation of series 

A complicated series may sometimes be summed by transforming it into a 
familiar series for which we already know the sum, perhaps a geometric series 
or the Maclaurin expansion of a simple function (see subsection 4.6.3). Various 
techniques are useful, and deciding which one to use in any given case is a matter 
of experience. We now discuss a few of the more common methods. 

The differentiation or integration of a series is often useful in transforming an 
apparently intractable series into a more familiar one. If we wish to differentiate 
or integrate a series that already depends on some variable then we may do so 
in a straightforward manner. 



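►Sum the series

\[ S(x) = \frac{x^4}{3(0!)} + \frac{x^5}{4(1!)} + \frac{x^6}{5(2!)} + \cdots. \]

Dividing both sides by x we obtain

\[ \frac{S(x)}{x} = \frac{x^3}{3(0!)} + \frac{x^4}{4(1!)} + \frac{x^5}{5(2!)} + \cdots, \]

which is easily differentiated to give

\[ \frac{d}{dx}\left[\frac{S(x)}{x}\right] = \frac{x^2}{0!} + \frac{x^3}{1!} + \frac{x^4}{2!} + \frac{x^5}{3!} + \cdots. \]

Recalling the Maclaurin expansion of exp x given in subsection 4.6.3, we recognise that
the RHS is equal to $x^2 \exp x$. Having done so, we can now integrate both sides to obtain

\[ S(x)/x = \int x^2 \exp x \, dx. \]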




Integrating the RHS by parts we find 

S(x)/x = x 2 exp x — 2x exp x + 2 exp x + c, 

where the value of the constant of integration c can be fixed by the requirement that 
S(x)/x = 0 at x = 0. Thus we find that c = —2, and that the sum is given by 

S(x) = x 3 exp x — 2x 2 exp x + 2x exp x — 2x. ◄ 

Often, however, we require the sum of a series that does not depend on a 
variable. In this case, in order that we may differentiate or integrate the series, 
we define a function of some variable x such that the value of this function is 
equal to the sum of the series for some particular value of x (usually at x =1). 


►Sum the series

\[ S = 1 + \frac{2}{2} + \frac{3}{2^2} + \frac{4}{2^3} + \cdots. \]

Let us begin by defining the function

\[ f(x) = 1 + 2x + 3x^2 + 4x^3 + \cdots, \]

so that the sum S = f(1/2). Integrating this function we obtain

\[ \int f(x)\,dx = x + x^2 + x^3 + \cdots, \]

which we recognise as an infinite geometric series with first term a = x and common ratio
r = x. Therefore, from (4.4), we find that the sum of this series is x/(1 − x). In other words

\[ \int f(x)\,dx = \frac{x}{1 - x}, \]

so that f(x) is given by

\[ f(x) = \frac{d}{dx}\left(\frac{x}{1 - x}\right) = \frac{1}{(1 - x)^2}. \]

The sum of the original series is therefore S = f(1/2) = 4. ◄
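
A quick numerical check of this transformation is sketched below. It compares a truncated version of f(x) = 1 + 2x + 3x² + ⋯ with the closed form 1/(1 − x)² obtained by differentiating x/(1 − x); the names `f_partial` and `f_closed`, and the choice of 80 terms, are illustrative assumptions.

```python
def f_partial(x, terms):
    """Partial sum of 1 + 2x + 3x**2 + ... using the given number of terms."""
    return sum((n + 1) * x**n for n in range(terms))

def f_closed(x):
    """Closed form 1/(1 - x)**2, valid for |x| < 1."""
    return 1 / (1 - x)**2

for x in (0.1, 0.5, 0.9):
    # More terms are needed for an accurate value as x approaches 1.
    print(x, f_partial(x, 80), f_partial(x, 2000), f_closed(x))
```

At x = 1/2 the truncated series agrees with S = 4 to within rounding error, whereas close to x = 1 the convergence is visibly slower.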

Aside from differentiation and integration, an appropriate substitution can 
sometimes transform a series into a more familiar form. In particular, series with 
terms that contain trigonometric functions can often be summed by the use of 
complex exponentials. 


►Sum the series

\[ S(\theta) = 1 + \cos\theta + \frac{\cos 2\theta}{2!} + \frac{\cos 3\theta}{3!} + \cdots. \]

Replacing the cosine terms with a complex exponential, we obtain

\[ S(\theta) = \mathrm{Re}\left\{1 + \exp i\theta + \frac{\exp 2i\theta}{2!} + \frac{\exp 3i\theta}{3!} + \cdots\right\} \]
\[ = \mathrm{Re}\left\{1 + \exp i\theta + \frac{(\exp i\theta)^2}{2!} + \frac{(\exp i\theta)^3}{3!} + \cdots\right\}. \]

Again using the Maclaurin expansion of exp x given in subsection 4.6.3, we notice that

\[ S(\theta) = \mathrm{Re}\,[\exp(\exp i\theta)] = \mathrm{Re}\,[\exp(\cos\theta + i\sin\theta)] \]
\[ = \mathrm{Re}\,\{[\exp(\cos\theta)][\exp(i\sin\theta)]\} = [\exp(\cos\theta)]\,\mathrm{Re}\,[\exp(i\sin\theta)] \]
\[ = [\exp(\cos\theta)][\cos(\sin\theta)]. \]
◄
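
The result obtained with complex exponentials can be compared with a direct summation of the cosine series. In this sketch the cut-off of 30 terms and the function names `series` and `closed` are assumptions; the factorial in the denominator makes the neglected terms negligible.

```python
import math

def series(theta, terms=30):
    """Truncated sum of cos(n*theta)/n! for n = 0, ..., terms - 1."""
    return sum(math.cos(n * theta) / math.factorial(n) for n in range(terms))

def closed(theta):
    """Closed form exp(cos(theta)) * cos(sin(theta))."""
    return math.exp(math.cos(theta)) * math.cos(math.sin(theta))

for theta in (0.0, 0.7, 2.0, -1.3):
    print(theta, series(theta), closed(theta))
```

For θ = 0 both expressions reduce to the series for e, which gives a simple sanity check.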


4.3 Convergence of infinite series 

Although the sums of some commonly occurring infinite series may be found, 
the sum of a general infinite series is usually difficult to calculate. Nevertheless, 
it is often useful to know whether the partial sum of such a series converges to 
a limit, even if the limit cannot be found explicitly. As mentioned at the end of 
section 4.1, if we allow N to tend to infinity, the partial sum 

\[ S_N = \sum_{n=1}^{N} u_n \]

of a series may tend to a definite limit (i.e. the sum S of the series), or increase 
or decrease without limit, or oscillate finitely or infinitely. 

To investigate the convergence of any given series, it is useful to have available 
a number of tests and theorems of general applicability. We discuss them below; 
some we will merely state, since once they have been stated they become almost 
self-evident, but are no less useful for that. 


4.3.1 Absolute and conditional convergence 

Let us first consider some general points concerning the convergence, or otherwise,
of an infinite series. In general an infinite series $\sum u_n$ can have complex terms,
and in the special case of a real series the terms can be positive or negative. From
any such series, however, we can always construct another series $\sum |u_n|$ in which
each term is simply the modulus of the corresponding term in the original series.
Then each term in the new series will be a positive real number.

If the series $\sum |u_n|$ converges then $\sum u_n$ also converges, and $\sum u_n$ is said to be
absolutely convergent, i.e. the series formed by the absolute values is convergent.
For an absolutely convergent series, the terms may be reordered without affecting
the convergence of the series. However, if $\sum |u_n|$ diverges whilst $\sum u_n$ converges
then $\sum u_n$ is said to be conditionally convergent. For a conditionally convergent
series, rearranging the order of the terms can affect the behaviour of the sum
and, hence, whether the series converges or diverges. In fact, a theorem due
to Riemann shows that, by a suitable rearrangement, a conditionally convergent
series may be made to converge to any arbitrary limit, or to diverge, or to oscillate
finitely or infinitely! Of course, if the original series $\sum u_n$ consists only of positive
real terms and converges then automatically it is absolutely convergent.


4.3.2 Convergence of a series containing only real positive terms

As discussed above, in order to test for the absolute convergence of a series
$\sum u_n$, we first construct the corresponding series $\sum |u_n|$ that consists only of real
positive terms. Therefore in this subsection we will restrict our attention to series
of this type.

We discuss below some tests that may be used to investigate the convergence of 
such a series. Before doing so, however, we note the following crucial consideration. 
In all the tests for, or discussions of, the convergence of a series, it is not what 
happens in the first ten, or the first thousand, or the first million terms (or any 
other finite number of terms) that matters, but what happens ultimately. 

Preliminary test 

A necessary but not sufficient condition for a series of real positive terms $\sum u_n$
to be convergent is that the term $u_n$ tends to zero as n tends to infinity, i.e. we
require

\[ \lim_{n\to\infty} u_n = 0. \]

If this condition is not satisfied then the series must diverge. Even if it is satisfied, 
however, the series may still diverge, and further testing is required. 

Comparison test 

The comparison test is the most basic test for convergence. Let us consider two
series $\sum u_n$ and $\sum v_n$ and suppose that we know the latter to be convergent (by
some earlier analysis, for example). Then, if each term $u_n$ in the first series is less
than or equal to the corresponding term $v_n$ in the second series, for all n greater
than some fixed number N which will vary from series to series, then the original
series $\sum u_n$ is also convergent. In other words, if $\sum v_n$ is convergent and

\[ u_n \le v_n \quad \text{for } n > N, \]

then $\sum u_n$ converges.

However, if $\sum v_n$ diverges and $u_n \ge v_n$ for all n greater than some fixed number
then $\sum u_n$ diverges.


►Determine whether the following series converges:

\[ \sum_{n=1}^{\infty} \frac{1}{n! + 1} = \frac{1}{2} + \frac{1}{3} + \frac{1}{7} + \frac{1}{25} + \cdots. \tag{4.7} \]

Let us compare this series with the series

\[ \sum_{n=0}^{\infty} \frac{1}{n!} = \frac{1}{0!} + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots = 2 + \frac{1}{2!} + \frac{1}{3!} + \cdots, \tag{4.8} \]

which is merely the series obtained by setting x = 1 in the Maclaurin expansion of exp x
(see subsection 4.6.3), i.e.

\[ \exp(1) = e = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots. \]

Clearly this second series is convergent, since it consists of only positive terms and has a
finite sum. Thus, since each term $u_n$ in the series (4.7) is less than the corresponding term
1/n! in (4.8), we conclude from the comparison test that (4.7) is also convergent. ◄

D'Alembert’s ratio test 

The ratio test determines whether a series converges by comparing the relative
magnitude of successive terms. If we consider a series $\sum u_n$ and set

\[ \rho = \lim_{n\to\infty} \left(\frac{u_{n+1}}{u_n}\right), \tag{4.9} \]

then if $\rho < 1$ the series is convergent; if $\rho > 1$ the series is divergent; if $\rho = 1$
then the behaviour of the series is undetermined by this test.

To prove this we observe that if the limit (4.9) is less than unity, i.e. $\rho < 1$, then
we can find a value r in the range $\rho < r < 1$ and a value N such that

\[ \frac{u_{n+1}}{u_n} < r, \]

for all n > N. Now the terms $u_n$ of the series that follow $u_N$ are

\[ u_{N+1}, \quad u_{N+2}, \quad u_{N+3}, \quad \ldots, \]

and each of these is less than the corresponding term of

\[ ru_N, \quad r^2 u_N, \quad r^3 u_N, \quad \ldots. \tag{4.10} \]

However, the terms of (4.10) are those of a geometric series with a common
ratio r that is less than unity. This geometric series consequently converges and
therefore, by the comparison test discussed above, so must the original series
$\sum u_n$. An analogous argument may be used to prove the divergent case when
$\rho > 1$.
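
The behaviour of the ratio of successive terms can be tabulated directly. The sketch below uses exact fractions and two illustrative series chosen as assumptions for the demonstration: terms 1/n!, for which the ratio tends to zero, and terms 1/n, for which the ratio tends to unity so that the test gives no verdict.

```python
from fractions import Fraction
from math import factorial

def ratio(u, n):
    """Return u(n + 1) / u(n) for a term function u."""
    return u(n + 1) / u(n)

def inverse_factorial(n):
    return Fraction(1, factorial(n))

def harmonic_term(n):
    return Fraction(1, n)

for n in (1, 10, 100, 1000):
    print(n, float(ratio(inverse_factorial, n)), float(ratio(harmonic_term, n)))
```

The first column of ratios falls towards 0, while the second creeps up towards 1 without ever exceeding it; in the latter case further testing is needed.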


►Determine whether the following series converges:

\[ \sum_{n=0}^{\infty} \frac{1}{n!} = \frac{1}{0!} + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots = 2 + \frac{1}{2!} + \frac{1}{3!} + \cdots. \]

As mentioned in the previous example, this series may be obtained by setting x = 1 in the
Maclaurin expansion of exp x, and hence we know already that it converges and has the
sum exp(1) = e. Nevertheless, we may use the ratio test to confirm that it converges.
Using (4.9), we have

\[ \rho = \lim_{n\to\infty} \left[\frac{n!}{(n + 1)!}\right] = \lim_{n\to\infty} \left(\frac{1}{n + 1}\right) = 0 \tag{4.11} \]

and since $\rho < 1$, the series converges, as expected. ◄


Ratio comparison test

As its name suggests, the ratio comparison test is a combination of the ratio and
comparison tests. Let us consider the two series $\sum u_n$ and $\sum v_n$ and assume that
we know the latter to be convergent. It may be shown that if

\[ \frac{u_{n+1}}{u_n} \le \frac{v_{n+1}}{v_n} \]

for all n greater than some fixed value N then $\sum u_n$ is also convergent.

Similarly, if

\[ \frac{u_{n+1}}{u_n} \ge \frac{v_{n+1}}{v_n} \]

for all sufficiently large n, and $\sum v_n$ diverges then $\sum u_n$ also diverges.


►Determine whether the following series converges:

\[ \sum_{n=1}^{\infty} \frac{1}{(n!)^2} = 1 + \frac{1}{2^2} + \frac{1}{6^2} + \cdots. \]

In this case the ratio of successive terms, as n tends to infinity, is given by

\[ R = \lim_{n\to\infty} \left[\frac{n!}{(n + 1)!}\right]^2 = \lim_{n\to\infty} \left(\frac{1}{n + 1}\right)^2, \]

which is less than the ratio seen in (4.11). Hence, by the ratio comparison test, the series
converges. (It is clear that this series could also be found to be convergent using the ratio
test.) ◄

Quotient test 

The quotient test may also be considered as a combination of the ratio and
comparison tests. Let us again consider the two series $\sum u_n$ and $\sum v_n$, and define
$\rho$ as the limit

\[ \rho = \lim_{n\to\infty} \left(\frac{u_n}{v_n}\right). \tag{4.12} \]

Then, it can be shown that:

(i) if $\rho \ne 0$ but is finite then $\sum u_n$ and $\sum v_n$ either both converge or both
diverge;

(ii) if $\rho = 0$ and $\sum v_n$ converges then $\sum u_n$ converges;

(iii) if $\rho = \infty$ and $\sum v_n$ diverges then $\sum u_n$ diverges.


►Given that the series $\sum_{n=1}^{\infty} 1/n$ diverges, determine whether the following series converges:

\[ \sum_{n=1}^{\infty} \frac{4n^2 - n - 3}{n^3 + 2n}. \tag{4.13} \]

If we set $u_n = (4n^2 - n - 3)/(n^3 + 2n)$ and $v_n = 1/n$ then the limit (4.12) becomes

\[ \rho = \lim_{n\to\infty} \left[\frac{(4n^2 - n - 3)/(n^3 + 2n)}{1/n}\right] = \lim_{n\to\infty} \left[\frac{4n^3 - n^2 - 3n}{n^3 + 2n}\right] = 4. \]

Since $\rho$ is finite but non-zero and $\sum v_n$ diverges, from (i) above $\sum u_n$ must also diverge. ◄
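
The limit (4.12) for this example can be watched numerically as n grows. In the sketch below the function names `u` and `v` simply mirror the symbols $u_n$ and $v_n$ of the example; the sample values of n are arbitrary.

```python
def u(n):
    """Term of series (4.13)."""
    return (4 * n**2 - n - 3) / (n**3 + 2 * n)

def v(n):
    """Term of the divergent comparison series, 1/n."""
    return 1 / n

for n in (10, 1000, 100000, 10**6):
    # The quotient u_n / v_n settles towards the finite, non-zero value 4.
    print(n, u(n) / v(n))
```

Because the quotient tends to a finite non-zero number, the two series share the same fate, and the divergence of the sum of 1/n carries over to (4.13).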


Integral test 

The integral test is an extremely powerful means of investigating the convergence
of a series $\sum u_n$. Suppose that there exists a function f(x) which monotonically
decreases for x greater than some fixed value $x_0$ and for which $f(n) = u_n$, i.e. the
value of the function at integer values of x is equal to the corresponding term
in the series under investigation. Then it can be shown that, if the limit of the
integral

\[ \lim_{N\to\infty} \int^{N} f(x)\,dx \]

exists, the series $\sum u_n$ is convergent. Otherwise the series diverges. Note that the
integral defined here has no lower limit; the test is sometimes stated with a lower
limit of unity for the integral, but this can lead to unnecessary difficulties.

►Determine whether the following series converges:

\[ \sum_{n=1}^{\infty} \frac{1}{(n - 3/2)^2} = 4 + 4 + \frac{4}{9} + \frac{4}{25} + \cdots. \]

Let us consider the function $f(x) = (x - 3/2)^{-2}$. Clearly $f(n) = u_n$ and f(x) monotonically
decreases for x > 3/2. Applying the integral test, we consider

\[ \lim_{N\to\infty} \int^{N} \frac{1}{(x - 3/2)^2}\,dx = \lim_{N\to\infty} \left(\frac{-1}{N - 3/2}\right) = 0. \]

Since the limit exists the series converges. Note, however, that if we had included a lower
limit of unity in the integral then we would have run into problems, since the integrand
diverges at x = 3/2. ◄

The integral test is also useful for examining the convergence of the Riemann
zeta series. This is a special series that occurs regularly and is of the form

\[ \sum_{n=1}^{\infty} \frac{1}{n^p}. \]

It converges for p > 1 and diverges if $p \le 1$. These convergence criteria may be
derived as follows.

Using the integral test, we consider

\[ \lim_{N\to\infty} \int^{N} \frac{1}{x^p}\,dx = \lim_{N\to\infty} \left(\frac{N^{1-p}}{1 - p}\right), \]

and it is obvious that the limit tends to zero for p > 1 and to $\infty$ for p < 1.
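
The contrast between p > 1 and p = 1 shows up clearly in the partial sums. The following is a rough numerical sketch; `zeta_partial` is a hypothetical helper, and the values of N are chosen only to make the trend visible.

```python
def zeta_partial(p, N):
    """Partial sum of 1/n**p for n = 1, ..., N."""
    return sum(1 / n**p for n in range(1, N + 1))

for N in (10, 100, 1000, 10**4, 10**5):
    # For p = 2 the sums level off; for p = 1 they keep growing, though slowly.
    print(N, zeta_partial(2, N), zeta_partial(1, N))
```

For p = 2 the partial sums change by less than 1/N beyond the Nth term, whereas for p = 1 each tenfold increase in N adds roughly the same amount, so no limit is approached.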

Cauchy’s root test 

Cauchy’s root test may be useful in testing for convergence, especially if the nth
term of the series contains an nth power. If we define the limit

\[ \rho = \lim_{n\to\infty} (u_n)^{1/n}, \]

then it may be proved that the series $\sum u_n$ converges if $\rho < 1$. If $\rho > 1$ then the
series diverges. Its behaviour is undetermined if $\rho = 1$.
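
The root test can be tried numerically on a series whose nth term is an nth power. The series used in this sketch, with $u_n = [n/(2n + 1)]^n$, is not taken from the text; it is an assumed illustration for which the nth root of the term is n/(2n + 1), tending to 1/2.

```python
def u(n):
    """Illustrative term (an assumption, not from the text): [n / (2n + 1)]**n."""
    return (n / (2 * n + 1)) ** n

def nth_root_of_term(n):
    """(u_n)**(1/n), the quantity whose limit the root test examines."""
    return u(n) ** (1 / n)

for n in (1, 10, 100, 500):
    print(n, nth_root_of_term(n))
```

The values approach 0.5 from below, so the limit is less than unity and the root test indicates convergence for this particular series. Very large n is avoided here because the term itself underflows in floating-point arithmetic.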


► Determine whether the following series converges: 





+ ••• . 


Using Cauchy’s root test, we find 



and hence the series converges. ◄ 


Grouping terms 

We now consider the Riemann zeta series, mentioned above, with an alternative 
proof of its convergence that uses the method of grouping terms. In general there 
are better ways of determining convergence, but the grouping method may be 
used if it is not immediately obvious how to approach a problem by a better 
method. 

First consider the case where p > 1 and group the terms in the series as follows:

\[ S_N = \frac{1}{1^p} + \left(\frac{1}{2^p} + \frac{1}{3^p}\right) + \left(\frac{1}{4^p} + \cdots + \frac{1}{7^p}\right) + \cdots. \]

Now we can see that each bracket of this series is less than or equal to the
corresponding term of the geometric series

\[ S_N = \frac{1}{1^p} + \frac{2}{2^p} + \frac{4}{4^p} + \cdots. \]

This geometric series has common ratio $r = \left(\frac{1}{2}\right)^{p-1}$; therefore r < 1 since p > 1,
and so the geometric series converges. Then the comparison test shows that the
Riemann zeta series also converges for p > 1.

The divergence of the Riemann zeta series for $p \le 1$ can be seen by first
considering the case p = 1. The series is

\[ S_N = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots, \]

which does not converge, as may be seen by bracketing the terms of the series in
groups in the following way:

\[ S_N = 1 + \frac{1}{2} + \left(\frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8}\right) + \cdots. \]

The sum of the terms in each bracket is $\ge \frac{1}{2}$ and, since as many such groupings
can be made as we wish, it is clear that $S_N$ increases indefinitely as N is increased.

Now returning to the case of the Riemann zeta series for p < 1, we note that
each term in the series is greater than the corresponding one in the series for
which p = 1. In other words $1/n^p > 1/n$ for n > 1, p < 1. The comparison test
then shows us that the Riemann zeta series will diverge for all $p \le 1$.
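
The grouping argument for p = 1 implies a concrete lower bound: after $2^k$ terms the partial sum is at least 1 + k/2. The sketch below checks this bound numerically; the function name `harmonic` is an illustrative assumption.

```python
def harmonic(N):
    """Partial sum 1 + 1/2 + ... + 1/N of the p = 1 series."""
    return sum(1 / n for n in range(1, N + 1))

for k in range(0, 15):
    N = 2**k
    # Each completed bracket contributes at least 1/2, giving S_N >= 1 + k/2.
    print(k, N, harmonic(N), 1 + k / 2)
```

Since the bound 1 + k/2 grows without limit as k increases, so must the partial sums, even though their growth is very slow.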


4.3.3 Alternating series test 

The tests discussed in the last subsection have been concerned with determining 
whether the series of real positive terms $\sum |u_n|$ converges, and so whether $\sum u_n$
is absolutely convergent. Nevertheless, it is sometimes useful to consider whether 
a series is merely convergent rather than absolutely convergent. This is especially 
true for series containing an infinite number of both positive and negative terms. 
In particular, we will consider the convergence of series in which the positive and 
negative terms alternate, i.e. an alternating series. 

An alternating series can be written as

\[ \sum_{n=1}^{\infty} (-1)^{n+1} u_n = u_1 - u_2 + u_3 - u_4 + u_5 - \cdots , \]

with all $u_n > 0$. Such a series can be shown to converge provided (i) $u_n \to 0$ as
$n \to \infty$ and (ii) $u_n < u_{n-1}$ for all n > N for some finite N. If these conditions are
not met then the series oscillates.

To prove this, suppose for definiteness that N is odd and consider the series
starting at $u_N$. The sum of its first 2m terms is

\[ S_{2m} = (u_N - u_{N+1}) + (u_{N+2} - u_{N+3}) + \cdots + (u_{N+2m-2} - u_{N+2m-1}). \]

By condition (ii) above, all the parentheses are positive, and so $S_{2m}$ increases as
m increases. We can also write, however,

\[ S_{2m} = u_N - (u_{N+1} - u_{N+2}) - \cdots - (u_{N+2m-3} - u_{N+2m-2}) - u_{N+2m-1}, \]

and since each parenthesis is positive, we must have $S_{2m} < u_N$. Thus $S_{2m}$ increases
with m but is always less than $u_N$, and so tends to a limit; since $u_n \to 0$ as
$n \to \infty$, the sums of an odd number of terms tend to the same limit, and the
alternating series converges. It is clear that an analogous proof can be constructed
in the case where N is even.


► Determine whether the following series converges:

\[ \sum_{n=1}^{\infty} (-1)^{n+1} \frac{1}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \cdots . \]

This alternating series clearly satisfies conditions (i) and (ii) above and hence converges.
However, as shown above by the method of grouping terms, the corresponding series with
all positive terms is divergent. ◄
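
A short numerical sketch makes the contrast concrete. It is illustrative only, and the function names are our own. The alternating partial sums settle towards ln 2 (the value quoted for this series in exercise 4.17), while the partial sums of the series of positive terms keep growing.

```python
import math


def alternating_harmonic(n_terms):
    """Partial sum 1 - 1/2 + 1/3 - ... taken to n_terms terms."""
    return sum((-1) ** (n + 1) / n for n in range(1, n_terms + 1))


def harmonic(n_terms):
    """Partial sum 1 + 1/2 + 1/3 + ... taken to n_terms terms."""
    return sum(1.0 / n for n in range(1, n_terms + 1))


if __name__ == "__main__":
    for n_terms in (10, 100, 1000, 10000):
        print(n_terms, alternating_harmonic(n_terms), harmonic(n_terms))
    print("ln 2 =", math.log(2))
```

Successive partial sums of the alternating series lie alternately above and below the limit, which is the behaviour used in the proof above.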


4.4 Operations with series 

Simple operations with series are fairly intuitive, and we discuss them here only 
for completeness. The following points apply to both finite and infinite series 
unless otherwise stated. 

(i) If $\sum u_n = S$ then $\sum k u_n = kS$ where k is any constant.

(ii) If $\sum u_n = S$ and $\sum v_n = T$ then $\sum (u_n + v_n) = S + T$.

(iii) If $\sum u_n = S$ then $a + \sum u_n = a + S$. A simple extension of this trivial result
shows that the removal or insertion of a finite number of terms anywhere
in a series does not affect its convergence.

(iv) If the infinite series $\sum u_n$ and $\sum v_n$ are both absolutely convergent then
the series $\sum w_n$, where

\[ w_n = u_1 v_n + u_2 v_{n-1} + \cdots + u_n v_1, \]

is also absolutely convergent. The series $\sum w_n$ is called the Cauchy product
of the two original series. Furthermore, if $\sum u_n$ converges to the sum S
and $\sum v_n$ converges to the sum T then $\sum w_n$ converges to the sum ST.

(v) It is not true in general that term-by-term differentiation or integration of
a series will result in a new series with the same convergence properties.
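
Result (iv) can be illustrated numerically. In the sketch below the two geometric series $u_n = (1/2)^n$ and $v_n = (-1/3)^n$ (n ≥ 1) are our own choice of example, as is the helper name `cauchy_product`; both series are absolutely convergent, with sums S = 1 and T = −1/4.

```python
def cauchy_product(u, v):
    """Return [w_1, w_2, ...] with w_n = u_1 v_n + u_2 v_{n-1} + ... + u_n v_1.

    The lists hold u_1, u_2, ... and v_1, v_2, ... (so index 0 is the first term).
    """
    n_terms = min(len(u), len(v))
    return [sum(u[k] * v[n - k] for k in range(n + 1)) for n in range(n_terms)]


N = 60
u = [0.5 ** n for n in range(1, N + 1)]        # sums to S = 1
v = [(-1 / 3) ** n for n in range(1, N + 1)]   # sums to T = -1/4
w = cauchy_product(u, v)

S, T, W = sum(u), sum(v), sum(w)
print(S, T, W, S * T)
```

With 60 terms the truncated sums already agree with S, T and ST to within rounding error.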


4.5 Power series 


A power series has the form

\[ P(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots , \]

where $a_0$, $a_1$, $a_2$, $a_3$ etc. are constants. Such series regularly occur in physics and
engineering and are useful because, for |x| < 1, the later terms in the series may
become very small and be discarded. For example the series

\[ P(x) = 1 + x + x^2 + x^3 + \cdots , \]

although in principle infinitely long, in practice may be simplified if x happens to
have a value small compared with unity. To see this note that P(x) for x = 0.1
has the following values: 1, if just one term is taken into account; 1.1, for two
terms; 1.11, for three terms; 1.111, for four terms, etc. If the quantity that it
represents can only be measured with an accuracy of two decimal places, then all
but the first three terms may be ignored, i.e. when x = 0.1 or less

\[ P(x) = 1 + x + x^2 + O(x^3) \approx 1 + x + x^2. \]

This sort of approximation is often used to simplify equations into manageable 
forms. It may seem imprecise at first but is perfectly acceptable insofar as it 
matches the experimental accuracy that can be achieved. 

The symbols O and ≈ used above need some further explanation. They are
used to compare the behaviour of two functions when a variable upon which
both functions depend tends to a particular limit, usually zero or infinity (and
obvious from the context). For two functions f(x) and g(x), with g positive, the
formal definitions of the above symbols are as follows:

(i) If there exists a constant k such that |f| ≤ kg as the limit is approached
then f = O(g).

(ii) If as the limit of x is approached f/g tends to a limit l, where l ≠ 0, then
f ≈ lg. The statement f ≈ g means that the ratio of the two sides tends
to unity.


4.5.1 Convergence of power series 

The convergence or otherwise of power series is a crucial consideration in practical 
terms. For example, if we are to use a power series as an approximation, it is 
clearly important that it tends to the precise answer as more and more terms of 
the approximation are taken. Consider the general power series 

\[ P(x) = a_0 + a_1 x + a_2 x^2 + \cdots . \]

Using d’Alembert’s ratio test (see subsection 4.3.2), we see that P(x) converges
absolutely if

\[ \rho = \lim_{n\to\infty} \left| \frac{a_{n+1}}{a_n}\, x \right| = |x| \lim_{n\to\infty} \left| \frac{a_{n+1}}{a_n} \right| < 1. \]

Thus the convergence of P(x) depends upon the value of x, i.e. there is, in general,
a range of values of x for which P(x) converges, an interval of convergence. Note
that at the limits of this range ρ = 1, and so the series may converge or diverge.
The convergence of the series at the end-points may be determined by substituting
these values of x into the power series P(x) and testing the resulting series using
any applicable method (discussed in section 4.3).
► Determine the range of values of x for which the following power series converges:

\[ P(x) = 1 + 2x + 4x^2 + 8x^3 + \cdots . \]

By using the interval-of-convergence method discussed above,

\[ \rho = \lim_{n\to\infty} \left| \frac{2^{n+1}}{2^n}\, x \right| = |2x|, \]

and hence the power series will converge for |x| < 1/2. Examining the end-points of the
interval separately, we find

\[ P(1/2) = 1 + 1 + 1 + \cdots , \]
\[ P(-1/2) = 1 - 1 + 1 - \cdots . \]

Obviously P(1/2) diverges, while P(−1/2) oscillates. Therefore P(x) is not convergent at
either end-point of the region but is convergent for −1/2 < x < 1/2. ◄
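
The interval found in this example can be confirmed with a small numerical sketch. The helper names below are our own, and the closed form 1/(1 − 2x) used for comparison is the standard sum of this geometric series rather than something stated in the text.

```python
def ratio_estimates(coeffs):
    """Return |a_n / a_(n+1)| for successive coefficients.

    When these ratios tend to a limit, that limit bounds the |x| for which
    the ratio test guarantees convergence.
    """
    return [abs(a / b) for a, b in zip(coeffs, coeffs[1:]) if b != 0]


def partial_sum(coeffs, x):
    """Evaluate a_0 + a_1 x + a_2 x**2 + ... by Horner's scheme."""
    total = 0.0
    for a in reversed(coeffs):
        total = total * x + a
    return total


coeffs = [2 ** n for n in range(60)]   # P(x) = 1 + 2x + 4x**2 + 8x**3 + ...
print(ratio_estimates(coeffs)[:5])      # every ratio equals 1/2
print(partial_sum(coeffs, 0.4), 1 / (1 - 2 * 0.4))
```

For x = 0.4, inside the interval, sixty terms already reproduce the closed form closely; for |x| > 1/2 the same partial sums grow without limit.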

The convergence of power series may be extended to the case where the 
parameter z is complex. For the power series 



\[ P(z) = a_0 + a_1 z + a_2 z^2 + \cdots , \]

we find that P(z) converges if

\[ \rho = \lim_{n\to\infty} \left| \frac{a_{n+1}}{a_n}\, z \right| = |z| \lim_{n\to\infty} \left| \frac{a_{n+1}}{a_n} \right| < 1. \]

We therefore have a range in |z| for which P(z) converges, i.e. P(z) converges 
for values of z lying within a circle in the Argand diagram (in this case centred 
on the origin of the Argand diagram). The radius of the circle is called the 
radius of convergence: if z lies inside the circle, the series will converge whereas 
if z lies outside the circle, the series will diverge; if, though, z lies on the circle 
then the convergence must be tested using another method. Clearly the radius of 
convergence R is given by $1/R = \lim_{n\to\infty} |a_{n+1}/a_n|$.


► Determine the range of values of z for which the following complex power series converges:

\[ P(z) = 1 - \frac{z}{2} + \frac{z^2}{4} - \frac{z^3}{8} + \cdots . \]

We find that ρ = |z/2|, which shows that P(z) converges for |z| < 2. Therefore the circle
of convergence in the Argand diagram is centred on the origin and has a radius R = 2.
On this circle we must test the convergence by substituting the value of z into P(z) and
considering the resulting series. On the circle of convergence we can write z = 2 exp iθ.
Substituting this into P(z), we obtain

\[ P(z) = 1 - \frac{2 \exp i\theta}{2} + \frac{4 \exp 2i\theta}{4} - \cdots \]
\[ = 1 - \exp i\theta + [\exp i\theta]^2 - \cdots , \]

which is a complex infinite geometric series with first term a = 1 and common ratio
r = −exp iθ. Since |r| = 1 for every θ, the terms of this series do not tend to zero, and
so the series does not converge at any point on the circle |z| = 2. The formula for the
sum of a geometric series gives

\[ \frac{1}{1 + \exp i\theta} , \]

which is a finite complex number unless θ = π (i.e. z = −2), but it represents the sum of
the series only for points inside the circle. Note that P(z) is just the
binomial expansion of $(1 + z/2)^{-1}$, for which it is obvious that z = −2 is a singular point.
In general, for power series expansions of complex functions about a given point in the
complex plane, the circle of convergence extends as far as the nearest singular point. This
is discussed further in chapter 20. ◄
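
The behaviour inside and outside the circle of convergence can be sketched numerically. The code below is illustrative only; `partial_sum` and `closed_form` are our own names, and the test points z = 1 + i (inside the circle) and z = 3 (outside) are arbitrary choices.

```python
def partial_sum(z, n_terms):
    """Sum of the first n_terms terms of 1 - z/2 + z**2/4 - z**3/8 + ... ."""
    total, term = 0j, 1 + 0j
    for _ in range(n_terms):
        total += term
        term *= -z / 2   # each term is the previous one times (-z/2)
    return total


def closed_form(z):
    """The function (1 + z/2)**(-1) whose binomial expansion is P(z)."""
    return 1 / (1 + z / 2)


inside = 1 + 1j    # |z| = sqrt(2) < 2
outside = 3        # |z| = 3 > 2
print(partial_sum(inside, 200), closed_form(inside))
print(abs(partial_sum(outside, 50)))   # grows rapidly with the number of terms
```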


Note that the centre of the circle of convergence does not necessarily lie at the 
origin. For example, applying the ratio test to the complex power series 


\[ P(z) = 1 - \frac{z-1}{2} + \frac{(z-1)^2}{4} - \frac{(z-1)^3}{8} + \cdots , \]


we find that for it to converge we require |(z — 1)/2| < 1. Thus the series converges 
for z lying within a circle of radius 2 centred on the point (1,0) in the Argand 
diagram. 


4.5.2 Operations with power series 


The following rules are useful when manipulating power series; they apply to 
power series in a real or complex variable. 

(i) If two power series P(x) and Q(x) have regions of convergence that overlap 
to some extent then the series produced by taking the sum, the difference or the 
product of P(x) and Q(x) converges in the common region. 

(ii) If two power series P(x) and Q(x) converge for all values of x then one
series may be substituted into the other to give a third series, which also converges
for all values of x. For example, consider the power series expansions of sin x and
$e^x$ given below in subsection 4.6.3,

\[ \sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots , \]
\[ e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots , \]

both of which converge for all values of x. Substituting the series for sin x into
that for $e^x$ we obtain

\[ e^{\sin x} = 1 + x + \frac{x^2}{2!} - \frac{3x^4}{4!} - \frac{8x^5}{5!} + \cdots , \]

which also converges for all values of x.

If, however, either of the power series P(x) and Q(x) has only a limited region
of convergence, or if they both do so, then further care must be taken when
substituting one series into the other. For example, suppose Q(x) converges for
all x, but P(x) only converges for x within a finite range. We may substitute
Q(x) into P(x) to obtain P(Q(x)), but we must be careful since the value of Q(x)
may lie outside the region of convergence for P(x), with the consequence that the
resulting series P(Q(x)) does not converge.
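
The substitution of the series for sin x into that for $e^x$ in rule (ii) can be carried out mechanically with exact rational arithmetic. The following sketch is our own illustration (the names `multiply`, `compose_exp` and `ORDER` are assumptions, not notation from the text); it truncates every series at $x^5$.

```python
from fractions import Fraction
from math import factorial

ORDER = 5  # keep terms up to x**5


def multiply(p, q):
    """Multiply two truncated series given as coefficient lists [c0, c1, ...]."""
    out = [Fraction(0)] * (ORDER + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= ORDER:
                out[i + j] += a * b
    return out


def compose_exp(inner):
    """Coefficients of exp(inner(x)) up to x**ORDER.

    Requires inner to have zero constant term, so that inner**k starts at x**k
    and powers beyond ORDER cannot contribute.
    """
    result = [Fraction(0)] * (ORDER + 1)
    power = [Fraction(1)] + [Fraction(0)] * ORDER   # inner**0
    for k in range(ORDER + 1):
        for i in range(ORDER + 1):
            result[i] += power[i] / factorial(k)     # add inner**k / k!
        power = multiply(power, inner)
    return result


# sin x = x - x**3/3! + x**5/5! - ...
sin_series = [Fraction(0), Fraction(1), Fraction(0),
              Fraction(-1, 6), Fraction(0), Fraction(1, 120)]
exp_sin = compose_exp(sin_series)
print(exp_sin)
```

The coefficients of $x^4$ and $x^5$ come out as −1/8 and −1/15, i.e. −3/4! and −8/5!.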

(iii) If a power series P(x) converges for a particular range of x then the series 
obtained by differentiating every term and the series obtained by integrating every 
term also converge in this range. 

This is easily seen for the power series

\[ P(x) = a_0 + a_1 x + a_2 x^2 + \cdots , \]

which converges if $|x| < \lim_{n\to\infty} |a_n/a_{n+1}| \equiv k$. The series obtained by
differentiating P(x) with respect to x is given by

\[ \frac{dP}{dx} = a_1 + 2a_2 x + 3a_3 x^2 + \cdots \]

and converges if

\[ |x| < \lim_{n\to\infty} \left| \frac{n a_n}{(n+1) a_{n+1}} \right| = k. \]

Similarly the series obtained by integrating P(x) term by term,

\[ \int P(x)\,dx = a_0 x + \frac{a_1 x^2}{2} + \frac{a_2 x^3}{3} + \cdots , \]

converges if

\[ |x| < \lim_{n\to\infty} \left| \frac{(n+2) a_n}{(n+1) a_{n+1}} \right| = k. \]


So, series resulting from differentiation or integration have the same interval of 
convergence as the original series. However, even if the original series converges 
at either end-point of the interval, it is not necessarily the case that the new series 
will do so. These new series must be tested separately at the end-points in order 
to determine whether they converge there. Note that although power series may 
be integrated or differentiated without altering their interval of convergence, this 
is not true for series in general. 

It is also worth noting that differentiating or integrating a power series term 
by term within its interval of convergence is equivalent to differentiating or 
integrating the function it represents. For example, consider the power series 
expansion of sin x,

\[ \sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots , \qquad (4.14) \]

which converges for all values of x. If we differentiate term by term, the series
becomes

\[ 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots , \]

which is the series expansion of cos x, as we expect.
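
Term-by-term differentiation and integration are simple operations on the list of coefficients. The sketch below, with helper names of our own choosing, confirms with exact fractions that differentiating the sin x coefficients gives those of cos x, and that integrating the cos x coefficients (with zero constant of integration) gives back those of sin x.

```python
from fractions import Fraction
from math import factorial


def sin_coeffs(n_max):
    """Maclaurin coefficients of sin x for powers x**0 ... x**n_max."""
    return [Fraction((-1) ** (n // 2), factorial(n)) if n % 2 else Fraction(0)
            for n in range(n_max + 1)]


def cos_coeffs(n_max):
    """Maclaurin coefficients of cos x for powers x**0 ... x**n_max."""
    return [Fraction((-1) ** (n // 2), factorial(n)) if n % 2 == 0 else Fraction(0)
            for n in range(n_max + 1)]


def differentiate(coeffs):
    """Coefficients of the term-by-term derivative."""
    return [n * c for n, c in enumerate(coeffs)][1:]


def integrate(coeffs):
    """Coefficients of the term-by-term integral, constant of integration zero."""
    return [Fraction(0)] + [c / (n + 1) for n, c in enumerate(coeffs)]


print(differentiate(sin_coeffs(9)) == cos_coeffs(8))
print(integrate(cos_coeffs(8)) == sin_coeffs(9))
```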
4.6 Taylor series

Taylor’s theorem provides a way of expressing a function as a power series in x, 
known as a Taylor series, but it can be applied only to those functions that are 
continuous and differentiable within the x-range of interest. 


4.6.1 Taylor’s theorem 

Suppose that we have a function f(x) that we wish to express as a power series
in x − a about the point x = a. We shall assume that, in a given x-range, f(x)
is a continuous, single-valued function of x having continuous derivatives with
respect to x, denoted by f'(x), f''(x) and so on, up to and including $f^{(n-1)}(x)$. We
shall also assume that $f^{(n)}(x)$ exists in this range.

From the equation following (2.31) we may write

\[ \int_a^{a+h} f'(x)\,dx = f(a+h) - f(a), \]

where a, a + h are neighbouring values of x. Rearranging this equation, we may
express the value of the function at x = a + h in terms of its value at a by

\[ f(a+h) = f(a) + \int_a^{a+h} f'(x)\,dx. \qquad (4.15) \]

A first approximation for f(a + h) may be obtained by substituting f'(a) for
f'(x) in (4.15), to obtain

\[ f(a+h) \approx f(a) + h f'(a). \]

This approximation is shown graphically in figure 4.1. We may write this first
approximation in terms of x and a as

\[ f(x) \approx f(a) + (x-a) f'(a), \]

and, in a similar way,

\[ f'(x) \approx f'(a) + (x-a) f''(a), \]
\[ f''(x) \approx f''(a) + (x-a) f'''(a), \]

and so on. Substituting for f'(x) in (4.15), we obtain the second approximation:

\[ f(a+h) \approx f(a) + \int_a^{a+h} [f'(a) + (x-a) f''(a)]\,dx \]
\[ \approx f(a) + h f'(a) + \frac{h^2}{2} f''(a). \]
Figure 4.1 The first-order Taylor series approximation to a function f(x).
The slope of the function at P, i.e. tan θ, equals f'(a). Thus the value of the
function at Q, f(a + h), is approximated by the ordinate of R, f(a) + hf'(a).

We may repeat this procedure as often as we like (so long as the derivatives
of f(x) exist) to obtain higher-order approximations to f(a + h); we find the
(n − 1)th-order approximation† to be

\[ f(a+h) \approx f(a) + h f'(a) + \frac{h^2}{2!} f''(a) + \cdots + \frac{h^{n-1}}{(n-1)!} f^{(n-1)}(a). \qquad (4.16) \]

As might have been anticipated, the error associated with approximating f(a + h)
by this (n − 1)th-order power series is of the order of the next term in the series.
This error or remainder can be shown to be given by

\[ R_n(h) = \frac{h^n}{n!} f^{(n)}(\xi), \]

for some ξ that lies in the range [a, a + h]. Taylor’s theorem then states that we
may write the equality

\[ f(a+h) = f(a) + h f'(a) + \frac{h^2}{2!} f''(a) + \cdots + \frac{h^{n-1}}{(n-1)!} f^{(n-1)}(a) + R_n(h). \qquad (4.17) \]

The theorem may also be written in a form suitable for finding f(x) given
the value of the function and its relevant derivatives at x = a, by substituting
x = a + h in the above expression. It then reads

\[ f(x) = f(a) + (x-a) f'(a) + \frac{(x-a)^2}{2!} f''(a) + \cdots + \frac{(x-a)^{n-1}}{(n-1)!} f^{(n-1)}(a) + R_n(x), \qquad (4.18) \]

where the remainder now takes the form

\[ R_n(x) = \frac{(x-a)^n}{n!} f^{(n)}(\xi), \]

and ξ lies in the range [a, x]. Each of the formulae (4.17), (4.18) gives us the
Taylor expansion of the function about the point x = a. A special case occurs
when a = 0. Such Taylor expansions, about x = 0, are called Maclaurin series.

† The order of the approximation is simply the highest power of h in the series. Note, though, that
the (n − 1)th-order approximation contains n terms.

Taylor’s theorem is also valid without significant modification for functions 
of a complex variable (see chapter 20). The extension of Taylor’s theorem to 
functions of more than one variable is given in chapter 5. 

For a function to be expressible as an infinite power series we require it to be
infinitely differentiable and the remainder term $R_n$ to tend to zero as n tends to
infinity, i.e. $\lim_{n\to\infty} R_n = 0$. In this case the infinite power series will represent the
function within the interval of convergence of the series.


► Expand f(x) = sin x as a Maclaurin series, i.e. about x = 0.

We must first verify that sin x may indeed be represented by an infinite power series. It is
easily shown that the nth derivative of f(x) is given by

\[ f^{(n)}(x) = \sin\left(x + \frac{n\pi}{2}\right). \]

Therefore the remainder after expanding f(x) as an (n − 1)th-order polynomial about
x = 0 is given by

\[ R_n(x) = \frac{x^n}{n!} \sin\left(\xi + \frac{n\pi}{2}\right), \]

where ξ lies in the range [0, x]. Since the modulus of the sine term is always less than or
equal to unity, we can write $|R_n(x)| \leq |x^n|/n!$. For any particular value of x, say x = c,
$R_n(c) \to 0$ as $n \to \infty$. Hence $\lim_{n\to\infty} R_n(x) = 0$, and so sin x can be represented by an
infinite Maclaurin series.

Evaluating the function and its derivatives at x = 0 we obtain

\[ f(0) = \sin 0 = 0, \]
\[ f'(0) = \sin(\pi/2) = 1, \]
\[ f''(0) = \sin \pi = 0, \]
\[ f'''(0) = \sin(3\pi/2) = -1, \]

and so on. Therefore, the Maclaurin series expansion of sin x is given by

\[ \sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots . \]

Note that, as expected, since sin x is an odd function, its power series expansion contains
only odd powers of x. ◄
We may follow a similar procedure to obtain a Taylor series about an arbitrary
point x = a.

► Expand f(x) = cos x as a Taylor series about x = π/3.

As in the above example, it is easily shown that the nth derivative of f(x) is given by

\[ f^{(n)}(x) = \cos\left(x + \frac{n\pi}{2}\right). \]

Therefore the remainder after expanding f(x) as an (n − 1)th-order polynomial about
x = π/3 is given by

\[ R_n(x) = \frac{(x - \pi/3)^n}{n!} \cos\left(\xi + \frac{n\pi}{2}\right), \]

where ξ lies in the range [π/3, x]. The modulus of the cosine term is always less than or
equal to unity, and so $|R_n(x)| \leq |(x - \pi/3)^n|/n!$. As in the previous example,
$\lim_{n\to\infty} R_n(x) = 0$ for any particular value of x, and so cos x can be represented by an
infinite Taylor series about x = π/3.

Evaluating the function and its derivatives at x = π/3 we obtain

\[ f(\pi/3) = \cos(\pi/3) = 1/2, \]
\[ f'(\pi/3) = \cos(5\pi/6) = -\sqrt{3}/2, \]
\[ f''(\pi/3) = \cos(4\pi/3) = -1/2, \]

and so on. Thus the Taylor series expansion of cos x about x = π/3 is given by

\[ \cos x = \frac{1}{2} - \frac{\sqrt{3}}{2}\left(x - \frac{\pi}{3}\right) - \frac{1}{2}\,\frac{(x - \pi/3)^2}{2!} + \cdots . \] ◄
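
The expansion about x = π/3 and the bound on its remainder can be checked numerically. This is a sketch under our own naming (`taylor_cos`, `remainder_bound`); the evaluation point x = 1.5 is an arbitrary choice.

```python
import math


def taylor_cos(x, a, n_terms):
    """First n_terms terms of the Taylor series of cos about x = a.

    Uses the nth derivative cos(a + n*pi/2) evaluated at the expansion point.
    """
    return sum(math.cos(a + n * math.pi / 2) * (x - a) ** n / math.factorial(n)
               for n in range(n_terms))


def remainder_bound(x, a, n_terms):
    """Upper bound |x - a|**n / n! on |R_n(x)|, with n = n_terms."""
    return abs(x - a) ** n_terms / math.factorial(n_terms)


if __name__ == "__main__":
    a, x = math.pi / 3, 1.5
    for n_terms in range(1, 8):
        error = abs(taylor_cos(x, a, n_terms) - math.cos(x))
        print(n_terms, error, remainder_bound(x, a, n_terms))
```

In every row the actual error stays within the bound, and both shrink rapidly as more terms are kept.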


4.6.2 Approximation errors in Taylor series 

In the previous subsection we saw how to represent a function f(x) by an infinite
power series, which is exactly equal to f(x) for all x within the interval of
convergence of the series. However, in physical problems we usually do not want
to have to sum an infinite number of terms, but prefer to use only a finite number
of terms in the Taylor series to approximate the function in some given range
of x. In this case it is desirable to know what is the maximum possible error
associated with the approximation.

As given in (4.18), a function f(x) can be represented by a finite (n − 1)th-order
power series together with a remainder term such that

\[ f(x) = f(a) + (x-a) f'(a) + \frac{(x-a)^2}{2!} f''(a) + \cdots + \frac{(x-a)^{n-1}}{(n-1)!} f^{(n-1)}(a) + R_n(x), \]

where

\[ R_n(x) = \frac{(x-a)^n}{n!} f^{(n)}(\xi) \]

and ξ lies in the range [a, x]. $R_n(x)$ is the remainder term, and represents the error
in approximating f(x) by the above (n − 1)th-order power series. Since the exact
value of ξ that satisfies the expression for $R_n(x)$ is not known, an upper limit on
the error may be found by differentiating $R_n(x)$ with respect to ξ and equating
the derivative to zero in the usual way for finding maxima.


► Expand f(x) = cos x as a Taylor series about x = 0 and find the error associated with
using the approximation to evaluate cos(0.5) if only the first two non-vanishing terms are
taken. (Note that the Taylor expansions of trigonometrical functions are only valid for
angles measured in radians.)

Evaluating the function and its derivatives at x = 0, we find

\[ f(0) = \cos 0 = 1, \]
\[ f'(0) = -\sin 0 = 0, \]
\[ f''(0) = -\cos 0 = -1, \]
\[ f'''(0) = \sin 0 = 0. \]

So, for small |x|, we find from (4.18)

\[ \cos x \approx 1 - \frac{x^2}{2}. \]

Note that since cos x is an even function, its power series expansion contains only even
powers of x. Therefore, in order to estimate the error in this approximation, we must
consider the term in $x^4$, which is the next in the series. The required derivative is $f^{(4)}(x)$
and this is (by chance) equal to cos x. Thus, adding in the remainder term $R_4(x)$, we find

\[ \cos x = 1 - \frac{x^2}{2} + \frac{x^4}{4!} \cos \xi, \]

where ξ lies in the range [0, x]. Thus, the maximum possible error is $x^4/4!$, since cos ξ
cannot exceed unity. If x = 0.5, taking just the first two terms yields cos(0.5) ≈ 0.875 with
a predicted error of at most about 0.00260. In fact cos(0.5) = 0.87758 to 5 decimal places. Thus,
to this accuracy, the true error is 0.00258, an error of about 0.3%. ◄
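
The numbers quoted in this example are easily reproduced. The following few lines are a plain check using the Python standard library; the variable names are our own.

```python
import math

x = 0.5
approx = 1 - x**2 / 2                # first two non-vanishing terms
bound = x**4 / math.factorial(4)     # largest value of (x**4/4!) * cos(xi)
true_error = math.cos(x) - approx

print(f"approximation = {approx}")
print(f"error bound   = {bound:.5f}")
print(f"true error    = {true_error:.5f}")
```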


4.6.3 Standard Maclaurin series 


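It is often useful to have a readily available table of Maclaurin series for standard
elementary functions, and therefore these are listed below.

\[ \sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots \quad \text{for } -\infty < x < \infty, \]
\[ \cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots \quad \text{for } -\infty < x < \infty, \]
\[ \tan^{-1} x = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots \quad \text{for } -1 < x < 1, \]
\[ e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots \quad \text{for } -\infty < x < \infty, \]
\[ \ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots \quad \text{for } -1 < x \leq 1, \]
\[ (1+x)^n = 1 + nx + n(n-1)\frac{x^2}{2!} + n(n-1)(n-2)\frac{x^3}{3!} + \cdots \quad \text{for } -\infty < x < \infty. \]

The range given for the last expansion applies when n is a positive integer, in which case
the series terminates; for other values of n the series is infinite and converges for |x| < 1.

These can all be derived by straightforward application of Taylor’s theorem to
the expansion of a function about x = 0.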
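
A quick way to gain confidence in entries of such a table is to sum a moderate number of terms and compare with a library value. The sketch below does this for three of the standard series; the function names are our own and the test points are arbitrary values inside the stated ranges.

```python
import math


def ln1p_series(x, n_terms):
    """x - x**2/2 + x**3/3 - ... to n_terms terms (series for ln(1 + x))."""
    return sum((-1) ** (k + 1) * x**k / k for k in range(1, n_terms + 1))


def arctan_series(x, n_terms):
    """x - x**3/3 + x**5/5 - ... to n_terms terms (series for the inverse tangent)."""
    return sum((-1) ** k * x ** (2 * k + 1) / (2 * k + 1) for k in range(n_terms))


def exp_series(x, n_terms):
    """1 + x + x**2/2! + ... to n_terms terms (series for exp(x))."""
    return sum(x**k / math.factorial(k) for k in range(n_terms))


if __name__ == "__main__":
    print(ln1p_series(0.5, 60), math.log(1.5))
    print(arctan_series(0.5, 40), math.atan(0.5))
    print(exp_series(3.0, 40), math.exp(3.0))
```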


4.7 Evaluation of limits 

The idea of the limit of a function f(x) as x approaches a value a is fairly intuitive,
though a strict definition exists and is stated below. In many cases, the limit of
the function as x approaches a will be simply the value f(a), but in others this is
not so. Firstly, the function may be undefined at x = a, as, for example, when

\[ f(x) = \frac{\sin x}{x}, \]

which takes the value 0/0 at x = 0. However, the limit as x approaches zero
does exist and can be evaluated as unity using l’Hôpital’s rule below. Another
possibility is that even if f(x) is defined at x = a its value may not be equal to the
limiting value $\lim_{x\to a} f(x)$. This can occur for a discontinuous function at a point
of discontinuity. The strict definition of a limit is that if $\lim_{x\to a} f(x) = l$ then
for any number ε however small, it must be possible to find a number η such that
|f(x) − l| < ε whenever |x − a| < η. In other words, as x becomes arbitrarily close to
a, f(x) becomes arbitrarily close to its limit, l. To remove any ambiguity, it should
be stated that, in general, the number η will depend on both ε and the form of f(x).

The following observations are often useful in finding the limit of a function. 

(i) A limit may be ±∞. For example as x → 0, $1/x^2 \to \infty$.

(ii) A limit may be approached from below or above and the value may be
different in each case. For example consider the function f(x) = tan x. As x tends
to π/2 from below f(x) → ∞, but if the limit is approached from above then
f(x) → −∞. Another way of writing this is

\[ \lim_{x\to\frac{\pi}{2}^-} \tan x = \infty, \qquad \lim_{x\to\frac{\pi}{2}^+} \tan x = -\infty. \]

(iii) It may ease the evaluation of limits if the function under consideration is
split into a sum, product or quotient. Provided each of the limits exists, the rules
for evaluating such limits are as follows.

(a) $\lim_{x\to a} \{f(x) + g(x)\} = \lim_{x\to a} f(x) + \lim_{x\to a} g(x)$.

(b) $\lim_{x\to a} \{f(x)g(x)\} = \lim_{x\to a} f(x) \, \lim_{x\to a} g(x)$.

(c) $\lim_{x\to a} \dfrac{f(x)}{g(x)} = \dfrac{\lim_{x\to a} f(x)}{\lim_{x\to a} g(x)}$, provided that the numerator and denominator are
not both equal to zero or infinity.

Examples of cases (a)-(c) are discussed below. 
► Evaluate the limits

\[ \lim_{x\to 1}(x^2 + 2x^3), \qquad \lim_{x\to 0}(x \cos x), \qquad \lim_{x\to\pi/2} \frac{\sin x}{x}. \]

Using (a) above,

\[ \lim_{x\to 1}(x^2 + 2x^3) = \lim_{x\to 1} x^2 + \lim_{x\to 1} 2x^3 = 3. \]

Using (b),

\[ \lim_{x\to 0}(x \cos x) = \lim_{x\to 0} x \, \lim_{x\to 0} \cos x = 0 \times 1 = 0. \]

Using (c),

\[ \lim_{x\to\pi/2} \frac{\sin x}{x} = \frac{\lim_{x\to\pi/2} \sin x}{\lim_{x\to\pi/2} x} = \frac{1}{\pi/2} = \frac{2}{\pi}. \] ◄


(iv) Limits of functions of x that contain exponents that themselves depend on 
x can often be found by taking logarithms. 


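► Evaluate the limit

\[ \lim_{x\to\infty} \left(1 - \frac{a^2}{x^2}\right)^{x^2}. \]

Let us define

\[ y = \left(1 - \frac{a^2}{x^2}\right)^{x^2} \]

and consider the logarithm of the required limit, i.e.

\[ \lim_{x\to\infty} \ln y = \lim_{x\to\infty} \left[ x^2 \ln\left(1 - \frac{a^2}{x^2}\right) \right]. \]

Using the Maclaurin series for ln(1 + x) given in subsection 4.6.3, we can expand the
logarithm as a series and obtain

\[ \lim_{x\to\infty} \ln y = \lim_{x\to\infty} \left[ x^2 \left( -\frac{a^2}{x^2} - \frac{a^4}{2x^4} + \cdots \right) \right] = -a^2. \]

Therefore, since $\lim_{x\to\infty} \ln y = -a^2$ it follows that $\lim_{x\to\infty} y = \exp(-a^2)$. ◄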
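
The limit obtained by taking logarithms can be checked by direct evaluation at increasingly large x. The sketch below is our own illustration, with a = 1.5 chosen arbitrarily; x is kept moderate because for very large x the subtraction $1 - a^2/x^2$ loses precision in floating-point arithmetic.

```python
import math


def y(x, a):
    """The quantity (1 - a**2/x**2)**(x**2) whose limit as x -> infinity is sought."""
    return (1 - a**2 / x**2) ** (x**2)


if __name__ == "__main__":
    a = 1.5
    for x in (10.0, 100.0, 1000.0, 10000.0):
        print(x, y(x, a))
    print("exp(-a**2) =", math.exp(-a**2))
```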


(v) L’Hôpital’s rule may be used; it is an extension of (iii)(c) above. In cases
where both numerator and denominator are zero or both are infinite, further
consideration of the limit must follow. Let us first consider $\lim_{x\to a} f(x)/g(x)$,
where f(a) = g(a) = 0. Expanding the numerator and denominator as Taylor
series we obtain

\[ \frac{f(x)}{g(x)} = \frac{f(a) + (x-a)f'(a) + [(x-a)^2/2!]f''(a) + \cdots}{g(a) + (x-a)g'(a) + [(x-a)^2/2!]g''(a) + \cdots}. \]

However, f(a) = g(a) = 0 so

\[ \frac{f(x)}{g(x)} = \frac{f'(a) + [(x-a)/2!]f''(a) + \cdots}{g'(a) + [(x-a)/2!]g''(a) + \cdots}. \]

Therefore we find

\[ \lim_{x\to a} \frac{f(x)}{g(x)} = \frac{f'(a)}{g'(a)}, \]

provided f'(a) and g'(a) are not themselves both equal to zero. If, however,
f'(a) and g'(a) are both zero then the same process can be applied to the ratio
f'(x)/g'(x) to yield

\[ \lim_{x\to a} \frac{f(x)}{g(x)} = \frac{f''(a)}{g''(a)}, \]

provided that at least one of f''(a) and g''(a) is non-zero. If the original limit does
exist then it can be found by repeating the process as many times as is necessary
for the ratio of corresponding nth derivatives not to be of the indeterminate form
0/0, i.e.

\[ \lim_{x\to a} \frac{f(x)}{g(x)} = \frac{f^{(n)}(a)}{g^{(n)}(a)}. \]
► Evaluate the limit

\[ \lim_{x\to 0} \frac{\sin x}{x}. \]

We first note that if x = 0, both numerator and denominator are zero. Thus we apply
l’Hôpital’s rule: differentiating, we obtain

\[ \lim_{x\to 0} (\sin x / x) = \lim_{x\to 0} (\cos x / 1) = 1. \] ◄
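
As a further illustration, chosen by us and not taken from the text, consider a limit for which the rule has to be applied twice because the first derivatives also vanish at x = 0:

```latex
\lim_{x\to 0}\frac{1-\cos x}{x^{2}}
  = \lim_{x\to 0}\frac{\sin x}{2x}
  = \lim_{x\to 0}\frac{\cos x}{2}
  = \frac{1}{2}.
```

The first ratio of derivatives is still of the form 0/0, so the process is repeated once more; the result agrees with the leading term of the Maclaurin series $1 - \cos x = x^2/2! - \cdots$.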


So far we have only considered the case where f(a) = g(a) = 0. For the case
where f(a) = g(a) = ∞ we may still apply l’Hôpital’s rule by writing

\[ \lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{1/g(x)}{1/f(x)}, \]

which is now of the form 0/0 at x = a. Note also that l’Hôpital’s rule is still
valid for finding limits as x → ∞, i.e. when a = ∞. This is easily shown by letting
y = 1/x as follows:

\[ \lim_{x\to\infty} \frac{f(x)}{g(x)} = \lim_{y\to 0} \frac{f(1/y)}{g(1/y)} \]
\[ = \lim_{y\to 0} \frac{-f'(1/y)/y^2}{-g'(1/y)/y^2} \]
\[ = \lim_{y\to 0} \frac{f'(1/y)}{g'(1/y)} \]
\[ = \lim_{x\to\infty} \frac{f'(x)}{g'(x)}. \]


Summary of methods for evaluating limits 

To find the limit of a continuous function f(x) at a point x = a, simply substitute
the value a into the function noting that 0/∞ = 0 and that ∞/0 = ∞. The only
difficulty occurs when either of the expressions 0/0 or ∞/∞ results. In this case
differentiate top and bottom and try again. Continue differentiating until the top
and bottom limits are no longer both zero or both infinity. If the undetermined
form 0 × ∞ occurs then it can always be rewritten as 0/0 or ∞/∞.
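
To illustrate the last remark with an example of our own choosing, the product x ln x is of the form 0 × ∞ as x → 0 from above. Writing it as a quotient of the form ∞/∞ and differentiating top and bottom gives

```latex
\lim_{x\to 0^{+}} x\ln x
  = \lim_{x\to 0^{+}} \frac{\ln x}{1/x}
  = \lim_{x\to 0^{+}} \frac{1/x}{-1/x^{2}}
  = \lim_{x\to 0^{+}} (-x)
  = 0.
```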


4.8 Exercises

4.1 Sum the even numbers between 1000 and 2000 inclusive.

4.2 If you invest £1000 on the first day of each year, and interest is paid at 5% on
your balance at the end of each year, how much money do you have after 25
years?

4.3 How does the convergence of the series

\[ \sum_{n=r}^{\infty} \frac{(n-r)!}{n!} \]

depend on the integer r?

4.4 Show that for testing the convergence of the series

\[ x + y + x^2 + y^2 + x^3 + y^3 + \cdots , \]

where 0 < x < y < 1, the d’Alembert ratio test fails but the Cauchy root test is
successful.

4.5 Find the sum $S_N$ of the first N terms of the following series, and hence determine
whether the series are convergent, divergent, or oscillatory:

(a) $\sum_{n=1}^{\infty} \ln\left(\frac{n+1}{n}\right)$, (b) $\sum_{n=0}^{\infty} (-2)^n$, (c) $\sum_{n=1}^{\infty} \frac{(-1)^{n+1} n}{3^n}$.

4.6 By grouping and rearranging terms of the absolutely convergent series

\[ S = \sum_{n=1}^{\infty} \frac{1}{n^2}, \]

show that

\[ S_o = \sum_{n\ \mathrm{odd}} \frac{1}{n^2} = \frac{3S}{4}. \]

4.7 Use the difference method to sum the series

\[ \sum_{n=2}^{N} \frac{2n-1}{2n^2(n-1)^2}. \]


4.8 


4.9 


4.10 


The IV T 1 complex numbers co,„ are given by u>„, = exp(2nim/N ) for m = 
0,1,2,... ,N. 

(a) Evaluate the following: 

N N N 

(i) E“»> (ii) E m l , (iii) E®'"*'"' 

m=0 m = 0 m= 0 

(b) Use these results to evaluate 



Prove that 

cos 8 + cos (6 + a) + ■ • • + cos(0 + na) 


sin j(n + l)a 
sin 


cos (0 + a). 


Determine whether the following series converge (8 and p are positive real 
numbers) : 


(a) Y 

n= 1 


2 sin n6 
n(n + 1)’ 



(°) I] 


1 

2^/2’ 


4.11 


(d) E 

n=2 


(— l)"(n 2 + 1) 1/2 
n In n 


( e ) E 


n p 
n ! 


Find the real values of x for which the following series are convergent:


(a) E^I’ ( b > E( sinx )"’ E” X ’ 


(d) E e “ E (ln ") x - 

n= 1 n=2 

4.12 Determine whether the following series are convergent: 

(a) E („ + 1)1/2’ (b) E„!’ (C) E „n/2 > (d) E„r 

n=l v 7 n=l n=l n=l 

4.13 Determine whether the following series are absolutely convergent, convergent or 
oscillatory: 

c>£3F. 


4.14 


(d) e 

n=0 


(-1)" 

n 2 + 3n + 2 ’ 


( e » E 


(— 1)"2" 

ip / 2 


Determine the positive values of x for which the following series converges : 


E 


x n / 2 e~ n 

n 


4.15 Prove that

\[ \sum_{n=2}^{\infty} \ln\left[ \frac{n^r + (-1)^n}{n^r} \right] \]

is absolutely convergent for r = 2, but only conditionally convergent for r = 1.

4.16 An extension to the proof of the integral test (subsection 4.3.2) shows that, if f(x)
is positive, continuous and monotonically decreasing, for x ≥ 1, and the series
f(1) + f(2) + ··· is convergent, then its sum does not exceed f(1) + L, where L
is the integral

\[ \int_1^{\infty} f(x)\,dx. \]

Use this result to show that the sum ζ(p) of the Riemann zeta series $\sum n^{-p}$, with
p > 1, is not greater than p/(p − 1).

4.17 Demonstrate that rearranging the order of its terms can make a conditionally
convergent series converge to a different limit by considering the series
$\sum (-1)^{n+1} n^{-1} = \ln 2 = 0.693$. Rearrange the series as

\[ S = \frac{1}{1} + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} - \frac{1}{4} + \frac{1}{9} + \frac{1}{11} - \frac{1}{6} + \frac{1}{13} + \cdots \]

and group each set of three successive terms. Show that the series can then be
written

\[ \sum_{m=1}^{\infty} \frac{8m-3}{2m(4m-3)(4m-1)}, \]

which is convergent (by comparison with $\sum n^{-2}$) and contains only positive
terms. Evaluate the first of these and hence deduce that S is not equal to ln 2.
4.18 Illustrate result (iv) of section 4.4 about Cauchy products by considering the 
double summation 


S = EE 


i 

r 2 (n + 1 — r) 3 


By examining the points in the nr-plane over which the double summation is to 
be carried out, show that S can be written as 


S = EE 


i 

r 2 (n + 1 — r) 3 


Deduce that S < 3. 

4.19 A Fabry-Perot interferometer consists of two parallel heavily silvered glass plates;
light enters normally to the plates, and undergoes repeated reflections between
them, with a small transmitted fraction emerging at each reflection. Find the
intensity $|B|^2$ of the emerging wave, where

\[ B = A(1 - r) \sum_{n=0}^{\infty} r^n e^{in\phi}, \]

with r and φ real.


4.20 


4.21 

4.22 

4.23 


4.24 

4.25 

4.26 


4.27 

4.28 


Identify the series 


E 


(-l)" +1 x 2n 
(2/7 — 1) ! ’ 


and then by integration and differentiation deduce the values S of the following 
series. 



(-l)" +1 n 2 
(2«)! ’ 
(— 1)" +1 /!7T 2 " 
4" ( 2/7 — 1)! ’ 


0»E 

n= 1 

oo 

<d»E 

n=0 


(-l)" +l n 

(2n+l)!’ 

( 1 )'■ ( 17 + 1 ) 
( 277 )! 


Starting from the Maclaurin series for cos x, show that

\[ (\cos x)^{-2} = 1 + x^2 + \frac{2x^4}{3} + \cdots . \]

Deduce the first three terms in the Maclaurin series for tan x.

Find the Maclaurin series for 

(b) (x 2 + 4) -1 , (c) sin 2 .x. 

If $f(x) = \sinh^{-1} x$, and its nth derivative $f^{(n)}(x)$ is written as $P_n(x)/(1 + x^2)^{n-1/2}$,
where $P_n(x)$ is a polynomial (of order n − 1), show that the $P_n(x)$ satisfy the
recurrence relation

\[ P_{n+1}(x) = (1 + x^2) P_n'(x) - (2n - 1) x P_n(x). \]

Hence generate the coefficients necessary to express $\sinh^{-1} x$ as a Maclaurin series
up to terms in $x^5$.

Find the first three non-zero terms in the Maclaurin series for the following 
functions: 

(a) (x 2 + 9)~ 1/2 , (b) ln[(2 + x) 3 ], (c) exp(sinx), 

(d) ln(cosx), (e) exp[— (x — n)~ 2 ], (f) tan -1 x. 

By using the logarithmic series, prove that if a and b are positive and nearly 
equal then 

a 2 (a — b) 

In - ~ — . 

b a + b 

Show that the error in this approximation is about 2 (a — fe) 3 /[3(n + b) 3 ]. 
Determine whether the following functions /(x) are (i) continuous, and (ii) 
differentiable at x = 0 : 


(a) In 


1 + x 
1 — x 


(a) f (x) — exp( | x | ) ; 

(b) f(x) = (1 ^cosx)/.x 2 for x ^ 0, f(0) = 

(c) f(x) = xsin(l/x) for x y= 0, f( 0) = 0; 

(d) f(x) = [4 — x 2 ], where [v] denotes the integer part of y. 


Find the limit as x — > 0 of [^1 + x m — *Jl — x m ]/x", in which m and n are positive 
integers. 

Evaluate the following limits: 


(a) lim 


sin 3x 


o sinh x ’ 


, . tan x — tanh x 

(b) lim — — , 

*->o sinh x — x 


(c) lim 


tan x — x 
o cos x — 1 ’ 


(d) lim 


/ cosec x 


o V 




sinh x \ 
~ )■ 
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4.8 EXERCISES 


4.29 Find the limits of the following functions: 

(a) $\dfrac{x^3 + x^2 - 5x - 2}{2x^3 - 7x^2 + 4x + 4}$, as $x \to 0$, $x \to \infty$ and $x \to 2$; 

(b) $\dfrac{\sin x - x\cosh x}{\sinh x - x}$, as $x \to 0$; 

(c) $\displaystyle\int_x^{\pi/2} \frac{y\cos y - \sin y}{y^2}\,dy$, as $x \to 0$. 

4.30 Use Taylor expansions to three terms to find approximations to (a) $\sqrt[4]{17}$, and 
(b) $\sqrt[3]{26}$. 

4.31 Using a first-order Taylor expansion about $x = x_0$, show that a better approximation 
than $x_0$ to the solution of the equation 

$$f(x) = \sin x + \tan x = 2$$

is given by $x = x_0 + h$, where 

$$h = \frac{2 - f(x_0)}{\cos x_0 + \sec^2 x_0}.$$

(a) Use this procedure twice to find the solution of f(x) = 2 to six significant 
figures, given that it is close to x = 0.9. 

(b) Use the result in (a) to deduce, to the same degree of accuracy, one solution 
of the quartic equation 

$$y^4 - 4y^3 + 4y^2 + 4y - 4 = 0.$$

4.32 Evaluate 

$$\lim_{x \to 0}\left[\frac{1}{x^3}\left(\operatorname{cosec} x - \frac{1}{x} - \frac{x}{6}\right)\right].$$

4.33 In quantum theory, a system of oscillators, each of fundamental frequency $\nu$, 
interacting at temperature T has an average energy $\bar{E}$ given by 

$$\bar{E} = \frac{\sum_{n=0}^{\infty} nh\nu e^{-nx}}{\sum_{n=0}^{\infty} e^{-nx}},$$

where $x = h\nu/kT$, h and k being the Planck and Boltzmann constants respectively. 
Prove that both series converge, evaluate their sums, and show that at high 
temperatures $\bar{E} \approx kT$ whilst at low temperatures $\bar{E} \approx h\nu\exp(-h\nu/kT)$. 

4.34 In a very simple model of a crystal, point-like atomic ions are regularly spaced 
along an infinite one-dimensional row with spacing R. Alternate ions carry equal 
and opposite charges $\pm e$. The potential energy of the ith ion in the electric field 
due to the jth ion is 

$$\frac{q_i q_j}{4\pi\epsilon_0 r_{ij}},$$

where $q_k$ is the charge on the kth ion and $r_{ij}$ is the distance between the ith and 
jth ions. 

Write down a series giving the total contribution $V_i$ of the ith ion to the overall 
potential energy. Show that the series converges, and, if $V_i$ is written as 

$$V_i = \frac{\alpha e^2}{4\pi\epsilon_0 R},$$

find a closed-form expression for $\alpha$, the Madelung constant for this (unrealistic) 
lattice. 


4.35 One of the factors contributing to the high relative permittivity of water to static 
electric fields is the permanent electric dipole moment p of the water molecule. In 
an external field E the dipoles tend to line up with the field, but they do not do 
so completely because of thermal agitation at the temperature T of the water. A 
classical (non-quantum) calculation using the Boltzmann distribution shows that 
the average polarisability per molecule $\alpha$ is given by 

$$\alpha = \frac{p}{E}(\coth x - x^{-1}),$$

where $x = pE/kT$ and k is the Boltzmann constant. 

At ordinary temperatures, even with high field strengths ($10^4\ \mathrm{V\,m^{-1}}$ or more), 
$x \ll 1$. By making suitable series expansions of the hyperbolic functions involved, 
show that $\alpha = p^2/3kT$ to an accuracy of about one part in $15x^{-2}$. 

4.36 In quantum theory a certain method (the Born approximation) gives the (so-called) 
amplitude $f(\theta)$ for the scattering of a particle of mass m through an angle 
$\theta$ by a uniform potential well of depth $V_0$ and radius b (i.e. the potential energy 
of the particle is $-V_0$ within a sphere of radius b and zero elsewhere) as 

$$f(\theta) = \frac{2mV_0}{\hbar^2 K^3}(\sin Kb - Kb\cos Kb).$$

Here $\hbar$ is the Planck constant divided by $2\pi$, the energy of the particle is $\hbar^2 k^2/2m$ 
and K is $2k\sin(\theta/2)$. 

Use l'Hôpital's rule to evaluate the amplitude at low energies, i.e. when k and 
hence K tend to zero, and so determine the low-energy total cross-section. 
(Note: the differential cross-section is given by $|f(\theta)|^2$ and the total cross-section 
by the integral of this over all solid angles, i.e. $2\pi\int_0^{\pi} |f(\theta)|^2 \sin\theta\,d\theta$.) 


4.9 Hints and answers 

4.1 $2\sum_{n=500}^{1000} n = 751\,500$. 

4.2 $Ar(r^n - 1)/(r - 1)$ = £50 113. 

4.3 Divergent for r < 1 ; convergent for r >2. 

4.4 The ratio of successive terms oscillates between 0 and $\infty$ as $n \to \infty$; 
$(u_n)^{1/n} \le (y^{n/2})^{1/n} < 1$. 

4.5 (a) $\ln(N + 1)$, divergent; (b) $[1 - (-2)^n]/3$, oscillates infinitely; (c) Add $S_N/3$ to 
the $S_N$ series; $\frac{3}{16}[1 - (-3)^{-N}] + \frac{3}{4}N(-3)^{-N-1}$, convergent to $\frac{3}{16}$. 

4.6 Write all terms of the form $(2m)^{-2}$ as $\frac{1}{4}m^{-2}$; their sum is clearly $\frac{1}{4}S$. 

4.7 $(1 - N^{-2})/2$. 

4.8 (a) (i) 2 for N = 1; 1 otherwise, (ii) 2 for N = 1; 3 for N = 2; 1 otherwise. 

(iii) 1 + x for N = 1; $[1 - x^{N+1}\exp(2\pi i/N)]/[1 - x\exp(2\pi i/N)]$ otherwise. 

(b) (i) Consider Re(co m — 1 » 2 ); —2 for N = 2; 0 otherwise, (ii) Consider Im(2"‘co„,); 

-V3- 

4.9 Sum the geometric series with rth term $\exp[i(\theta + r\alpha)]$. Its real part is 

$$\{\cos\theta - \cos[(n + 1)\alpha + \theta] - \cos(\theta - \alpha) + \cos(\theta + n\alpha)\}/4\sin^2(\alpha/2),$$

which can be reduced to the given answer. 

4.10 (a) Convergent, compare with $n^{-1}(n + 1)^{-1}$; (b) convergent, ratio test; (c) 
divergent, compare with $n^{-1}$; (d) convergent, alternating signs; (e) convergent, 
ratio test. 

4.11 (a) $-1 < x < 1$; (b) all x except $x = (2n + 1)\pi/2$; (c) $x < -1$; (d) $x < 0$; (e) 
always divergent. Clearly divergent for $x > -1$. For $-X = x < -1$, consider 

$$\sum_{k=1}^{\infty} \sum_{n=M_{k-1}+1}^{M_k} \frac{1}{(\ln M_k)^X},$$

where $\ln M_k = k$ and note that $M_k - M_{k-1} = e^{-1}(e - 1)M_k$; hence show that the 
series diverges. 

4.12 (a) Divergent, $u_n$ does not tend to 0. (b) Convergent, ratio test. (c) Convergent, 
root test. (d) Divergent, ratio tends to e, or $u_n$ does not tend to 0. 

4.13 (a) Absolutely convergent, compare with exercise 4.10(b). (b) Oscillates infinitely. 
(c) Absolutely convergent for all x. (d) Absolutely convergent; use partial fractions. 
(e) Oscillates infinitely. 

4.14 $x < e^2$, by the root test. 

4.15 Divide the series into two series, n odd and n even. For r = 2 both are absolutely 
convergent, by comparison with $n^{-2}$. For r = 1 neither series is convergent, 
by comparison with $n^{-1}$. However, the sum of the two is convergent, by the 
alternating sign test or by showing that the terms cancel in pairs. 

4.17 The first term has value 0.833 and all other terms are positive. 

4.18 The original summation ran along lines parallel to the r-axis; replace it with 
one running along lines parallel to the n-axis. Write n + 1 - r = s and deduce 
that $S = \zeta(2)\zeta(3)$, where $\zeta(p) = \sum_{n=1}^{\infty} n^{-p}$ is the Riemann zeta function. Use the result 
proved in exercise 4.16 to give the stated conclusion. 

4.19 $|A|^2(1 - r)^2/(1 + r^2 - 2r\cos\phi)$. 

4.20 $x\sin x$. 

(a) Differentiate once; set x = 1. S = (sin 1 + cos 1)/4 = 0.345. 

(b) Integrate once; set x = 1. S = (sin 1 - cos 1)/2 = 0.151. 

(c) Differentiate once; set $x = \pi/2$. $S = \pi/4 = 0.785$. 

(d) Differentiate twice; set s = n - 1 and x = 1. S = (2 cos 1 - sin 1)/2 = 0.120. 

4.21 Use the binomial expansion and collect terms up to $x^4$. Integrate both sides of 
the displayed equation. $\tan x = x + x^3/3 + 2x^5/15 + \cdots$. 

4.22 (a) $2\displaystyle\sum_{n\ \mathrm{odd}} \frac{x^n}{n}$, (b) $\displaystyle\sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{4^{n+1}}$, (c) $\displaystyle\sum_{n=1}^{\infty} \frac{(-1)^{n+1}(2x)^{2n}}{2(2n)!}$. 

4.23 For example, $P_5(x) = 24x^4 - 72x^2 + 9$. $\sinh^{-1} x = x - x^3/6 + 3x^5/40 - \cdots$. 

4.24 (a) $[1 - (x^2/18) + (3x^4/648)]/3$. (b) $\ln 8 + 3x/2 - 3x^2/8$. (c) $1 + x + x^2/2$. (d) 
$-x^2/2 - x^4/12 - x^6/45$. (e) $\exp(-a^{-2})\{1 - 2x/a^3 - x^2[(3/a^4) - (2/a^6)]\}$. (f) $x - x^3/3 + x^5/5$. 

4.25 Set $a = D + \delta$ and $b = D - \delta$ and use the expansion for $\ln(1 \pm \delta/D)$. 

4.26 (i) (a), (b) and (c) are continuous. (ii) Only (b) is differentiable. 

4.27 The limit is 0 for m > n, 1 for m = n, and $\infty$ for m < n. 

4.28 (a) 3, (b) 4, (c) 0, (d) $\frac{1}{90}$. 

4.29 (a) $-\frac{1}{2}$, $\frac{1}{2}$, $\infty$; (b) $-4$; (c) $-1 + 2/\pi$. 

4.30 (a) Expand $f(x) = x^{1/4}$ about $x_0 = 16$; approximation 2.030518, actual 2.030543. 

(b) Expand $f(x) = x^{1/3}$ about $x_0 = 27$; approximation 2.962506, actual 2.962496. 

4.31 (a) First approximation 0.886452; second approximation 0.886287. (b) Set y = 
sin x and re-express f(x) = 2 as a polynomial equation. y = sin(0.886287) = 
0.774730. 

4.32 7/360. 

4.33 $\bar{E} = h\nu[\exp(h\nu/kT) - 1]^{-1}$. 

4.34 $\alpha = -2\ln 2$. 

4.36 $f(\theta) = 2mV_0 b^3/3\hbar^2$ (i.e. independent of $\theta$); $4\pi(2mV_0 b^3/3\hbar^2)^2$. 


5 


Partial differentiation 


In chapter 2, we discussed functions f of only one variable x, which were usually 
written f(x). Certain constants and parameters may also have appeared in the 
definition of f, e.g. f(x) = ax + 2 contains the constant 2 and the parameter a, but 
only x was considered as a variable and only the derivatives $f^{(n)}(x) = d^n f/dx^n$ 
were defined. 

However, we may equally well consider functions that depend on more than one 
variable, e.g. the function $f(x,y) = x^2 + 3xy$, which depends on the two variables 
x and y. For any pair of values x, y, the function f(x,y) has a well-defined value, 
e.g. f(2,3) = 22. This notion can clearly be extended to functions dependent on 
more than two variables. For the n-variable case, we write $f(x_1, x_2, \ldots, x_n)$ for 
a function that depends on the variables $x_1, x_2, \ldots, x_n$. When n = 2, $x_1$ and $x_2$ 
correspond to the variables x and y used above. 

Functions of one variable, like f(x), can be represented by a graph on a 
plane sheet of paper, and it is apparent that functions of two variables can, 
with little effort, be represented by a surface in three-dimensional space. Thus, 
we may also picture f(x,y) as describing the variation of height with position 
in a mountainous landscape. Functions of many variables, however, are usually 
very difficult to visualise and so the preliminary discussion in this chapter will 
concentrate on functions of just two variables. 


5.1 Definition of the partial derivative 

It is clear that a function f(x,y) of two variables will have a gradient in all 
directions in the xy-plane. A general expression for this rate of change can be 
found and will be discussed in the next section. However, we first consider the 
simpler case of finding the rate of change of f(x,y) in the positive x- and y- 
directions. These rates of change are called the partial derivatives with respect 
to x and y respectively, and they are extremely important in a wide range of 
physical applications. 

For a function of two variables f(x,y) we may define the derivative with respect 
to x, for example, by saying that it is that for a one-variable function when y is 
held fixed and treated as a constant. To signify that a derivative is with respect 
to x, but at the same time to recognize that a derivative with respect to y also 
exists, the former is denoted by $\partial f/\partial x$ and is the partial derivative of f(x,y) with 
respect to x. Similarly, the partial derivative of f with respect to y is denoted by 
$\partial f/\partial y$. 

To define formally the partial derivative of f(x,y) with respect to x, we have 

$$\frac{\partial f}{\partial x} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x, y) - f(x,y)}{\Delta x}, \tag{5.1}$$

provided that the limit exists. This is much the same as for the derivative of a 
one-variable function. The other partial derivative of f(x,y) is similarly defined 
as a limit (provided it exists): 

$$\frac{\partial f}{\partial y} = \lim_{\Delta y \to 0} \frac{f(x, y + \Delta y) - f(x,y)}{\Delta y}. \tag{5.2}$$
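
The limits in (5.1) and (5.2) can be explored numerically by shrinking the step. The following is a minimal sketch, not part of the original text: the helper names `partial_x` and `partial_y`, the step sizes, and the use of the introductory function f(x, y) = x² + 3xy as a test case are all assumptions chosen for illustration.

```python
def f(x, y):
    """Test function from the chapter introduction: f(x, y) = x^2 + 3xy."""
    return x**2 + 3*x*y


def partial_x(func, x, y, dx):
    """Difference quotient of (5.1): y is held fixed while x changes by dx."""
    return (func(x + dx, y) - func(x, y)) / dx


def partial_y(func, x, y, dy):
    """Difference quotient of (5.2): x is held fixed while y changes by dy."""
    return (func(x, y + dy) - func(x, y)) / dy


if __name__ == "__main__":
    # Exact partial derivatives at (2, 3): 2x + 3y = 13 and 3x = 6.
    for step in (1e-1, 1e-3, 1e-5):
        print(step, partial_x(f, 2.0, 3.0, step), partial_y(f, 2.0, 3.0, step))
```

As the step decreases, the quotients should approach the analytic values 13 and 6, which is the behaviour the limit definitions describe.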


It is common practice in connection with partial derivatives of functions 
involving more than one variable to indicate those variables that are held constant 
by writing them as subscripts to the derivative symbol. Thus, the partial derivatives 
defined in (5.1) and (5.2) would be written respectively as 


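$$\left(\frac{\partial f}{\partial x}\right)_y \qquad\text{and}\qquad \left(\frac{\partial f}{\partial y}\right)_x.$$

In this form, the subscript shows explicitly which variable is to be kept constant. 
A more compact notation for these partial derivatives is $f_x$ and $f_y$. However, it is 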
extremely important when using partial derivatives to remember which variables 
are being held constant and it is wise to write out the partial derivative in explicit 
form if there is any possibility of confusion. 

The extension of the definitions (5.1), (5.2) to the general n-variable case is 
straightforward and can be formally written as 

$$\frac{\partial f(x_1, x_2, \ldots, x_n)}{\partial x_i} = \lim_{\Delta x_i \to 0} \frac{f(x_1, x_2, \ldots, x_i + \Delta x_i, \ldots, x_n) - f(x_1, x_2, \ldots, x_i, \ldots, x_n)}{\Delta x_i},$$


provided that the limit exists. 

Just as for one-variable functions, second (and higher) partial derivatives may 
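be defined in a similar way. For a two-variable function f(x,y) they are 

$$\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right) = \frac{\partial^2 f}{\partial x^2} = f_{xx}, \qquad \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right) = \frac{\partial^2 f}{\partial y^2} = f_{yy},$$

$$\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) = \frac{\partial^2 f}{\partial x\,\partial y} = f_{xy}, \qquad \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) = \frac{\partial^2 f}{\partial y\,\partial x} = f_{yx}.$$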


Only three of the second derivatives are independent since the relation 

$$\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial^2 f}{\partial y\,\partial x}$$

is always obeyed, provided that the second partial derivatives are continuous 
at the point in question. This relation often proves useful as a labour-saving 
device when evaluating second partial derivatives. It can also be shown that for 
a function of n variables, $f(x_1, x_2, \ldots, x_n)$, under the same conditions, 

$$\frac{\partial^2 f}{\partial x_i\,\partial x_j} = \frac{\partial^2 f}{\partial x_j\,\partial x_i}.$$


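►Find the first and second partial derivatives of the function 

$$f(x,y) = 2x^3y^2 + y^3.$$

The first partial derivatives are 

$$\frac{\partial f}{\partial x} = 6x^2y^2, \qquad \frac{\partial f}{\partial y} = 4x^3y + 3y^2,$$

and the second partial derivatives are 

$$\frac{\partial^2 f}{\partial x^2} = 12xy^2, \qquad \frac{\partial^2 f}{\partial y^2} = 4x^3 + 6y, \qquad \frac{\partial^2 f}{\partial x\,\partial y} = 12x^2y, \qquad \frac{\partial^2 f}{\partial y\,\partial x} = 12x^2y,$$

the last two being equal, as expected. ◄ 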
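
The equality of the mixed second derivatives in this example can be checked numerically. The sketch below is an illustration under stated assumptions: the central-difference helpers `d_dx` and `d_dy`, the step `h`, and the sample point are choices made here and do not come from the text.

```python
def f(x, y):
    """Function from the worked example: f(x, y) = 2 x^3 y^2 + y^3."""
    return 2*x**3*y**2 + y**3


def d_dx(g, h=1e-4):
    """Return a function approximating dg/dx (y fixed) by a central difference."""
    return lambda x, y: (g(x + h, y) - g(x - h, y)) / (2*h)


def d_dy(g, h=1e-4):
    """Return a function approximating dg/dy (x fixed) by a central difference."""
    return lambda x, y: (g(x, y + h) - g(x, y - h)) / (2*h)


f_x, f_y = d_dx(f), d_dy(f)
f_xy = d_dx(f_y)   # d/dx of (df/dy)
f_yx = d_dy(f_x)   # d/dy of (df/dx)

if __name__ == "__main__":
    x, y = 1.5, 0.5
    print("f_x ", f_x(x, y), "analytic", 6*x**2*y**2)
    print("f_y ", f_y(x, y), "analytic", 4*x**3*y + 3*y**2)
    print("f_xy", f_xy(x, y), "f_yx", f_yx(x, y), "analytic", 12*x**2*y)
```

Both orders of differentiation should agree with 12x²y to within the accuracy of the difference scheme.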


5.2 The total differential and total derivative 

Having defined the (first) partial derivatives of a function f(x,y), which give the 
rate of change of f along the positive x- and y-axes, we consider next the rate of 
change of f(x,y) in an arbitrary direction. Suppose that we make simultaneous 
small changes $\Delta x$ in x and $\Delta y$ in y and that, as a result, f changes to $f + \Delta f$. 
Then we must have 


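$$\begin{aligned}
\Delta f &= f(x + \Delta x, y + \Delta y) - f(x, y) \\
&= f(x + \Delta x, y + \Delta y) - f(x, y + \Delta y) + f(x, y + \Delta y) - f(x, y) \\
&= \left[\frac{f(x + \Delta x, y + \Delta y) - f(x, y + \Delta y)}{\Delta x}\right]\Delta x + \left[\frac{f(x, y + \Delta y) - f(x, y)}{\Delta y}\right]\Delta y.
\end{aligned} \tag{5.3}$$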


In the last line we note that the quantities in brackets are very similar to those 
involved in the definitions of partial derivatives (5.1), (5.2). For them to be strictly 
equal to the partial derivatives, $\Delta x$ and $\Delta y$ would need to be infinitesimally small. 
But even for finite (but not too large) $\Delta x$ and $\Delta y$ the approximate formula 

$$\Delta f \approx \frac{\partial f(x,y)}{\partial x}\Delta x + \frac{\partial f(x,y)}{\partial y}\Delta y \tag{5.4}$$

can be obtained. It will be noticed that the first bracket in (5.3) actually approximates 
to $\partial f(x, y + \Delta y)/\partial x$ but that this has been replaced by $\partial f(x,y)/\partial x$ in (5.4). 
This approximation clearly has the same degree of validity as that which replaces 
the bracket by the partial derivative. 

How valid an approximation (5.4) is to (5.3) depends not only on how small 
$\Delta x$ and $\Delta y$ are but also on the magnitudes of higher partial derivatives; this is 
discussed further in section 5.7 in the context of Taylor series for functions of 
more than one variable. Nevertheless, letting the small changes $\Delta x$ and $\Delta y$ in 
(5.4) become infinitesimal, we can define the total differential df of the function 
f(x,y), without any approximation, as 

$$df = \frac{\partial f}{\partial x}dx + \frac{\partial f}{\partial y}dy. \tag{5.5}$$

Equation (5.5) can be extended to the case of a function of n variables, 
$f(x_1, x_2, \ldots, x_n)$; 

$$df = \frac{\partial f}{\partial x_1}dx_1 + \frac{\partial f}{\partial x_2}dx_2 + \cdots + \frac{\partial f}{\partial x_n}dx_n. \tag{5.6}$$


►Find the total differential of the function $f(x,y) = y\exp(x + y)$. 

Evaluating the first partial derivatives, we find 

$$\frac{\partial f}{\partial x} = y\exp(x + y), \qquad \frac{\partial f}{\partial y} = \exp(x + y) + y\exp(x + y).$$

Applying (5.5), we then find that the total differential is given by 

$$df = [y\exp(x + y)]\,dx + [(1 + y)\exp(x + y)]\,dy.$$ ◄ 
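
To see how well the total differential tracks a small finite change, the result of this example can be compared with the actual change in f. This is a hedged sketch: the function names, the evaluation point and the increments are assumptions made for illustration only.

```python
import math


def f(x, y):
    """Function from the worked example: f(x, y) = y exp(x + y)."""
    return y * math.exp(x + y)


def total_differential(x, y, dx, dy):
    """df from (5.5), using the partial derivatives found in the example."""
    df_dx = y * math.exp(x + y)
    df_dy = (1 + y) * math.exp(x + y)
    return df_dx * dx + df_dy * dy


def actual_change(x, y, dx, dy):
    """Exact finite change f(x + dx, y + dy) - f(x, y)."""
    return f(x + dx, y + dy) - f(x, y)


if __name__ == "__main__":
    for scale in (1e-1, 1e-2, 1e-3):
        dx, dy = scale, -2*scale
        print(scale, actual_change(0.5, 1.0, dx, dy),
              total_differential(0.5, 1.0, dx, dy))
```

The discrepancy between the two columns should shrink roughly quadratically with the size of the increments, consistent with the remark that the neglected terms involve higher partial derivatives.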


In some situations, despite the fact that several variables $x_i$, i = 1, 2, ..., n, 
appear to be involved, effectively only one of them is. This occurs if there are 
subsidiary relationships constraining all the $x_i$ to have values dependent on the 
value of one of them, say $x_1$. These relationships may be represented by equations 
that are typically of the form 

$$x_i = x_i(x_1), \qquad i = 2, 3, \ldots, n. \tag{5.7}$$

In principle f can then be expressed as a function of $x_1$ alone by substituting 
from (5.7) for $x_2, x_3, \ldots, x_n$, and then the total derivative (or simply the derivative) 
of f with respect to $x_1$ is obtained by ordinary differentiation. 

Alternatively, (5.6) can be used to give 

$$\frac{df}{dx_1} = \frac{\partial f}{\partial x_1} + \left(\frac{\partial f}{\partial x_2}\right)\frac{dx_2}{dx_1} + \cdots + \left(\frac{\partial f}{\partial x_n}\right)\frac{dx_n}{dx_1}. \tag{5.8}$$

It should be noted that the LHS of this equation is the total derivative $df/dx_1$, 
whilst the partial derivative $\partial f/\partial x_1$ forms only a part of the RHS. In evaluating 
this partial derivative account must be taken only of explicit appearances of $x_1$ in 
the function f, and no allowance must be made for the knowledge that changing 
$x_1$ necessarily changes $x_2, x_3, \ldots, x_n$. The contribution from these latter changes is 
precisely that of the remaining terms on the RHS of (5.8). Naturally, what has 
been shown using $x_1$ in the above argument applies equally well to any other of 
the $x_i$, with the appropriate consequent changes. 


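►Find the total derivative of $f(x,y) = x^2 + 3xy$ with respect to x, given that $y = \sin^{-1} x$. 

We can see immediately that 

$$\frac{\partial f}{\partial x} = 2x + 3y, \qquad \frac{\partial f}{\partial y} = 3x, \qquad \frac{dy}{dx} = \frac{1}{(1 - x^2)^{1/2}}$$

and so, using (5.8) with $x_1 = x$ and $x_2 = y$, 

$$\frac{df}{dx} = 2x + 3y + 3x\frac{1}{(1 - x^2)^{1/2}} = 2x + 3\sin^{-1} x + \frac{3x}{(1 - x^2)^{1/2}}.$$
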

Obviously the same expression would have resulted if we had substituted for y from the 
start, but the above method often produces results with reduced calculation, particularly 
in more complicated examples. ◄ 
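
The claim that (5.8) and direct substitution give the same total derivative can be checked numerically. The sketch below is illustrative: the function names, the step size and the sample value of x are assumptions, and the derivative of the substituted function is estimated by a central difference rather than computed analytically.

```python
import math


def total_derivative(x):
    """df/dx from (5.8): explicit x-dependence plus (df/dy)(dy/dx), y = arcsin x."""
    y = math.asin(x)
    df_dx_partial = 2*x + 3*y            # y held fixed
    df_dy_partial = 3*x                  # x held fixed
    dy_dx = 1.0 / math.sqrt(1 - x*x)
    return df_dx_partial + df_dy_partial * dy_dx


def f_substituted(x):
    """f(x, y) = x^2 + 3xy with y = arcsin x substituted from the start."""
    return x*x + 3*x*math.asin(x)


def central_difference(g, x, h=1e-6):
    """Ordinary derivative of a one-variable function by a central difference."""
    return (g(x + h) - g(x - h)) / (2*h)


if __name__ == "__main__":
    x = 0.3
    print(total_derivative(x), central_difference(f_substituted, x))
```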


5.3 Exact and inexact differentials 

In the last section we discussed how to find the total differential of a function, i.e. 
its infinitesimal change in an arbitrary direction, in terms of its gradients $\partial f/\partial x$ 
and $\partial f/\partial y$ in the x- and y-directions (see (5.5)). Sometimes, however, we wish 
to reverse the process and find the function f that differentiates to give a known 
differential. Usually, finding such functions relies on inspection and experience. 

As an example, it is easy to see that the function whose differential is df = 
xdy + y dx is simply f(x,y) = xy + c, where c is a constant. Differentials such as 
this, which integrate directly, are called exact differentials, whereas those that do 
not are inexact differentials. For example, x dy + 3 y dx is not the straightforward 
differential of any function (see below). Inexact differentials can be made exact, 
however, by multiplying through by a suitable function called an integrating 
factor. This is discussed further in subsection 14.2.3. 


► Show that the differential xdy + 3y dx is inexact. 


On the one hand, if we integrate with respect to x we conclude that f(x,y) = 3 xy + g(y), 
where g(y) is any function of y. On the other hand, if we integrate with respect to y we 
conclude that f(x,y) = xy + h(x) where h(x) is any function of x. These conclusions are 
inconsistent for any and every choice of g(y) and h(x), and therefore the differential is 
inexact. ◄ 




It is naturally of interest to investigate which properties of a differential make 
it exact. Consider the general differential containing two variables, 

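$$df = A(x,y)\,dx + B(x,y)\,dy.$$

We see that 

$$\frac{\partial f}{\partial x} = A(x,y), \qquad \frac{\partial f}{\partial y} = B(x,y)$$

and, using the property $f_{xy} = f_{yx}$, we therefore require 

$$\frac{\partial A}{\partial y} = \frac{\partial B}{\partial x}. \tag{5.9}$$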


This is in fact both a necessary and a sufficient condition for the differential to 
be exact. 


► Using (5.9) show that x dy + 3y dx is inexact. 

In the above notation, A(x,y) = 3y and B(x,y) = x and so 

$$\frac{\partial A}{\partial y} = 3, \qquad \frac{\partial B}{\partial x} = 1.$$

As these are not equal it follows that the differential is inexact. ◄ 
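
Condition (5.9) can also be tested numerically at sample points. The following is a minimal sketch under stated assumptions: the helper names `partial` and `looks_exact`, the tolerance and the sample points are all hypothetical choices, and agreement at a few points is only evidence for exactness, not a proof of it.

```python
def partial(g, var, point, h=1e-5):
    """Central-difference partial derivative of g with respect to argument index var."""
    up = list(point)
    down = list(point)
    up[var] += h
    down[var] -= h
    return (g(*up) - g(*down)) / (2*h)


def looks_exact(A, B, points, tol=1e-6):
    """Check dA/dy == dB/dx, i.e. condition (5.9), for A dx + B dy at each sample point."""
    return all(abs(partial(A, 1, p) - partial(B, 0, p)) < tol for p in points)


sample_points = [(0.3, 1.2), (-1.0, 0.5), (2.0, -0.7)]

# x dy + y dx has A = y and B = x; it is the differential of xy.
exact_case = looks_exact(lambda x, y: y, lambda x, y: x, sample_points)

# x dy + 3y dx has A = 3y and B = x; the text shows it to be inexact.
inexact_case = looks_exact(lambda x, y: 3*y, lambda x, y: x, sample_points)

if __name__ == "__main__":
    print(exact_case, inexact_case)
```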

Determining whether a differential containing many variables $x_1, x_2, \ldots, x_n$ is 
exact is a simple extension of the above. A differential containing many variables 
can be written in general as 

$$df = \sum_{i=1}^{n} g_i(x_1, x_2, \ldots, x_n)\,dx_i$$

and will be exact if 

$$\frac{\partial g_i}{\partial x_j} = \frac{\partial g_j}{\partial x_i} \qquad \text{for all pairs } i, j. \tag{5.10}$$

There will be $\frac{1}{2}n(n - 1)$ such relationships to be satisfied. 


►Show that 

$$(y + z)\,dx + x\,dy + x\,dz$$

is an exact differential. 

In this case, $g_1(x,y,z) = y + z$, $g_2(x,y,z) = x$, $g_3(x,y,z) = x$ and hence $\partial g_1/\partial y = 1 = 
\partial g_2/\partial x$, $\partial g_3/\partial x = 1 = \partial g_1/\partial z$, $\partial g_2/\partial z = 0 = \partial g_3/\partial y$; therefore, from (5.10), the differential 
is exact. As mentioned above, it is sometimes possible to show that a differential is exact 
simply by finding by inspection the function from which it originates. In this example, it 
can be seen easily that f(x,y,z) = x(y + z) + c. ◄ 


5.4 Useful theorems of partial differentiation 


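So far our discussion has centred on a function f(x,y) dependent on two variables, 
x and y. Equally, however, we could have expressed x as a function of f and y, 
or y as a function of f and x. To emphasise the point that all the variables are 
of equal standing, we now replace f by z. This does not imply that x, y and z 
are coordinate positions (though they might be). Since x is a function of y and z, 
it follows that 

$$dx = \left(\frac{\partial x}{\partial y}\right)_z dy + \left(\frac{\partial x}{\partial z}\right)_y dz \tag{5.11}$$

and similarly, since y = y(x,z), 

$$dy = \left(\frac{\partial y}{\partial x}\right)_z dx + \left(\frac{\partial y}{\partial z}\right)_x dz. \tag{5.12}$$

We may now substitute (5.12) into (5.11) to obtain 

$$dx = \left(\frac{\partial x}{\partial y}\right)_z \left(\frac{\partial y}{\partial x}\right)_z dx + \left[\left(\frac{\partial x}{\partial y}\right)_z \left(\frac{\partial y}{\partial z}\right)_x + \left(\frac{\partial x}{\partial z}\right)_y\right] dz. \tag{5.13}$$

Now if we hold z constant, so that dz = 0, we obtain the reciprocity relation 

$$\left(\frac{\partial x}{\partial y}\right)_z = \left(\frac{\partial y}{\partial x}\right)_z^{-1},$$

which holds provided both partial derivatives exist and neither is equal to zero. 
Note, further, that this relationship only holds when the variable being kept 
constant, in this case z, is the same on both sides of the equation. 

Alternatively we can put dx = 0 in (5.13). Then the contents of the square 
brackets also equal zero, and we obtain the cyclic relation 

$$\left(\frac{\partial y}{\partial z}\right)_x \left(\frac{\partial z}{\partial x}\right)_y \left(\frac{\partial x}{\partial y}\right)_z = -1,$$

which holds unless any of the derivatives vanish. In deriving this result we have 
used the reciprocity relation to replace $(\partial x/\partial z)_y^{-1}$ by $(\partial z/\partial x)_y$. 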
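
The reciprocity and cyclic relations can be verified numerically for a concrete relation between three variables. The sketch below is an illustration only: the relation z = xy² (with x, y > 0), the helper names `d1` and `d2`, and the sample point are assumptions introduced here and do not appear in the text.

```python
import math


def z_of(x, y):
    """Assumed relation between the variables: z = x y^2."""
    return x * y**2


def x_of(y, z):
    """The same relation solved for x."""
    return z / y**2


def y_of(x, z):
    """The same relation solved for y (positive branch)."""
    return math.sqrt(z / x)


def d1(g, a, b, h=1e-6):
    """Central difference in the first argument, second argument held fixed."""
    return (g(a + h, b) - g(a - h, b)) / (2*h)


def d2(g, a, b, h=1e-6):
    """Central difference in the second argument, first argument held fixed."""
    return (g(a, b + h) - g(a, b - h)) / (2*h)


x, y = 2.0, 3.0
z = z_of(x, y)

dx_dy_z = d1(x_of, y, z)    # (dx/dy) at constant z
dy_dx_z = d1(y_of, x, z)    # (dy/dx) at constant z
dy_dz_x = d2(y_of, x, z)    # (dy/dz) at constant x
dz_dx_y = d1(z_of, x, y)    # (dz/dx) at constant y

reciprocity = dx_dy_z * dy_dx_z            # should be close to +1
cyclic = dy_dz_x * dz_dx_y * dx_dy_z       # should be close to -1

if __name__ == "__main__":
    print(reciprocity, cyclic)
```

Note that the product of three derivatives is close to −1, not +1, which is the feature of the cyclic relation that most often causes surprise.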


5.5 The chain rule 

So far we have discussed the differentiation of a function f(x,y) with respect to 
its variables x and y. We now consider the case where x and y are themselves 
functions of another variable, say u. If we wish to find the derivative df/du, 
we could simply substitute in f(x,y) the expressions for x(u) and y(u) and then 
differentiate the resulting function of u. Such substitution will quickly give the 
desired answer in simple cases, but in more complicated examples it is easier to 
make use of the total differentials described in the previous section. 

From equation (5.5) the total differential of f(x,y) is given by 

$$df = \frac{\partial f}{\partial x}dx + \frac{\partial f}{\partial y}dy,$$

but we now note that by using the formal device of dividing through by du this 
immediately implies 

$$\frac{df}{du} = \frac{\partial f}{\partial x}\frac{dx}{du} + \frac{\partial f}{\partial y}\frac{dy}{du}, \tag{5.14}$$

which is called the chain rule for partial differentiation. This expression provides 
a direct method for calculating the total derivative of f with respect to u and is 
particularly useful when an equation is expressed in a parametric form. 


►Given that x(u) = 1 + au and $y(u) = bu^3$, find the rate of change of $f(x,y) = xe^{-y}$ with 
respect to u. 

As discussed above, this problem could be addressed by substituting for x and y to obtain 
f as a function only of u and then differentiating with respect to u. However, using (5.14) 
directly we obtain 

$$\frac{df}{du} = (e^{-y})a + (-xe^{-y})3bu^2,$$

which on substituting for x and y gives 

$$\frac{df}{du} = e^{-bu^3}(a - 3bu^2 - 3bau^3).$$ ◄ 
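
The chain-rule result of this example can be compared both with its closed form and with a direct numerical derivative of the substituted function. This is a hedged sketch: the function names and the numerical values chosen for a, b and u are assumptions made for illustration.

```python
import math


def df_du_chain(u, a, b):
    """df/du from (5.14) for f = x exp(-y), with x = 1 + a u and y = b u^3."""
    x, y = 1 + a*u, b*u**3
    df_dx, df_dy = math.exp(-y), -x*math.exp(-y)
    dx_du, dy_du = a, 3*b*u**2
    return df_dx*dx_du + df_dy*dy_du


def df_du_closed(u, a, b):
    """Closed form quoted at the end of the worked example."""
    return math.exp(-b*u**3) * (a - 3*b*u**2 - 3*b*a*u**3)


def df_du_numeric(u, a, b, h=1e-6):
    """Central difference applied to f after substituting x(u) and y(u)."""
    def g(t):
        return (1 + a*t) * math.exp(-b*t**3)
    return (g(u + h) - g(u - h)) / (2*h)


if __name__ == "__main__":
    print(df_du_chain(0.7, 2.0, 0.5),
          df_du_closed(0.7, 2.0, 0.5),
          df_du_numeric(0.7, 2.0, 0.5))
```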


Equation (5.14) is an example of the chain rule for a function of two variables 
each of which depends on a single variable. The chain rule may be extended to 
functions of many variables, each of which is itself a function of a variable u, i.e. 
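$f(x_1, x_2, x_3, \ldots, x_n)$, with $x_i = x_i(u)$. In this case the chain rule gives 

$$\frac{df}{du} = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\frac{dx_i}{du} = \frac{\partial f}{\partial x_1}\frac{dx_1}{du} + \frac{\partial f}{\partial x_2}\frac{dx_2}{du} + \cdots + \frac{\partial f}{\partial x_n}\frac{dx_n}{du}. \tag{5.15}$$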


5.6 Change of variables 

It is sometimes necessary or desirable to make a change of variables during the 
course of an analysis, and consequently to have to change an equation expressed 
in one set of variables into an equation using another set. The same situation arises 
if a function f depends on one set of variables $x_i$, so that $f = f(x_1, x_2, \ldots, x_n)$, 
but the $x_i$ are given in terms of a further set of variables $u_j$ by the equations 

$$x_i = x_i(u_1, u_2, \ldots, u_m). \tag{5.16}$$

The $x_i$ on the right of this equation is a function (of the $u_j$) whilst the $x_i$ on the 
left is the value of that function. For each different value of i, the $x_i$ on the right 
will be a different function of the $u_j$. In this case the chain rule (5.15) becomes 

$$\frac{\partial f}{\partial u_j} = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\frac{\partial x_i}{\partial u_j}, \qquad j = 1, 2, \ldots, m, \tag{5.17}$$

and is said to express a change of variables. In general the number of variables 
in each set need not be equal, i.e. m need not equal n, but if both the $x_i$ and the 
$u_j$ are sets of independent variables then m = n. 

Figure 5.1 The relationship between Cartesian and plane cylindrical polar 
coordinates. 


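►Plane polar coordinates, $\rho$ and $\phi$, and Cartesian coordinates, x and y, are related by the 
expressions 

$$x = \rho\cos\phi, \qquad y = \rho\sin\phi,$$

as can be seen from figure 5.1. An arbitrary function f(x,y) can be re-expressed as a 
function $g(\rho,\phi)$. Transform the expression 

$$\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}$$

into one in $\rho$ and $\phi$. 

We first note that $\rho^2 = x^2 + y^2$, $\phi = \tan^{-1}(y/x)$. We can now write down the four partial 
derivatives 

$$\frac{\partial\rho}{\partial x} = \frac{x}{(x^2 + y^2)^{1/2}} = \cos\phi, \qquad \frac{\partial\phi}{\partial x} = \frac{-(y/x^2)}{1 + (y/x)^2} = -\frac{\sin\phi}{\rho},$$

$$\frac{\partial\rho}{\partial y} = \frac{y}{(x^2 + y^2)^{1/2}} = \sin\phi, \qquad \frac{\partial\phi}{\partial y} = \frac{1/x}{1 + (y/x)^2} = \frac{\cos\phi}{\rho}.$$

Thus, from (5.17), we may write 

$$\frac{\partial}{\partial x} = \cos\phi\frac{\partial}{\partial\rho} - \frac{\sin\phi}{\rho}\frac{\partial}{\partial\phi}, \qquad \frac{\partial}{\partial y} = \sin\phi\frac{\partial}{\partial\rho} + \frac{\cos\phi}{\rho}\frac{\partial}{\partial\phi}.$$

Now it is only a matter of writing 

$$\begin{aligned}
\frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right) &= \left(\cos\phi\frac{\partial}{\partial\rho} - \frac{\sin\phi}{\rho}\frac{\partial}{\partial\phi}\right)\left(\cos\phi\frac{\partial}{\partial\rho} - \frac{\sin\phi}{\rho}\frac{\partial}{\partial\phi}\right) g \\
&= \left(\cos\phi\frac{\partial}{\partial\rho} - \frac{\sin\phi}{\rho}\frac{\partial}{\partial\phi}\right)\left(\cos\phi\frac{\partial g}{\partial\rho} - \frac{\sin\phi}{\rho}\frac{\partial g}{\partial\phi}\right) \\
&= \cos^2\phi\frac{\partial^2 g}{\partial\rho^2} + \frac{2\cos\phi\sin\phi}{\rho^2}\frac{\partial g}{\partial\phi} - \frac{2\cos\phi\sin\phi}{\rho}\frac{\partial^2 g}{\partial\phi\,\partial\rho} + \frac{\sin^2\phi}{\rho}\frac{\partial g}{\partial\rho} + \frac{\sin^2\phi}{\rho^2}\frac{\partial^2 g}{\partial\phi^2}
\end{aligned}$$

and a similar expression for $\partial^2 f/\partial y^2$, 

$$\begin{aligned}
\frac{\partial^2 f}{\partial y^2} &= \left(\sin\phi\frac{\partial}{\partial\rho} + \frac{\cos\phi}{\rho}\frac{\partial}{\partial\phi}\right)\left(\sin\phi\frac{\partial}{\partial\rho} + \frac{\cos\phi}{\rho}\frac{\partial}{\partial\phi}\right) g \\
&= \sin^2\phi\frac{\partial^2 g}{\partial\rho^2} - \frac{2\cos\phi\sin\phi}{\rho^2}\frac{\partial g}{\partial\phi} + \frac{2\cos\phi\sin\phi}{\rho}\frac{\partial^2 g}{\partial\phi\,\partial\rho} + \frac{\cos^2\phi}{\rho}\frac{\partial g}{\partial\rho} + \frac{\cos^2\phi}{\rho^2}\frac{\partial^2 g}{\partial\phi^2}.
\end{aligned}$$

When these two expressions are added together the change of variables is complete and 
we obtain 

$$\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = \frac{\partial^2 g}{\partial\rho^2} + \frac{1}{\rho}\frac{\partial g}{\partial\rho} + \frac{1}{\rho^2}\frac{\partial^2 g}{\partial\phi^2}.$$ ◄ 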
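
The transformed expression can be checked against a function whose Cartesian second derivatives are known. The sketch below is illustrative and rests on assumptions not in the text: the test function f(x, y) = x³y (for which the sum of the two Cartesian second derivatives is 6xy), the finite-difference step, and the sample point.

```python
import math


def f(x, y):
    """Assumed test function; d2f/dx2 + d2f/dy2 = 6xy."""
    return x**3 * y


def g(rho, phi):
    """The same function re-expressed in plane polar coordinates."""
    return f(rho*math.cos(phi), rho*math.sin(phi))


def polar_form(func, rho, phi, h=1e-4):
    """g_rho_rho + g_rho / rho + g_phi_phi / rho^2, by central differences."""
    g_rho = (func(rho + h, phi) - func(rho - h, phi)) / (2*h)
    g_rho_rho = (func(rho + h, phi) - 2*func(rho, phi) + func(rho - h, phi)) / h**2
    g_phi_phi = (func(rho, phi + h) - 2*func(rho, phi) + func(rho, phi - h)) / h**2
    return g_rho_rho + g_rho/rho + g_phi_phi/rho**2


if __name__ == "__main__":
    rho, phi = 1.3, 0.6
    x, y = rho*math.cos(phi), rho*math.sin(phi)
    print(polar_form(g, rho, phi), 6*x*y)
```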


5.7 Taylor’s theorem for many-variable functions 

We have already introduced Taylor’s theorem for a function f(x) of one variable, 
in section 4.6. In an analogous way, the Taylor expansion of a function f(x,y) of 
two variables is given by 


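$$f(x,y) = f(x_0,y_0) + \frac{\partial f}{\partial x}\Delta x + \frac{\partial f}{\partial y}\Delta y + \frac{1}{2!}\left[\frac{\partial^2 f}{\partial x^2}(\Delta x)^2 + 2\frac{\partial^2 f}{\partial x\,\partial y}\Delta x\,\Delta y + \frac{\partial^2 f}{\partial y^2}(\Delta y)^2\right] + \cdots, \tag{5.18}$$

where $\Delta x = x - x_0$ and $\Delta y = y - y_0$, and all the derivatives are to be evaluated 
at $(x_0, y_0)$. 

►Find the Taylor expansion, up to quadratic terms in x - 2 and y - 3, of $f(x,y) = y\exp xy$ 
about the point x = 2, y = 3. 

We first evaluate the required partial derivatives of the function, i.e. 

$$\frac{\partial f}{\partial x} = y^2\exp xy, \qquad \frac{\partial f}{\partial y} = \exp xy + xy\exp xy,$$

$$\frac{\partial^2 f}{\partial x^2} = y^3\exp xy, \qquad \frac{\partial^2 f}{\partial y^2} = 2x\exp xy + x^2y\exp xy, \qquad \frac{\partial^2 f}{\partial x\,\partial y} = 2y\exp xy + xy^2\exp xy.$$

Using (5.18), the Taylor expansion of a two-variable function, we find 

$$f(x,y) \approx e^6\left\{3 + 9(x - 2) + 7(y - 3) + (2!)^{-1}\left[27(x - 2)^2 + 48(x - 2)(y - 3) + 16(y - 3)^2\right]\right\}.$$ ◄ 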
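
The quality of this quadratic approximation near (2, 3) can be examined numerically. The following is a minimal sketch: the function name `taylor2` and the sample displacements are assumptions chosen for illustration; the coefficients are those obtained in the worked example.

```python
import math


def f(x, y):
    """Function from the worked example: f(x, y) = y exp(xy)."""
    return y * math.exp(x*y)


def taylor2(x, y):
    """Quadratic Taylor approximation of y exp(xy) about the point (2, 3)."""
    dx, dy = x - 2.0, y - 3.0
    return math.exp(6.0) * (3 + 9*dx + 7*dy
                            + 0.5*(27*dx**2 + 48*dx*dy + 16*dy**2))


if __name__ == "__main__":
    for d in (0.1, 0.01, 0.001):
        exact = f(2.0 + d, 3.0 - d)
        approx = taylor2(2.0 + d, 3.0 - d)
        print(d, exact, approx, abs(exact - approx)/exact)
```

The relative error should fall roughly as the cube of the displacement, since the first neglected terms are of third order.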


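It will be noticed that the terms in (5.18) containing first derivatives can be 
written as 

$$\frac{\partial f}{\partial x}\Delta x + \frac{\partial f}{\partial y}\Delta y = \left(\Delta x\frac{\partial}{\partial x} + \Delta y\frac{\partial}{\partial y}\right) f(x,y),$$

where both sides of this relation should be evaluated at the point $(x_0, y_0)$. Similarly 
the terms in (5.18) containing second derivatives can be written as 

$$\frac{1}{2!}\left[\frac{\partial^2 f}{\partial x^2}(\Delta x)^2 + 2\frac{\partial^2 f}{\partial x\,\partial y}\Delta x\,\Delta y + \frac{\partial^2 f}{\partial y^2}(\Delta y)^2\right] = \frac{1}{2!}\left(\Delta x\frac{\partial}{\partial x} + \Delta y\frac{\partial}{\partial y}\right)^2 f(x,y), \tag{5.19}$$

where it is understood that the partial derivatives resulting from squaring the 
expression in parentheses act only on f(x,y) and its derivatives, and not on $\Delta x$ 
or $\Delta y$; again both sides of (5.19) should be evaluated at $(x_0, y_0)$. It can be shown 
that the higher-order terms of the Taylor expansion of f(x,y) can be written in 
an analogous way, and that we may write the full Taylor series as 

$$f(x,y) = \sum_{n=0}^{\infty} \frac{1}{n!}\left[\left(\Delta x\frac{\partial}{\partial x} + \Delta y\frac{\partial}{\partial y}\right)^n f(x,y)\right]_{x_0, y_0},$$

where, as indicated, all the terms on the RHS are to be evaluated at $(x_0, y_0)$. 

The most general form of Taylor's theorem, for a function $f(x_1, x_2, \ldots, x_n)$ of n 
variables, is a simple extension of the above. Although it is not necessary to do 
so, we may think of the $x_i$ as coordinates in n-dimensional space and write the 
function as $f(\mathbf{x})$, where $\mathbf{x}$ is a vector from the origin to $(x_1, x_2, \ldots, x_n)$. Taylor's 
theorem then becomes 

$$f(\mathbf{x}) = f(\mathbf{x}_0) + \sum_i \frac{\partial f}{\partial x_i}\Delta x_i + \frac{1}{2!}\sum_i\sum_j \frac{\partial^2 f}{\partial x_i\,\partial x_j}\Delta x_i\,\Delta x_j + \cdots, \tag{5.20}$$

where $\Delta x_i = x_i - x_{i0}$ and the partial derivatives are evaluated at $(x_{10}, x_{20}, \ldots, x_{n0})$. 
For completeness, we note that in this case the full Taylor series can be written 
in the form 

$$f(\mathbf{x}) = \sum_{n=0}^{\infty} \frac{1}{n!}\left[(\Delta\mathbf{x}\cdot\nabla)^n f(\mathbf{x})\right]_{\mathbf{x} = \mathbf{x}_0},$$

where $\nabla$ is the vector differential operator del, to be discussed in chapter 10. 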


5.8 Stationary values of many-variable functions 

The idea of the stationary points of a function of just one variable has already 
been discussed in subsection 2.1.8. We recall that the function f(x) has a stationary 
point at $x = x_0$ if its gradient df/dx is zero at that point. A function may have 
any number of stationary points, and their nature, i.e. whether they are maxima, 
minima or stationary points of inflection, is determined by the value of the second 
derivative at the point. A stationary point is 

(i) a minimum if $d^2f/dx^2 > 0$; 

(ii) a maximum if $d^2f/dx^2 < 0$; 

(iii) a stationary point of inflection if $d^2f/dx^2 = 0$ and changes sign through 
the point. 

We now consider the stationary points of functions of more than one variable; 
we will see that partial differential analysis is ideally suited to the determination 
of the position and nature of such points. It is helpful to consider first the case 
of a function of just two variables but, even in this case, the general situation 
is more complex than that for a function of one variable, as can be seen from 
figure 5.2. 

This figure shows part of a three-dimensional model of a function f(x,y). At 
positions P and B there are a peak and a bowl respectively or, more mathemati- 
cally, a local maximum and a local minimum. At position S the gradient in any 
direction is zero but the situation is complicated, since a section parallel to the 
plane x = 0 would show a maximum, but one parallel to the plane y = 0 would 
show a minimum. A point such as S is known as a saddle point. The orientation 
of the ‘saddle’ in the xy-plane is irrelevant; it is as shown in the figure solely for 
ease of discussion. For any saddle point the function increases in some directions 
away from the point but decreases in other directions. 





Figure 5.2 Stationary points of a function of two variables. A minimum 
occurs at B, a maximum at P and a saddle point at S. 


For functions of two variables, such as the one shown, it should be clear that a 
necessary condition for a stationary point (maximum, minimum or saddle point) 
to occur is that 


\[ \frac{\partial f}{\partial x} = 0 \quad\text{and}\quad \frac{\partial f}{\partial y} = 0. \tag{5.21} \]


The vanishing of the partial derivatives in directions parallel to the axes is enough 
to ensure that the partial derivative in any arbitrary direction is also zero. The 
latter can be considered as the superposition of two contributions, one along 
each axis; since both contributions are zero, so is the partial derivative in the 
arbitrary direction. This may be made more precise by considering the total 
differential 

\[ df = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy. \]


Using (5.21) we see that although the infinitesimal changes dx and dy can be 
chosen independently the change in the value of the infinitesimal function df is 
always zero at a stationary point. 

We now turn our attention to determining the nature of a stationary point of 
a function of two variables, i.e. whether it is a maximum, a minimum or a saddle 
point. By analogy with the one-variable case we see that $\partial^2 f/\partial x^2$ and $\partial^2 f/\partial y^2$
must both be positive for a minimum and both be negative for a maximum.
However, these are not sufficient conditions since they could also be obeyed at
complicated saddle points. What is important for a minimum (or maximum) is
that the second partial derivative must be positive (or negative) in all directions,
not just the x- and y-directions.

To establish just what constitutes sufficient conditions we first note that, since
$f$ is a function of two variables and $\partial f/\partial x = \partial f/\partial y = 0$, a Taylor expansion of
the type (5.18) about the stationary point yields

\[ f(x,y) - f(x_0,y_0) \approx \frac{1}{2!}\left[ (\Delta x)^2 f_{xx} + 2\,\Delta x\,\Delta y\, f_{xy} + (\Delta y)^2 f_{yy} \right], \]

where $\Delta x = x - x_0$ and $\Delta y = y - y_0$ and where the partial derivatives have been
written in more compact notation. Rearranging the contents of the bracket as
the weighted sum of two squares, we find


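\[ f(x,y) - f(x_0,y_0) \approx \frac{1}{2}\left[ f_{xx}\left( \Delta x + \frac{f_{xy}\,\Delta y}{f_{xx}} \right)^2 + (\Delta y)^2 \left( f_{yy} - \frac{f_{xy}^2}{f_{xx}} \right) \right]. \tag{5.22} \]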


For a minimum, we require (5.22) to be positive for all $\Delta x$ and $\Delta y$, and hence
$f_{xx} > 0$ and $f_{yy} - (f_{xy}^2/f_{xx}) > 0$. Given the first constraint, the second can be
written $f_{xx}f_{yy} > f_{xy}^2$. Similarly for a maximum we require (5.22) to be negative,
and hence $f_{xx} < 0$ and $f_{xx}f_{yy} > f_{xy}^2$. For minima and maxima, symmetry requires
that $f_{yy}$ obeys the same criteria as $f_{xx}$. When (5.22) is negative (or zero) for some
values of $\Delta x$ and $\Delta y$ but positive (or zero) for others, we have a saddle point. In
this case $f_{xx}f_{yy} < f_{xy}^2$. In summary, all stationary points have $f_x = f_y = 0$ and
they may be classified further as


(i) minima if both $f_{xx}$ and $f_{yy}$ are positive and $f_{xy}^2 < f_{xx}f_{yy}$,

(ii) maxima if both $f_{xx}$ and $f_{yy}$ are negative and $f_{xy}^2 < f_{xx}f_{yy}$,

(iii) saddle points if $f_{xx}$ and $f_{yy}$ have opposite signs or $f_{xy}^2 > f_{xx}f_{yy}$.

Note, however, that if $f_{xy}^2 = f_{xx}f_{yy}$ then $f(x,y) - f(x_0,y_0)$ can be written in one
of the four forms

\[ \pm\tfrac{1}{2}\left( \Delta x\,|f_{xx}|^{1/2} \pm \Delta y\,|f_{yy}|^{1/2} \right)^2. \]

For some choice of the ratio $\Delta y/\Delta x$ this expression has zero value, showing
that, for a displacement from the stationary point in this particular direction,
$f(x_0 + \Delta x, y_0 + \Delta y)$ does not differ from $f(x_0, y_0)$ to second order in $\Delta x$ and
$\Delta y$; in such situations further investigation is required. In particular, if $f_{xx}$, $f_{yy}$
and $f_{xy}$ are all zero then the Taylor expansion has to be taken to a higher
order. As examples, such extended investigations would show that the function
$f(x,y) = x^4 + y^4$ has a minimum at the origin but that $g(x,y) = x^4 + y^3$ has a
saddle point there.
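
The classification (i)-(iii) can be sketched as a small routine. This is only an illustration: the function name `classify_stationary_point` and the tolerance `tol` are assumptions introduced here, and the routine takes the already evaluated second derivatives as inputs. It returns "undetermined" when $f_{xy}^2 = f_{xx}f_{yy}$, which is the case needing further investigation.

```python
def classify_stationary_point(fxx, fyy, fxy, tol=1e-12):
    """Classify a stationary point of f(x, y) from its second derivatives.

    `tol` is an illustrative tolerance (an assumption, not from the text)
    used to decide when fxx*fyy - fxy**2 counts as zero.
    """
    disc = fxx * fyy - fxy ** 2
    if disc > tol:
        # fxx and fyy must share a sign when fxx*fyy > fxy**2 >= 0
        return "minimum" if fxx > 0 else "maximum"
    if disc < -tol:
        return "saddle point"
    # fxy**2 == fxx*fyy: the second-order terms cannot decide
    return "undetermined"


# Second derivatives at the origin for a few simple functions.
examples = {
    "x^2 + y^2": (2.0, 2.0, 0.0),
    "-x^2 - y^2": (-2.0, -2.0, 0.0),
    "x^2 - y^2": (2.0, -2.0, 0.0),
    "x^4 + y^4": (0.0, 0.0, 0.0),
}
for name, (fxx, fyy, fxy) in examples.items():
    print(name, "->", classify_stationary_point(fxx, fyy, fxy))
```

The last example shows the limitation noted above: $x^4 + y^4$ has a minimum at the origin, but the second-order test alone cannot establish it.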

► Show that the function $f(x,y) = x^3\exp(-x^2 - y^2)$ has a maximum at the point $(\sqrt{3/2}, 0)$,
a minimum at $(-\sqrt{3/2}, 0)$ and a stationary point at the origin whose nature cannot be
determined by the above procedures.


Setting the first two partial derivatives to zero to locate the stationary points, we find

\[ \frac{\partial f}{\partial x} = (3x^2 - 2x^4)\exp(-x^2 - y^2) = 0, \tag{5.23} \]

\[ \frac{\partial f}{\partial y} = -2yx^3\exp(-x^2 - y^2) = 0. \tag{5.24} \]

For (5.24) to be satisfied we require $x = 0$ or $y = 0$ and for (5.23) to be satisfied we require
$x = 0$ or $x = \pm\sqrt{3/2}$. Hence the stationary points are at $(0,0)$, $(\sqrt{3/2},0)$ and $(-\sqrt{3/2},0)$.
We now find the second partial derivatives:

\[ f_{xx} = (4x^5 - 14x^3 + 6x)\exp(-x^2 - y^2), \]
\[ f_{yy} = x^3(4y^2 - 2)\exp(-x^2 - y^2), \]
\[ f_{xy} = 2x^2y(2x^2 - 3)\exp(-x^2 - y^2). \]

We then substitute the pairs of values of x and y for each stationary point and find that
at $(0,0)$

\[ f_{xx} = 0, \quad f_{yy} = 0, \quad f_{xy} = 0 \]

and at $(\pm\sqrt{3/2}, 0)$

\[ f_{xx} = \mp 6\sqrt{3/2}\,\exp(-3/2), \quad f_{yy} = \mp 3\sqrt{3/2}\,\exp(-3/2), \quad f_{xy} = 0. \]

Hence, applying criteria (i)-(iii) above, we find that $(0,0)$ is an undetermined stationary
point, $(\sqrt{3/2},0)$ is a maximum and $(-\sqrt{3/2}, 0)$ is a minimum. The function is shown in
figure 5.3. ◄
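
As a rough numerical cross-check of this example, the second derivatives can be estimated by central finite differences and compared with the analytic values. The helper name `second_derivatives` and the step size `h` are assumptions made for this sketch; it is not a substitute for the analytic working.

```python
import math


def f(x, y):
    return x ** 3 * math.exp(-x ** 2 - y ** 2)


def second_derivatives(func, x, y, h=1e-4):
    """Central-difference estimates of f_xx, f_yy and f_xy at (x, y)."""
    fxx = (func(x + h, y) - 2.0 * func(x, y) + func(x - h, y)) / h ** 2
    fyy = (func(x, y + h) - 2.0 * func(x, y) + func(x, y - h)) / h ** 2
    fxy = (func(x + h, y + h) - func(x + h, y - h)
           - func(x - h, y + h) + func(x - h, y - h)) / (4.0 * h ** 2)
    return fxx, fyy, fxy


x0 = math.sqrt(1.5)
estimates = {}
for point in [(x0, 0.0), (-x0, 0.0), (0.0, 0.0)]:
    estimates[point] = second_derivatives(f, *point)
    print(point, estimates[point])

# Analytic values at (+sqrt(3/2), 0) for comparison.
exact_fxx = -6.0 * x0 * math.exp(-1.5)
exact_fyy = -3.0 * x0 * math.exp(-1.5)
print(exact_fxx, exact_fyy)
```

Both estimates are negative at $(\sqrt{3/2}, 0)$ and positive at $(-\sqrt{3/2}, 0)$, while all three vanish at the origin, in line with the conclusions above.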

Determining the nature of stationary points for functions of a general number 
of variables is considerably more difficult and requires a knowledge of the 
eigenvectors and eigenvalues of matrices. Although these are not discussed until 
chapter 8, we present the analysis here for completeness. The remainder of this 
section can therefore be omitted on a first reading. 

For a function of n real variables, $f(x_1, x_2, \ldots, x_n)$, we require that, at all
stationary points,

\[ \frac{\partial f}{\partial x_i} = 0 \quad \text{for all } x_i. \]

In order to determine the nature of a stationary point, we must expand the
function as a Taylor series about the point. Recalling the Taylor expansion (5.20)
for a function of n variables, we see that

\[ \Delta f = f(\mathbf{x}) - f(\mathbf{x}_0) \approx \frac{1}{2} \sum_i \sum_j \frac{\partial^2 f}{\partial x_i \partial x_j}\, \Delta x_i\, \Delta x_j. \tag{5.25} \]

Figure 5.3 The function $f(x,y) = x^3\exp(-x^2 - y^2)$.


If we define the matrix M to have elements given by

\[ M_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}, \]


then we can rewrite (5.25) as

\[ \Delta f = \tfrac{1}{2}\, \Delta\mathbf{x}^{\mathrm T} \mathsf{M}\, \Delta\mathbf{x}, \tag{5.26} \]

where $\Delta\mathbf{x}$ is the column vector with the $\Delta x_i$ as its components and $\Delta\mathbf{x}^{\mathrm T}$ is its
transpose. Since M is real and symmetric it has n real eigenvalues $\lambda_r$ and n
orthogonal eigenvectors $\mathbf{e}_r$, which after suitable normalisation satisfy

\[ \mathsf{M}\mathbf{e}_r = \lambda_r \mathbf{e}_r, \qquad \mathbf{e}_r^{\mathrm T} \mathbf{e}_s = \delta_{rs}, \]

where the Kronecker delta, written $\delta_{rs}$, equals unity for $r = s$ and equals zero
otherwise. These eigenvectors form a basis set for the n-dimensional space and
we can therefore expand $\Delta\mathbf{x}$ in terms of them, obtaining

\[ \Delta\mathbf{x} = \sum_r a_r \mathbf{e}_r, \]

where the $a_r$ are coefficients dependent upon $\Delta\mathbf{x}$. Substituting this into (5.26), we
find

\[ \Delta f = \tfrac{1}{2}\, \Delta\mathbf{x}^{\mathrm T} \mathsf{M}\, \Delta\mathbf{x} = \tfrac{1}{2} \sum_r \lambda_r a_r^2. \]

Now, for the stationary point to be a minimum, we require $\Delta f = \tfrac{1}{2}\sum_r \lambda_r a_r^2 > 0$
for all sets of values of the $a_r$, and therefore all the eigenvalues of M to be
greater than zero. Conversely, for a maximum we require $\Delta f = \tfrac{1}{2}\sum_r \lambda_r a_r^2 < 0$,
and therefore all the eigenvalues of M to be less than zero. If the eigenvalues have
mixed signs, then we have a saddle point. Note that the test may fail if some or
all of the eigenvalues are equal to zero and all the non-zero ones have the same
sign.


► Derive the conditions for maxima, minima and saddle points for a function of two real 
variables, using the above analysis. 


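For a two-variable function the matrix M is given by

\[ \mathsf{M} = \begin{pmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{pmatrix}. \]

Therefore its eigenvalues satisfy the equation

\[ \begin{vmatrix} f_{xx} - \lambda & f_{xy} \\ f_{xy} & f_{yy} - \lambda \end{vmatrix} = 0. \]

Hence

\[ (f_{xx} - \lambda)(f_{yy} - \lambda) - f_{xy}^2 = 0 \]
\[ \Rightarrow\quad f_{xx}f_{yy} - (f_{xx} + f_{yy})\lambda + \lambda^2 - f_{xy}^2 = 0 \]
\[ \Rightarrow\quad 2\lambda = (f_{xx} + f_{yy}) \pm \sqrt{(f_{xx} + f_{yy})^2 - 4(f_{xx}f_{yy} - f_{xy}^2)}, \]

which by rearrangement of the terms under the square root gives

\[ 2\lambda = (f_{xx} + f_{yy}) \pm \sqrt{(f_{xx} - f_{yy})^2 + 4f_{xy}^2}. \]

Now, that M is real and symmetric implies that its eigenvalues are real, and so for both
eigenvalues to be positive (corresponding to a minimum), we require $f_{xx}$ and $f_{yy}$ positive
and also

\[ f_{xx} + f_{yy} > \sqrt{(f_{xx} + f_{yy})^2 - 4(f_{xx}f_{yy} - f_{xy}^2)} \]
\[ \Rightarrow\quad f_{xx}f_{yy} - f_{xy}^2 > 0. \]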

A similar procedure will find the criteria for maxima and saddle points. ◄ 
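
The eigenvalue criterion for two variables can be sketched directly from the closed form $2\lambda = (f_{xx} + f_{yy}) \pm \sqrt{(f_{xx} - f_{yy})^2 + 4f_{xy}^2}$. The function names and the tolerance `tol` below are illustrative assumptions; for more than two variables a numerical eigenvalue routine would be needed instead.

```python
import math


def hessian_eigenvalues(fxx, fyy, fxy):
    """Eigenvalues (smaller, larger) of the symmetric matrix [[fxx, fxy], [fxy, fyy]]."""
    mean = 0.5 * (fxx + fyy)
    half_root = 0.5 * math.sqrt((fxx - fyy) ** 2 + 4.0 * fxy ** 2)
    return mean - half_root, mean + half_root


def classify_by_eigenvalues(fxx, fyy, fxy, tol=1e-12):
    """All eigenvalues > 0: minimum; all < 0: maximum; mixed signs: saddle point."""
    smaller, larger = hessian_eigenvalues(fxx, fyy, fxy)
    if smaller > tol:
        return "minimum"
    if larger < -tol:
        return "maximum"
    if smaller < -tol and larger > tol:
        return "saddle point"
    # At least one eigenvalue is (numerically) zero: the test may fail.
    return "undetermined"


for hessian in [(2.0, 2.0, 1.0), (-2.0, -2.0, 1.0), (1.0, 1.0, 2.0), (1.0, 1.0, 1.0)]:
    print(hessian, hessian_eigenvalues(*hessian), classify_by_eigenvalues(*hessian))
```

The case $(1, 1, 1)$ has $f_{xy}^2 = f_{xx}f_{yy}$, so one eigenvalue vanishes and the sketch reports the point as undetermined.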


5.9 Stationary values under constraints 

In the previous section we looked at the problem of finding stationary values of 
a function of two or more variables when all the variables may be independently 
varied. However, it is often the case in physical problems that not all the variables
used to describe a situation are in fact independent, i.e. some relationship
between the variables must be satisfied. For example, if we walk through a hilly
landscape and we are constrained to walk along a path, we will never reach
the highest peak on the landscape, unless the path happens to take us to it.
Nevertheless, we can still find the highest point that we have reached during our
journey.

We first discuss the case of a function of just two variables. Let us consider
finding the maximum value of the differentiable function $f(x,y)$ subject to the
constraint $g(x,y) = c$, where c is a constant. In the above analogy, $f(x,y)$ might
represent the height of the land above sea-level in some hilly region, whilst
$g(x,y) = c$ is the equation of the path along which we walk.

We could, of course, use the constraint $g(x,y) = c$ to substitute for x or y in
$f(x,y)$, thereby obtaining a new function of only one variable whose stationary
points could be found using the methods discussed in subsection 2.1.8. However,
such a procedure can involve a lot of algebra and becomes very tedious for functions
of more than two variables. A more direct method for solving such problems
is the method of Lagrange undetermined multipliers, which we now discuss.

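To maximise $f$ we require

\[ df = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy = 0. \]

If dx and dy were independent, we could conclude $f_x = 0 = f_y$. However, here
they are not independent, but constrained because g is constant:

\[ dg = \frac{\partial g}{\partial x}\,dx + \frac{\partial g}{\partial y}\,dy = 0. \]

Multiplying dg by an as yet unknown number $\lambda$ and adding it to df we obtain

\[ d(f + \lambda g) = \left( \frac{\partial f}{\partial x} + \lambda\frac{\partial g}{\partial x} \right) dx + \left( \frac{\partial f}{\partial y} + \lambda\frac{\partial g}{\partial y} \right) dy = 0, \]

where $\lambda$ is called a Lagrange undetermined multiplier. In this equation dx and dy
are to be independent and arbitrary; we must therefore choose $\lambda$ such that

\[ \frac{\partial f}{\partial x} + \lambda\frac{\partial g}{\partial x} = 0, \tag{5.27} \]

\[ \frac{\partial f}{\partial y} + \lambda\frac{\partial g}{\partial y} = 0. \tag{5.28} \]

These equations, together with the constraint $g(x,y) = c$, are sufficient to find the
three unknowns, i.e. $\lambda$ and the values of x and y at the stationary point.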

► The temperature of a point (x, y) on a unit circle is given by $T(x,y) = 1 + xy$. Find the
temperature of the two hottest points on the circle.


We need to maximise $T(x,y)$ subject to the constraint $x^2 + y^2 = 1$. Applying (5.27) and
(5.28), we obtain

\[ y + 2\lambda x = 0, \tag{5.29} \]

\[ x + 2\lambda y = 0. \tag{5.30} \]

These results, together with the original constraint $x^2 + y^2 = 1$, provide three simultaneous
equations that may be solved for $\lambda$, x and y.

From (5.29) and (5.30) we find $\lambda = \pm 1/2$, which in turn implies that $y = \mp x$. Remembering
that $x^2 + y^2 = 1$, we find that

\[ y = x \quad\Rightarrow\quad x = \pm\frac{1}{\sqrt{2}}, \quad y = \pm\frac{1}{\sqrt{2}}; \]

\[ y = -x \quad\Rightarrow\quad x = \mp\frac{1}{\sqrt{2}}, \quad y = \pm\frac{1}{\sqrt{2}}. \]

We have not yet determined which of these stationary points are maxima and which are
minima. In this simple case, we need only substitute the four pairs of x- and y-values into
$T(x,y) = 1 + xy$ to find that the maximum temperature on the unit circle is $T_{\max} = 3/2$ at
the points $y = x = \pm 1/\sqrt{2}$. ◄
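
The result can be checked numerically in two independent ways: by confirming that the four candidate points satisfy (5.29), (5.30) and the constraint, and by a brute-force scan around the circle. The variable names and the grid size `n_steps` are assumptions made for this sketch.

```python
import math


def temperature(x, y):
    return 1.0 + x * y


r = 1.0 / math.sqrt(2.0)
stationary = []
for lam in (-0.5, 0.5):
    for x in (r, -r):
        y = -2.0 * lam * x  # from (5.29): y + 2*lam*x = 0
        # (5.30) and the constraint must then hold as well.
        assert abs(x + 2.0 * lam * y) < 1e-12
        assert abs(x * x + y * y - 1.0) < 1e-12
        stationary.append((x, y, temperature(x, y)))

t_max = max(t for _, _, t in stationary)
hottest = [(x, y) for x, y, t in stationary if abs(t - t_max) < 1e-12]

# Independent brute-force scan around the unit circle.
n_steps = 3600
scan_max = max(
    temperature(math.cos(2.0 * math.pi * k / n_steps),
                math.sin(2.0 * math.pi * k / n_steps))
    for k in range(n_steps)
)
print(t_max, hottest, scan_max)
```

Both approaches agree on a maximum temperature of 3/2, attained where $y = x$.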


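The method of Lagrange multipliers can be used to find the stationary points of
functions of more than two variables, subject to several constraints, provided that
the number of constraints is smaller than the number of variables. For example,
if we wish to find the stationary points of $f(x,y,z)$ subject to the constraints
$g(x,y,z) = c_1$ and $h(x,y,z) = c_2$, where $c_1$ and $c_2$ are constants, then we proceed
as above, obtaining

\[ \begin{aligned}
\frac{\partial}{\partial x}(f + \lambda g + \mu h) &= \frac{\partial f}{\partial x} + \lambda\frac{\partial g}{\partial x} + \mu\frac{\partial h}{\partial x} = 0, \\
\frac{\partial}{\partial y}(f + \lambda g + \mu h) &= \frac{\partial f}{\partial y} + \lambda\frac{\partial g}{\partial y} + \mu\frac{\partial h}{\partial y} = 0, \\
\frac{\partial}{\partial z}(f + \lambda g + \mu h) &= \frac{\partial f}{\partial z} + \lambda\frac{\partial g}{\partial z} + \mu\frac{\partial h}{\partial z} = 0.
\end{aligned} \tag{5.31} \]

We may now solve these three equations, together with the two constraints, to
give $\lambda$, $\mu$, x, y and z.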

► Find the stationary points of $f(x,y,z) = x^3 + y^3 + z^3$ subject to the following constraints:

(i) $g(x,y,z) = x^2 + y^2 + z^2 = 1$;

(ii) $g(x,y,z) = x^2 + y^2 + z^2 = 1$ and $h(x,y,z) = x + y + z = 0$.


Case (i). Since there is only one constraint in this case, we need only introduce a single
Lagrange multiplier to obtain

\[ \begin{aligned}
\frac{\partial}{\partial x}(f + \lambda g) &= 3x^2 + 2\lambda x = 0, \\
\frac{\partial}{\partial y}(f + \lambda g) &= 3y^2 + 2\lambda y = 0, \\
\frac{\partial}{\partial z}(f + \lambda g) &= 3z^2 + 2\lambda z = 0.
\end{aligned} \tag{5.32} \]

These equations are highly symmetrical and clearly have the solution $x = y = z = -2\lambda/3$.
Using the constraint $x^2 + y^2 + z^2 = 1$ we find $\lambda = \pm\sqrt{3}/2$ and so stationary points occur
at

\[ x = y = z = \pm\frac{1}{\sqrt{3}}. \tag{5.33} \]

In solving the three equations (5.32) in this way, however, we have implicitly assumed
that x, y and z are non-zero. However, it is clear from (5.32) that any of these values can
equal zero, with the exception of the case $x = y = z = 0$ since this is prohibited by the
constraint $x^2 + y^2 + z^2 = 1$. We must consider the other cases separately.

If $x = 0$, for example, we require

\[ 3y^2 + 2\lambda y = 0, \qquad 3z^2 + 2\lambda z = 0, \qquad y^2 + z^2 = 1. \]

Clearly, we require $\lambda \neq 0$, otherwise these equations are inconsistent. If neither y nor
z is zero we find $y = -2\lambda/3 = z$ and from the third equation we require $y = z = \pm 1/\sqrt{2}$.
If $y = 0$, however, then $z = \pm 1$ and, similarly, if $z = 0$ then $y = \pm 1$. Thus the
stationary points having $x = 0$ are $(0, 0, \pm 1)$, $(0, \pm 1, 0)$ and $(0, \pm 1/\sqrt{2}, \pm 1/\sqrt{2})$. A similar
procedure can be followed for the cases $y = 0$ and $z = 0$ respectively and, in addition
to those already obtained, we find the stationary points $(\pm 1, 0, 0)$, $(\pm 1/\sqrt{2}, 0, \pm 1/\sqrt{2})$ and
$(\pm 1/\sqrt{2}, \pm 1/\sqrt{2}, 0)$.

Case (ii). We now have two constraints and must therefore introduce two Lagrange
multipliers to obtain (cf. (5.31))

\[ \frac{\partial}{\partial x}(f + \lambda g + \mu h) = 3x^2 + 2\lambda x + \mu = 0, \tag{5.34} \]

\[ \frac{\partial}{\partial y}(f + \lambda g + \mu h) = 3y^2 + 2\lambda y + \mu = 0, \tag{5.35} \]

\[ \frac{\partial}{\partial z}(f + \lambda g + \mu h) = 3z^2 + 2\lambda z + \mu = 0. \tag{5.36} \]

These equations are again highly symmetrical and the simplest way to proceed is to
subtract (5.35) from (5.34) to obtain

\[ 3(x^2 - y^2) + 2\lambda(x - y) = 0 \]
\[ \Rightarrow\quad 3(x + y)(x - y) + 2\lambda(x - y) = 0. \tag{5.37} \]


This equation is clearly satisfied if $x = y$; then, from the second constraint, $x + y + z = 0$,
we find $z = -2x$. Substituting these values into the first constraint, $x^2 + y^2 + z^2 = 1$, we
obtain

\[ x = \pm\frac{1}{\sqrt{6}}, \quad y = \pm\frac{1}{\sqrt{6}}, \quad z = \mp\frac{2}{\sqrt{6}}. \tag{5.38} \]

Because of the high degree of symmetry amongst the equations (5.34)-(5.36), we may obtain
by inspection two further relations analogous to (5.37), one containing the variables y, z
and the other the variables x, z. Assuming $y = z$ in the first relation and $x = z$ in the
second, we find the stationary points

\[ x = \mp\frac{2}{\sqrt{6}}, \quad y = \pm\frac{1}{\sqrt{6}}, \quad z = \pm\frac{1}{\sqrt{6}} \tag{5.39} \]

and

\[ x = \pm\frac{1}{\sqrt{6}}, \quad y = \mp\frac{2}{\sqrt{6}}, \quad z = \pm\frac{1}{\sqrt{6}}. \tag{5.40} \]

We note that in finding the stationary points (5.38)-(5.40) we did not need to evaluate the
Lagrange multipliers $\lambda$ and $\mu$ explicitly. This is not always the case, however, and in some
problems it may be simpler to begin by finding the values of these multipliers.

Returning to (5.37) we must now consider the case where $x \neq y$; then we find

\[ 3(x + y) + 2\lambda = 0. \tag{5.41} \]

However, in obtaining the stationary points (5.39), (5.40), we did not assume $x = y$ but
only required $y = z$ and $x = z$ respectively. It is clear that $x \neq y$ at these stationary points,
and it can be shown that they do indeed satisfy (5.41). Similarly, several stationary points
for which $x \neq z$ or $y \neq z$ have already been found.

Thus we need to consider further only two cases: (a) $x = y = z$, and (b) x, y and z are
all different. The first is clearly prohibited by the constraint $x + y + z = 0$. For the second
case, (5.41) must be satisfied, together with the analogous equations containing y, z and
x, z respectively, i.e.

\[ 3(x + y) + 2\lambda = 0, \]
\[ 3(y + z) + 2\lambda = 0, \]
\[ 3(x + z) + 2\lambda = 0. \]

Adding these three equations together and using the constraint $x + y + z = 0$ we find $\lambda = 0$.
However, for $\lambda = 0$ the equations are inconsistent for non-zero x, y and z. Therefore all
the stationary points have already been found and are given by (5.38)-(5.40). ◄
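
As a sanity check on case (ii), the sketch below builds the six points with two coordinates equal to $\pm 1/\sqrt{6}$ and the third equal to $\mp 2/\sqrt{6}$, recovers the multipliers from a relation of the form (5.41), and evaluates the residuals of (5.34)-(5.36) and of both constraints. The helper name `multipliers` is an assumption introduced for illustration.

```python
import math
from itertools import permutations

s = 1.0 / math.sqrt(6.0)

# Six distinct points: two coordinates equal to +/- s, the third to -/+ 2s.
points = set()
for sign in (1.0, -1.0):
    for p in permutations((sign * s, sign * s, -2.0 * sign * s)):
        points.add(p)
points = sorted(points)


def multipliers(point):
    """Recover (lam, mu) from the two distinct coordinate values p != q.

    Uses 3(p + q) + 2*lam = 0, the analogue of (5.41), and then
    3*p**2 + 2*lam*p + mu = 0 from the corresponding Lagrange equation.
    """
    p, q = min(point), max(point)
    lam = -1.5 * (p + q)
    mu = -3.0 * p * p - 2.0 * lam * p
    return lam, mu


worst = 0.0
for pt in points:
    lam, mu = multipliers(pt)
    residuals = [3.0 * c * c + 2.0 * lam * c + mu for c in pt]  # (5.34)-(5.36)
    residuals.append(sum(c * c for c in pt) - 1.0)               # g = 1
    residuals.append(sum(pt))                                    # h = 0
    worst = max(worst, max(abs(r) for r in residuals))
    print(pt, lam, mu)

print("largest residual:", worst)
```

All residuals vanish to rounding error, which is consistent with (5.38)-(5.40) being stationary points; the check does not by itself show that no others exist.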

The method may be extended to functions of any number n of variables 
subject to any smaller number m of constraints. This means that effectively there 
are n — m independent variables and, as mentioned above, we could solve by 
substitution and then by the methods of the previous section. However, for large 
n this becomes cumbersome and the use of Lagrange undetermined multipliers is 
a useful simplification. 

► A system contains a very large number N of particles, each of which can be in any of R
energy levels with a corresponding energy $E_i$, $i = 1, 2, \ldots, R$. The number of particles in the
ith level is $n_i$ and the total energy of the system is a constant, E. Find the distribution of
particles amongst the energy levels that maximises the expression

\[ P = \frac{N!}{n_1!\, n_2! \cdots n_R!}, \]

subject to the constraints that both the number of particles and the total energy remain
constant, i.e.

\[ g = N - \sum_{i=1}^{R} n_i = 0 \quad\text{and}\quad h = E - \sum_{i=1}^{R} n_i E_i = 0. \]


The way in which we proceed is as follows. In order to maximise P, we must minimise
its denominator (since the numerator is fixed). Minimising the denominator is the same as
minimising the logarithm of the denominator, i.e.

\[ f = \ln(n_1!\, n_2! \cdots n_R!) = \ln(n_1!) + \ln(n_2!) + \cdots + \ln(n_R!). \]

Using Stirling's approximation, $\ln(n!) \approx n\ln n - n$, we find that

\[ f = n_1\ln n_1 + n_2\ln n_2 + \cdots + n_R\ln n_R - (n_1 + n_2 + \cdots + n_R) = \left( \sum_{i=1}^{R} n_i \ln n_i \right) - N. \]

It has been assumed here that, for the desired distribution, all the $n_i$ are large. Thus, we
now have a function f subject to two constraints, $g = 0$ and $h = 0$, and we can apply the
Lagrange method, obtaining (cf. (5.31))

\[ \begin{aligned}
\frac{\partial f}{\partial n_1} + \lambda\frac{\partial g}{\partial n_1} + \mu\frac{\partial h}{\partial n_1} &= 0, \\
\frac{\partial f}{\partial n_2} + \lambda\frac{\partial g}{\partial n_2} + \mu\frac{\partial h}{\partial n_2} &= 0, \\
&\;\;\vdots \\
\frac{\partial f}{\partial n_R} + \lambda\frac{\partial g}{\partial n_R} + \mu\frac{\partial h}{\partial n_R} &= 0.
\end{aligned} \]

Since all these equations are alike, we consider the general case

\[ \frac{\partial f}{\partial n_k} + \lambda\frac{\partial g}{\partial n_k} + \mu\frac{\partial h}{\partial n_k} = 0 \]

for $k = 1, 2, \ldots, R$. Substituting the functions f, g and h into this relation we find

\[ \frac{n_k}{n_k} + \ln n_k + \lambda(-1) + \mu(-E_k) = 0, \]

which can be rearranged to give

\[ \ln n_k = \mu E_k + \lambda - 1, \]

and hence

\[ n_k = C\exp \mu E_k. \]

We now have the general form for the distribution of particles amongst energy levels, but
in order to determine the two constants $\mu$, C we recall that

\[ \sum_{k=1}^{R} C\exp \mu E_k = N \]

and

\[ \sum_{k=1}^{R} C E_k \exp \mu E_k = E. \]

This is known as the Boltzmann distribution and is a well-known result from statistical
mechanics. ◄
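
The two constants generally have to be found numerically. A minimal sketch, assuming a small set of hypothetical energy levels and totals (the values of `levels`, `N` and `E` below are invented for illustration): the ratio of the two constraint sums gives the mean energy $E/N$, which increases monotonically with $\mu$, so $\mu$ can be located by bisection and C then follows from the particle-number constraint.

```python
import math


def mean_energy(mu, levels):
    """Mean energy per particle for n_k proportional to exp(mu * E_k)."""
    weights = [math.exp(mu * e) for e in levels]
    return sum(e * w for e, w in zip(levels, weights)) / sum(weights)


def solve_boltzmann(levels, N, E, lo=-50.0, hi=50.0, iterations=200):
    """Find (mu, C) such that sum(n_k) = N and sum(n_k * E_k) = E."""
    target = E / N
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        # mean_energy increases with mu, so plain bisection is sufficient.
        if mean_energy(mid, levels) < target:
            lo = mid
        else:
            hi = mid
    mu = 0.5 * (lo + hi)
    C = N / sum(math.exp(mu * e) for e in levels)
    return mu, C


# Hypothetical data: four equally spaced levels, 1000 particles.
levels = [0.0, 1.0, 2.0, 3.0]
N, E = 1000.0, 1000.0

mu, C = solve_boltzmann(levels, N, E)
populations = [C * math.exp(mu * e) for e in levels]
print(mu, C, populations)
```

For these values the mean energy per particle is below the unweighted mean of the levels, so $\mu$ comes out negative and the populations decrease with increasing energy.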


5.10 Envelopes 

As noted at the start of this chapter, many of the functions with which
physicists, chemists and engineers have to deal contain, in addition to constants
and one or more variables, quantities that are normally considered as parameters
of the system under study. Such parameters may, for example, represent the 
capacitance of a capacitor, the length of a rod, or the mass of a particle - 
quantities that are normally taken as fixed for any particular physical set-up. 
The corresponding variables may well be time, currents, charges, positions and 
velocities. However, the parameters could be varied and in this section we study 
the effects of doing so; in particular we study how the form of dependence of 
one variable on another, typically y = y(x), is affected when the value of a 
parameter is changed in a smooth and continuous way. In effect, we are making 
the parameter into an additional variable. 

As a particular parameter, which we denote by $\alpha$, is varied over its permitted
range, the shape of the plot of y against x will change, usually, but not always,
in a smooth and continuous way. For example, if the muzzle speed v of a shell
fired from a gun is increased through a range of values then its height-distance
trajectories will be a series of curves with a common starting point that are
essentially just magnified copies of the original; furthermore the curves do not
cross each other. However, if the muzzle speed is kept constant but $\theta$, the angle
of elevation of the gun, is increased through a series of values, the corresponding
trajectories do not vary in a monotonic way. When $\theta$ has been increased beyond
45° the resulting trajectory does cross some of the trajectories corresponding
to $\theta < 45°$. The trajectories all lie within a curve that touches each individual
trajectory at one point. Such a curve is called the envelope to the set of trajectory
solutions; it is to the study of such envelopes that this section is devoted.

For our general discussion of envelopes we will consider an equation of the
form $f = f(x, y, \alpha) = 0$. A function of three Cartesian variables, $f = f(x, y, \alpha)$,
is defined at all points in $xy\alpha$-space, whereas $f = f(x, y, \alpha) = 0$ is a surface in
this space. A plane of constant $\alpha$, which is parallel to the xy-plane, cuts such
a surface in a curve. Thus different values of the parameter $\alpha$ correspond to
different curves, which can be plotted in the xy-plane. We now investigate how
the envelope equation for such a family of curves is obtained.

Figure 5.4 Two neighbouring curves in the xy-plane of the family $f(x, y, \alpha) = 0$,
namely $f(x, y, \alpha_1) = 0$ and $f(x, y, \alpha_1 + h) = 0$, intersecting at P. For fixed $\alpha_1$,
the point $P_1$ is the limiting position of P as $h \to 0$. As $\alpha_1$ is varied, $P_1$
delineates the envelope of the family (broken line).



5.10.1 Envelope equations 

Suppose $f(x, y, \alpha_1) = 0$ and $f(x, y, \alpha_1 + h) = 0$ are two neighbouring curves of a
family for which the parameter $\alpha$ differs by a small amount h. Let them intersect
at the point P with coordinates x, y, as shown in figure 5.4. Then the envelope,
indicated by the broken line in the figure, touches $f(x, y, \alpha_1) = 0$ at the point $P_1$,
which is defined as the limiting position of P when $\alpha_1$ is fixed but $h \to 0$. The
full envelope is the curve traced out by $P_1$ as $\alpha_1$ changes to generate successive
members of the family of curves. Of course, for any finite h, $f(x, y, \alpha_1 + h) = 0$ is
one of these curves and the envelope touches it at the point $P_2$.

We are now going to apply Rolle's theorem, see subsection 2.1.10, with the
parameter $\alpha$ as the independent variable and x and y fixed as constants. In this
context, the two curves in figure 5.4 can be thought of as the projections onto the
xy-plane of the planar curves in which the surface $f = f(x, y, \alpha) = 0$ meets the
planes $\alpha = \alpha_1$ and $\alpha = \alpha_1 + h$.

Along the normal to the page that passes through P, as $\alpha$ changes from $\alpha_1$
to $\alpha_1 + h$ the value of $f = f(x, y, \alpha)$ will depart from zero, because the normal
meets the surface $f = f(x, y, \alpha) = 0$ only at $\alpha = \alpha_1$ and at $\alpha = \alpha_1 + h$. However,
at these end points the values of $f = f(x, y, \alpha)$ will both be zero, and therefore
equal. This allows us to apply Rolle's theorem and so to conclude that for some
$\theta$ in the range $0 < \theta < 1$ the partial derivative $\partial f(x, y, \alpha_1 + \theta h)/\partial\alpha$ is zero. When
h is made arbitrarily small, so that $P \to P_1$, the three defining equations reduce
to two and define the envelope point $P_1$:

\[ f(x, y, \alpha_1) = 0 \quad\text{and}\quad \frac{\partial f(x, y, \alpha_1)}{\partial\alpha} = 0. \tag{5.42} \]

In (5.42) both the function and the gradient are evaluated at $\alpha = \alpha_1$. The equation
of the envelope $g(x, y) = 0$ is found by eliminating $\alpha_1$ between the two equations.

As a simple example we will now solve the problem which when posed mathematically
reads 'calculate the envelope appropriate to the family of straight lines
in the xy-plane whose points of intersection with the coordinate axes are a fixed
distance apart'. In more ordinary language, the problem is about a ladder leaning
against a wall.


► A ladder of length L can be stood on level ground and leant at any angle against a vertical
wall. Find the equation of the curve bounding the vertical area that can be accessed from
the ladder.


We take the ground and the wall as the x- and y-axes respectively. If the foot of the ladder
is a from the foot of the wall and the top is b above the ground, the straight-line equation
of the ladder is

\[ \frac{x}{a} + \frac{y}{b} = 1, \]

where a and b are connected by $a^2 + b^2 = L^2$. Expressed in standard form with only one
independent parameter, a, the equation becomes

\[ f(x, y, a) = \frac{x}{a} + \frac{y}{(L^2 - a^2)^{1/2}} - 1 = 0. \tag{5.43} \]

Now, differentiating (5.43) with respect to a and setting the derivative $\partial f/\partial a$ equal to
zero gives

\[ -\frac{x}{a^2} + \frac{ay}{(L^2 - a^2)^{3/2}} = 0, \]

from which it follows that

\[ a = \frac{Lx^{1/3}}{(x^{2/3} + y^{2/3})^{1/2}} \quad\text{and}\quad (L^2 - a^2)^{1/2} = \frac{Ly^{1/3}}{(x^{2/3} + y^{2/3})^{1/2}}. \]

Eliminating a by substituting these values into (5.43) gives, for the equation of the
envelope of all possible positions on the ladder,

\[ x^{2/3} + y^{2/3} = L^{2/3}. \]

This is the equation of an astroid (mentioned in exercise 2.19), and, together with the wall
and the ground, marks the boundary of the vertical area that can be accessed by (the
shoes of) a person standing on the ladder. ◄
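
The limiting construction behind (5.42) can be illustrated numerically for the ladder: intersect the line for parameter a with its neighbour for $a + h$ and check that, for small h, the intersection point lies close to the astroid. The ladder length `L = 1.0`, the step `h` and the sample values of a are assumptions chosen for this sketch.

```python
import math

L = 1.0  # ladder length (illustrative value)


def intersection(a1, a2):
    """Intersection of the ladder lines x/a + y/b = 1 for a = a1 and a = a2."""
    b1 = math.sqrt(L * L - a1 * a1)
    b2 = math.sqrt(L * L - a2 * a2)
    # Solve the 2x2 linear system by Cramer's rule.
    det = 1.0 / (a1 * b2) - 1.0 / (a2 * b1)
    x = (1.0 / b2 - 1.0 / b1) / det
    y = (1.0 / a1 - 1.0 / a2) / det
    return x, y


h = 1e-6
results = []
for a in (0.2, 0.4, 0.6, 0.8):
    x, y = intersection(a, a + h)
    astroid = x ** (2.0 / 3.0) + y ** (2.0 / 3.0)
    results.append((a, x, y, astroid))
    print(a, x, y, astroid)

print("L^(2/3) =", L ** (2.0 / 3.0))
```

As h is reduced the intersection point approaches $x = a^3/L^2$, $y = (L^2 - a^2)^{3/2}/L^2$, which is what the two expressions for a and $(L^2 - a^2)^{1/2}$ above give when solved for x and y.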


Other examples, drawn from both geometry and the physical sciences, are
considered in the exercises at the end of this chapter. The shell trajectory problem 
discussed earlier in this section is solved there, but in the guise of a question 
about the water bell of an ornamental fountain. 

5.11 Thermodynamic relations

Thermodynamic relations provide a useful set of physical examples of partial
differentiation. The relations we will derive are called Maxwell's thermodynamic
relations. They express relationships between four thermodynamic quantities describing
a unit mass of a substance. The quantities are the pressure P, the volume
V, the thermodynamic temperature T and the entropy S of the substance. These
four quantities are not independent; any two of them can be varied independently,
but the other two are then determined. The first law of thermodynamics
may be expressed as

\[ dU = T\,dS - P\,dV, \tag{5.44} \]

where U is the internal energy of the substance. Essentially this is a conservation
of energy equation, but we shall concern ourselves, not with the physics, but rather
with the use of partial differentials to relate the four basic quantities discussed
above. The method involves writing a total differential, dU say, in terms of the
differentials of two variables, say X and Y, thus

\[ dU = \left( \frac{\partial U}{\partial X} \right)_Y dX + \left( \frac{\partial U}{\partial Y} \right)_X dY, \tag{5.45} \]

and then using the relationship

\[ \frac{\partial^2 U}{\partial X \partial Y} = \frac{\partial^2 U}{\partial Y \partial X} \]

to obtain the required Maxwell relation. The variables X and Y are to be chosen
from P, V, T and S.


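► Show that $(\partial T/\partial V)_S = -(\partial P/\partial S)_V$.


Here the two variables that have to be held constant, in turn, happen to be those whose
differentials appear on the RHS of (5.44). And so, taking X as S and Y as V in (5.45), we
have

\[ T\,dS - P\,dV = dU = \left( \frac{\partial U}{\partial S} \right)_V dS + \left( \frac{\partial U}{\partial V} \right)_S dV, \]

and find directly that

\[ \left( \frac{\partial U}{\partial S} \right)_V = T \quad\text{and}\quad \left( \frac{\partial U}{\partial V} \right)_S = -P. \]

Differentiating the first expression with respect to V and the second with respect to S, and
using

\[ \frac{\partial^2 U}{\partial V \partial S} = \frac{\partial^2 U}{\partial S \partial V}, \]

we find the Maxwell relation

\[ \left( \frac{\partial T}{\partial V} \right)_S = -\left( \frac{\partial P}{\partial S} \right)_V. \] ◄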
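
► Show that $(\partial S/\partial V)_T = (\partial P/\partial T)_V$.


Applying (5.45) to dS, with independent variables V and T, we find

\[ dU = T\,dS - P\,dV = T\left[ \left( \frac{\partial S}{\partial V} \right)_T dV + \left( \frac{\partial S}{\partial T} \right)_V dT \right] - P\,dV. \]

Similarly applying (5.45) to dU, we find

\[ dU = \left( \frac{\partial U}{\partial V} \right)_T dV + \left( \frac{\partial U}{\partial T} \right)_V dT. \]

Thus, equating partial derivatives,

\[ \left( \frac{\partial U}{\partial V} \right)_T = T\left( \frac{\partial S}{\partial V} \right)_T - P \quad\text{and}\quad \left( \frac{\partial U}{\partial T} \right)_V = T\left( \frac{\partial S}{\partial T} \right)_V. \]

But, since

\[ \frac{\partial^2 U}{\partial T \partial V} = \frac{\partial^2 U}{\partial V \partial T}, \quad\text{i.e.}\quad \frac{\partial}{\partial T}\left( \frac{\partial U}{\partial V} \right)_T = \frac{\partial}{\partial V}\left( \frac{\partial U}{\partial T} \right)_V, \]

it follows that

\[ \left( \frac{\partial S}{\partial V} \right)_T + T\frac{\partial^2 S}{\partial T \partial V} - \left( \frac{\partial P}{\partial T} \right)_V = \frac{\partial}{\partial V}\left[ T\left( \frac{\partial S}{\partial T} \right)_V \right]_T = T\frac{\partial^2 S}{\partial V \partial T}. \]

Thus finally we get the Maxwell relation

\[ \left( \frac{\partial S}{\partial V} \right)_T = \left( \frac{\partial P}{\partial T} \right)_V. \] ◄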

The above derivation is rather cumbersome, however, and a useful trick that 
can simplify the working is to define a new function, called a potential. The 
internal energy U discussed above is one example of a potential but three others 
are commonly defined and they are described below. 


► Show that $(\partial S/\partial V)_T = (\partial P/\partial T)_V$ by considering the potential $U - ST$.


We first consider the differential $d(U - ST)$. From (5.5), we obtain

\[ d(U - ST) = dU - S\,dT - T\,dS = -S\,dT - P\,dV \]

when use is made of (5.44). We rewrite $U - ST$ as F for convenience of notation; F is
called the Helmholtz potential. Thus

\[ dF = -S\,dT - P\,dV, \]

and it follows that

\[ \left( \frac{\partial F}{\partial T} \right)_V = -S \quad\text{and}\quad \left( \frac{\partial F}{\partial V} \right)_T = -P. \]

Using these results together with

\[ \frac{\partial^2 F}{\partial T \partial V} = \frac{\partial^2 F}{\partial V \partial T}, \]

we can see immediately that

\[ \left( \frac{\partial S}{\partial V} \right)_T = \left( \frac{\partial P}{\partial T} \right)_V, \]

which is the same Maxwell relation as before. ◄

Although the Helmholtz potential has other uses, in this context it has simply
provided a means for a quick derivation of the Maxwell relation. The other
Maxwell relations can be derived similarly by using two other potentials, the
enthalpy, $H = U + PV$, and the Gibbs free energy, $G = U + PV - ST$ (see
exercise 5.25).
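
For orientation, the same steps applied to the enthalpy and the Gibbs free energy run as follows. This is only a sketch of the working that the exercise asks for, using (5.44) and the equality of mixed second derivatives exactly as in the Helmholtz case.

```latex
% Enthalpy: H = U + PV, using dU = T dS - P dV from (5.44)
\begin{align*}
dH &= dU + P\,dV + V\,dP = T\,dS + V\,dP,\\
\left(\frac{\partial H}{\partial S}\right)_P &= T, \qquad
\left(\frac{\partial H}{\partial P}\right)_S = V
\quad\Rightarrow\quad
\left(\frac{\partial T}{\partial P}\right)_S = \left(\frac{\partial V}{\partial S}\right)_P.
\end{align*}

% Gibbs free energy: G = U + PV - ST
\begin{align*}
dG &= dU + P\,dV + V\,dP - S\,dT - T\,dS = V\,dP - S\,dT,\\
\left(\frac{\partial G}{\partial P}\right)_T &= V, \qquad
\left(\frac{\partial G}{\partial T}\right)_P = -S
\quad\Rightarrow\quad
\left(\frac{\partial V}{\partial T}\right)_P = -\left(\frac{\partial S}{\partial P}\right)_T.
\end{align*}
```

In each case the natural variables of the potential are the two whose differentials appear on the right-hand side, and these are the variables held constant in turn.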


5.12 Differentiation of integrals 

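We conclude this chapter with a discussion of the differentiation of integrals. Let
us consider the indefinite integral (cf. equation (2.30))

\[ F(x, t) = \int f(x, t)\,dt, \]

from which it follows immediately that

\[ \frac{\partial F(x, t)}{\partial t} = f(x, t). \]

Assuming that the second partial derivatives of $F(x, t)$ are continuous, we have

\[ \frac{\partial^2 F(x, t)}{\partial t\,\partial x} = \frac{\partial^2 F(x, t)}{\partial x\,\partial t}, \]

and so we can write

\[ \frac{\partial}{\partial t}\left[ \frac{\partial F(x, t)}{\partial x} \right] = \frac{\partial}{\partial x}\left[ \frac{\partial F(x, t)}{\partial t} \right] = \frac{\partial f(x, t)}{\partial x}. \]

Integrating this equation with respect to t then gives

\[ \frac{\partial F(x, t)}{\partial x} = \int \frac{\partial f(x, t)}{\partial x}\,dt. \tag{5.46} \]

Now consider the definite integral

\[ I(x) = \int_{t=u}^{t=v} f(x, t)\,dt = F(x, v) - F(x, u), \]

where u and v are constants. Differentiating this integral with respect to x, and
using (5.46), we see that

\[ \begin{aligned}
\frac{dI(x)}{dx} &= \frac{\partial F(x, v)}{\partial x} - \frac{\partial F(x, u)}{\partial x} \\
&= \int^{v} \frac{\partial f(x, t)}{\partial x}\,dt - \int^{u} \frac{\partial f(x, t)}{\partial x}\,dt \\
&= \int_{u}^{v} \frac{\partial f(x, t)}{\partial x}\,dt.
\end{aligned} \]

This is Leibnitz' rule for differentiating integrals, and basically it states that for
constant limits of integration the order of integration and differentiation can be
reversed.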

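In the more general case where the limits of the integral are themselves functions
of x, it follows immediately that

\[ I(x) = \int_{t=u(x)}^{t=v(x)} f(x, t)\,dt = F(x, v(x)) - F(x, u(x)), \]

which yields the partial derivatives

\[ \frac{\partial I}{\partial v} = f(x, v(x)), \qquad \frac{\partial I}{\partial u} = -f(x, u(x)). \]

Consequently

\[ \begin{aligned}
\frac{dI}{dx} &= \left( \frac{\partial I}{\partial v} \right)\frac{dv}{dx} + \left( \frac{\partial I}{\partial u} \right)\frac{du}{dx} + \frac{\partial I}{\partial x} \\
&= f(x, v(x))\frac{dv}{dx} - f(x, u(x))\frac{du}{dx} + \frac{\partial}{\partial x}\int_{u(x)}^{v(x)} f(x, t)\,dt \\
&= f(x, v(x))\frac{dv}{dx} - f(x, u(x))\frac{du}{dx} + \int_{u(x)}^{v(x)} \frac{\partial f(x, t)}{\partial x}\,dt,
\end{aligned} \tag{5.47} \]

where the partial derivative with respect to x in the last term has been taken
inside the integral sign using (5.46). This procedure is valid because u(x) and v(x)
are being held constant in this term.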



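► Find the derivative with respect to x of the integral

\[ I(x) = \int_{x}^{x^2} \frac{\sin xt}{t}\,dt. \]


Applying (5.47), we see that

\[ \begin{aligned}
\frac{dI}{dx} &= \frac{\sin x^3}{x^2}(2x) - \frac{\sin x^2}{x}(1) + \int_{x}^{x^2} \frac{t\cos xt}{t}\,dt \\
&= \frac{2\sin x^3}{x} - \frac{\sin x^2}{x} + \left[ \frac{\sin xt}{x} \right]_{x}^{x^2} \\
&= 3\frac{\sin x^3}{x} - 2\frac{\sin x^2}{x} \\
&= \frac{1}{x}\left( 3\sin x^3 - 2\sin x^2 \right).
\end{aligned} \] ◄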
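
The closed form can be checked numerically for the integral implied by the working, $I(x) = \int_x^{x^2} (\sin xt)/t\,dt$. The sketch below evaluates $I(x)$ by Simpson's rule and differentiates it by a central difference; the function names, the sample point `x = 1.5` and the step sizes are assumptions chosen for illustration.

```python
import math


def simpson(func, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * h)
    return total * h / 3.0


def I(x):
    """I(x) = integral from t = x to t = x**2 of sin(x t)/t dt."""
    return simpson(lambda t: math.sin(x * t) / t, x, x * x)


def dI_dx_formula(x):
    """Result obtained from (5.47): (3 sin x^3 - 2 sin x^2) / x."""
    return (3.0 * math.sin(x ** 3) - 2.0 * math.sin(x ** 2)) / x


x = 1.5
h = 1e-5
numerical = (I(x + h) - I(x - h)) / (2.0 * h)
analytic = dI_dx_formula(x)
print(numerical, analytic)
```

The two values agree to within the accuracy expected of the finite-difference estimate. Any $x > 0$ can be used, since the integrand is then well behaved over the whole range of integration.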


5.13 Exercises 

5.1 (a) Find all the first partial derivatives of the following functions $f(x, y)$: (i) $x^2y$,
(ii) $x^2 + y^2 + 4$, (iii) $\sin(x/y)$, (iv) $\tan^{-1}(y/x)$, (v) $r(x, y, z) = (x^2 + y^2 + z^2)^{1/2}$.

(b) For (i), (ii) and (v), find $\partial^2 f/\partial x^2$, $\partial^2 f/\partial y^2$, $\partial^2 f/\partial x\,\partial y$.

(c) For (iv) verify that $\partial^2 f/\partial x\,\partial y = \partial^2 f/\partial y\,\partial x$.

5.2 Determine which of the following are exact differentials:

(a) $(3x + 2)y\,dx + x(x + 1)\,dy$,

(b) $y\tan x\,dx + x\tan y\,dy$,

(c) $y^2(\ln x + 1)\,dx + 2xy\ln x\,dy$,

(d) $y^2(\ln x + 1)\,dy + 2xy\ln x\,dx$,

(e) $[x/(x^2 + y^2)]\,dy - [y/(x^2 + y^2)]\,dx$.

Show that the differential 


df = x 2 dy — (y 2 + xy)dx 

is not exact, but that dg = ( xy 2 )~ l df is exact. 

(a) Show that 

df = y(l + x — x 2 )dx + x(x + 1) dy 


is not an exact differential. 

(b) Find the differential equation that a function g(x) must satisfy if df> = g (x)df 
is to be an exact differential. Verify that g(x) = e~ x is a solution of this 
equation and deduce the form of f>(x,y). 

The equation 3 y = z 3 + 3 xz defines z implicitly as a function of x and y. Evaluate 
all three second partial derivatives of z with respect to x and/or y. Verify that z 
is a solution of 


j/z fa 

df + fa 


A possible equation of state for a gas takes the form 


pF = Rrexp(-^), 


in which a and R are constants. Calculate expressions for 


dfa (dv_\ (dT_\ 

8V ) T ' \8T ) / \dp) v ' 

and show that their product is —1, as stated in section 5.4. 

The function G(t) is defined by 

G(t) = F(x,y) = x 2 + y 2 + 3xy, 

where x(f) = at 2 and y(t) = 2at. Use the chain rule to find the values of (x,y) at 
which G(f) has stationary values as a function of f. Do any of them correspond 
to the stationary points of F(x,y ) as a function of x and y? 

In the xy-plane, new coordinates s and t are defined by 

s = |(x + y), r=i(x-y). 

Transform the equation 

d 2 (j> 8 2 4> _ 

dx 2 dy 2 

into the new coordinates and deduce that its general solution can be written 


<t>(x,y) = f(x + y) + g(x-y), 

where f(u) and g(v) are arbitrary functions of u and v respectively.

5.9 The function $f(x,y)$ satisfies the differential equation

\[ y\frac{\partial f}{\partial x} + x\frac{\partial f}{\partial y} = 0. \]

By changing to new variables $u = x^2 - y^2$ and $v = 2xy$, show that f is, in fact, a
function of $x^2 - y^2$ only.

5.10 If $x = e^u\cos\theta$ and $y = e^u\sin\theta$, show that

\[ \frac{\partial^2\phi}{\partial u^2} + \frac{\partial^2\phi}{\partial\theta^2} = (x^2 + y^2)\left(\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}\right), \]

where $f(x,y) = \phi(u,\theta)$.

5.11 Find and evaluate the maxima, minima and saddle points of the function

\[ f(x,y) = xy(x^2 + y^2 - 1). \]

5.12 Show that

\[ f(x,y) = x^3 - 12xy + 48x + by^2, \qquad b \neq 0, \]

has two, one, or zero stationary points according to whether $|b|$ is less than, equal
to, or greater than 3.

5.13 Locate the stationary points of the function

\[ f(x,y) = (x^2 - 2y^2)\exp[-(x^2 + y^2)/a^2], \]

where a is a non-zero constant.

Sketch the function along the x- and y-axes and hence identify the nature and
values of the stationary points.

5.14 Find the stationary points of the function

\[ f(x,y) = x^3 + xy^2 - 12x - y^2 \]

and identify their nature.

5.15 Find the stationary values of

\[ f(x,y) = 4x^2 + 4y^2 + x^4 - 6x^2y^2 + y^4 \]

and classify them as maxima, minima or saddle points. Make a rough sketch of
the contours of f in the quarter plane x, y > 0.

5.16 The temperature of a point (x,y,z) on the unit sphere is given by

\[ T(x,y,z) = 1 + xy + yz. \]

By using the method of Lagrange multipliers find the temperature of the hottest
point on the sphere.

5.17 A rectangular parallelepiped has all eight vertices on the ellipsoid

\[ x^2 + 3y^2 + 3z^2 = 1. \]

Using the symmetry of the parallelepiped about each of the planes x = 0,
y = 0, z = 0, write down the surface area of the parallelepiped in terms of
the coordinates of the vertex that lies in the octant x,y,z > 0. Hence find the
maximum value of the surface area of such a parallelepiped.

5.18 Two horizontal corridors, 0 < x < a with y > 0, and 0 < y < b with x > 0, meet
at right angles. Find the length L of the longest ladder (considered as a stick)
that may be carried horizontally around the corner.

5.19 A barn is to be constructed with a uniform cross-sectional area A throughout
its length. The cross-section is to be a rectangle of wall height h (fixed) and
width w, surmounted by an isosceles triangular roof that makes an angle $\theta$ with
the horizontal. The cost of construction is $\alpha$ per unit height of wall and $\beta$ per
unit (slope) length of roof. Show that, irrespective of the values of $\alpha$ and $\beta$, to
minimise costs w should be chosen to satisfy the equation

\[ w^4 = 16A(A - wh), \]

and $\theta$ made such that $2\tan 2\theta = w/h$.

5.20 Show that the envelope of all concentric ellipses that have their axes along the 
x- and y-coordinate axes and that have the sum of their semi-axes equal to a 
constant L is the same curve (an astroid) as that found in the worked example 
in section 5.10. 

5.21 Find the area of the region covered by points on the lines

\[ \frac{x}{a} + \frac{y}{b} = 1, \]

where the sum of any line's intercepts on the coordinate axes is fixed and equal
to c.

5.22 Prove that the envelope of the circles whose diameters are those chords of a 
given circle that pass through a fixed point on its circumference, is the cardioid 

\[ r = a(1 + \cos\theta). \]

Here a is the radius of the given circle and $(r, \theta)$ are the polar coordinates of the
envelope. Take as the system parameter the angle $\phi$ between a chord and the
polar axis from which $\theta$ is measured.

5.23 A water feature contains a spray head at water level at the centre of a round 
basin. The head is in the form of a hemisphere with many evenly distributed 
small holes in it, through which water spurts out at the same speed $v_0$ in all
directions. 

(a) What is the shape of the ‘water bell' so formed? 

(b) What must be the minimum diameter of the bowl if no water is to be lost? 

5.24 In order to make a focussing mirror that concentrates parallel axial rays to one 
spot (or conversely forms a parallel beam from a point source) a parabolic shape 
should be adopted. If a mirror that is part of a circular cylinder or sphere were 
used, the light would be spread out along a curve. This curve is known as a 
caustic and is the envelope of the rays reflected from the mirror. Denoting by $\theta$
the angle which a typical incident axial ray makes with the normal to the mirror
at the place where it is reflected, the geometry of reflection (the angle of incidence
equals the angle of reflection) is shown in figure 5.5.

Show that a parametric specification of the caustic is

\[ x = R\cos\theta\left(\tfrac{1}{2} + \sin^2\theta\right), \qquad y = R\sin^3\theta, \]

where R is the radius of curvature of the mirror. The curve is, in fact, part of an 
epicycloid. 

5.25 By considering the differential 

dG = d(U + PV - ST), 

where G is the Gibbs free energy, P the pressure, V the volume, S the entropy 
and T the temperature of a system, and given further that 

\[ dU = T\,dS - P\,dV, \]

derive a Maxwell relation connecting $(\partial V/\partial T)_P$ and $(\partial S/\partial P)_T$.



Figure 5.5 The reflecting mirror discussed in exercise 5.24. 


5.26 Functions P(V, T), U(V, T) and S(V, T) are related by 

TdS = dU + PdV, 


where the symbols have the same meaning as in the previous question. P is 
known from experiment to have the form 



in appropriate units. If 

\[ U = \alpha V T^4 + \beta T, \]

where $\alpha$, $\beta$ are constants (or at least do not depend on T, V), deduce that $\alpha$
must have a specific value but $\beta$ may have any value. Find the corresponding
form of S.

5.27 As in the previous two exercises on the thermodynamics of a simple gas, the 
quantity $dS = T^{-1}(dU + P\,dV)$ is an exact differential. Use this to prove that

\[ \left(\frac{\partial U}{\partial V}\right)_T = T\left(\frac{\partial P}{\partial T}\right)_V - P. \]



In the van der Waals model of a gas, P obeys the equation

\[ P = \frac{RT}{V - b} - \frac{a}{V^2}, \]

where R, a and b are constants. Further, in the limit $V \to \infty$, the form of U
becomes U = cT, where c is another constant. Find the complete expression for
U(V, T).

5.28 The entropy S(H, T), the magnetisation M(H, T) and the internal energy U(H , T) 
of a magnetic salt placed in a magnetic field of strength H at temperature T are 
connected by the equation 


\[ T\,dS = dU - H\,dM. \]

By considering $d(U - TS - HM)$, or otherwise, prove that

\[ \left(\frac{\partial M}{\partial T}\right)_H = \left(\frac{\partial S}{\partial H}\right)_T. \]

For a particular salt

\[ M(H,T) = M_0[1 - \exp(-\alpha H/T)]. \]


Show that, at a fixed temperature, if the applied field is increased from zero to
a strength such that the magnetization of the salt is $\tfrac{3}{4}M_0$ then the salt's entropy
decreases by an amount

\[ \frac{M_0}{4\alpha}(3 - \ln 4). \]

5.29 Using the results of section 5.12, evaluate the integral

\[ I(y) = \int_0^\infty \frac{e^{-xy}\sin x}{x}\,dx. \]

Hence show that

\[ J = \int_0^\infty \frac{\sin x}{x}\,dx = \frac{\pi}{2}. \]

5.30 The integral

\[ \int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx \]

has the value $(\pi/\alpha)^{1/2}$. Use this result to evaluate

\[ J(n) = \int_{-\infty}^{\infty} x^{2n} e^{-x^2}\,dx, \]

where n is a positive integer. Express your answer in terms of factorials.

5.31 The function f(x) is differentiable and f(0) = 0. A second function g(y) is defined
by

\[ g(y) = \int_0^y \frac{f(x)\,dx}{\sqrt{y - x}}. \]

Prove that

\[ \frac{dg}{dy} = \int_0^y \frac{df}{dx}\,\frac{dx}{\sqrt{y - x}}. \]

For the case $f(x) = x^n$, prove that

\[ \frac{d^n g}{dy^n} = 2(n!)\sqrt{y}. \]

5.32 The functions f(x,t) and F(x) are defined by

\[ f(x,t) = e^{-xt}, \qquad F(x) = \int_0^x f(x,t)\,dt. \]

Verify by explicit calculation that

\[ \frac{dF}{dx} = f(x,x) + \int_0^x \frac{\partial f(x,t)}{\partial x}\,dt. \]

5.33 If

\[ I(\alpha) = \int_0^1 \frac{x^\alpha - 1}{\ln x}\,dx, \qquad \alpha > -1, \]

what is the value of I(0)? Show that

\[ \frac{d}{d\alpha}x^\alpha = x^\alpha\ln x, \]

and deduce that

\[ \frac{d}{d\alpha}I(\alpha) = \frac{1}{\alpha + 1}. \]

Hence prove that $I(\alpha) = \ln(1 + \alpha)$.

5.34 Find the derivative with respect to x of the integral

\[ I(x) = \int_x^{3x} \exp xt\,dt. \]

5.35 The function $G(t,\xi)$ is defined for $0 \le t \le \pi$ by

\[ G(t,\xi) = \begin{cases} -\cos t\,\sin\xi & \text{for } \xi \le t, \\ -\sin t\,\cos\xi & \text{for } \xi > t. \end{cases} \]

Show that the function x(t) defined by

\[ x(t) = \int_0^\pi G(t,\xi)\,f(\xi)\,d\xi \]

satisfies the equation

\[ \frac{d^2x}{dt^2} + x = f(t) \]

for any arbitrary (continuous) function f(t). Show further that $x(0) = [dx/dt]_{t=\pi} = 0$,
again for any f(t), but that the value of $x(\pi)$ does depend
upon the form of f(t).

(The function $G(t,\xi)$ is an example of a Green's function, an important
concept in the solution of differential equations and one studied extensively in
later chapters.)


5.14 Hints and answers 

5.1 (a) (i) $2xy$, $x^2$; (ii) $2x$, $2y$; (iii) $y^{-1}\cos(x/y)$, $(-x/y^2)\cos(x/y)$;
(iv) $-y/(x^2 + y^2)$, $x/(x^2 + y^2)$; (v) $x/r$, $y/r$, $z/r$.

(b) (i) $2y$, 0, $2x$; (ii) 2, 2, 0; (v) $(y^2 + z^2)r^{-3}$, $(x^2 + z^2)r^{-3}$, $-xyr^{-3}$.

(c) Both second derivatives are equal to $(y^2 - x^2)(x^2 + y^2)^{-2}$.

5.2 Only (c) and (e).

5.3 $2x \neq -2y - x$. For g both sides of equation (5.9) equal $y^{-2}$.

5.4 (a) $1 + x - x^2 \neq 2x + 1$. (b) $g' = -g$. $\phi(x,y) = x(x + 1)ye^{-x} + k$.

5.5 $\partial^2 z/\partial x^2 = 2xz(z^2 + x)^{-3}$, $\partial^2 z/\partial x\,\partial y = (z^2 - x)(z^2 + x)^{-3}$, $\partial^2 z/\partial y^2 = -2z(z^2 + x)^{-3}$.

5.6 The equation is most easily differentiated in the form $\ln p + \ln V - \ln R - \ln T = -\alpha/(VRT)$.
$p(\alpha - VRT)/(V^2RT)$; $V(\alpha + VRT)/[T(VRT - \alpha)]$; $VRT^2/[p(\alpha + VRT)]$.

5.7 (0,0), (a/4, −a) and (16a, −8a). Only the saddle point at (0,0).

5.8 The transformed equation is $\partial^2\psi/\partial t\,\partial s = 0$ where $\psi(s,t) = \phi(x,y)$.

5.9 The transformed equation is $2(x^2 + y^2)\,\partial f/\partial v = 0$; hence f does not depend on v.
5.10 Write $\partial/\partial u$ and $\partial/\partial\theta$ in terms of x, y, $\partial/\partial x$ and $\partial/\partial y$ using (5.17). The terms
that cancel when $\partial^2\phi/\partial u^2$ and $\partial^2\phi/\partial\theta^2$ are added together are $\pm[x(\partial f/\partial x) + y(\partial f/\partial y) + 2xy(\partial^2 f/\partial x\,\partial y)]$.

5.11 Maxima equal to 1/8 at ±(1/2, −1/2), minima equal to −1/8 at ±(1/2, 1/2),
saddle points equalling 0 at (0,0), (0, ±1), (±1, 0).

5.12 From $\partial f/\partial y = 0$, $y = 6x/b$. Substitute this into $\partial f/\partial x = 0$ to obtain a quadratic
equation for x.

5.13 Maxima equal to $a^2e^{-1}$ at (±a, 0), minima equal to $-2a^2e^{-1}$ at (0, ±a), saddle
point equalling 0 at (0,0).

5.14 Maximum equal to 16 at (−2, 0), minimum equal to −16 at (2, 0), saddle points
equalling −11 at (1, ±3).

5.15 Minimum at (0,0); saddle points at (±1, ±1).

5.16 $1 + 1/\sqrt{2}$ at $\pm(\tfrac{1}{2}, 1/\sqrt{2}, \tfrac{1}{2})$.

5.17 Lagrange multiplier method gives z = y = x/2 for maximal area of 4.

5.18 Put the ends of the ladder at $(a + \xi, 0)$ and $(0, b + \eta)$ and require (a, b) to be on
the ladder. $L = (a^{2/3} + b^{2/3})^{3/2}$.

5.19 The cost always includes $2\alpha h$ which can be ignored in the optimisation. With
Lagrange multiplier $\lambda$, $\sin\theta = \lambda w/(4\beta)$ and $\beta\sec\theta - \tfrac{1}{2}\lambda w\tan\theta = \lambda h$, leading to
the stated results.

5.20 If the semi-axis in the x-direction is a, then $x^2/y^2 = a^3/(L - a)^3$ for the envelope.

5.21 The envelope of lines $x/a + y/(c - a) - 1 = 0$, as a varies, is $\sqrt{x} + \sqrt{y} = \sqrt{c}$. Area
$= c^2/6$.

5.22 The equation of a typical circle is $r = 2a\cos\phi\cos(\theta - \phi)$. The envelope condition
gives $\phi = \theta/2$.

5.23 (a) Using $\alpha = \cot\theta$, where $\theta$ is the initial angle a jet makes with the vertical, the
equation is $f(z,\rho,\alpha) = z - \rho\alpha + [g\rho^2(1 + \alpha^2)/(2v_0^2)]$, and setting $\partial f/\partial\alpha = 0$ gives
$\alpha = v_0^2/(g\rho)$. The water bell has a parabolic profile $z = v_0^2/(2g) - g\rho^2/(2v_0^2)$.
(b) Setting z = 0 gives the minimum diameter as $2v_0^2/g$.

5.24 The reflected ray has equation $y = \tan 2\theta\,(x - R\sin\theta/\sin 2\theta)$. Put this into
the standard form $f(x,y,\theta) = 0$ and eliminate y or x from this equation and
$\partial f/\partial\theta = 0$.

5.25 Show that $(\partial G/\partial P)_T = V$ and $(\partial G/\partial T)_P = -S$. From each result obtain an
expression for $\partial^2 G/\partial T\,\partial P$ and equate these, giving $(\partial V/\partial T)_P = -(\partial S/\partial P)_T$.

5.26 Establish that $(\partial U/\partial V)_T = T(\partial S/\partial V)_T - P$ and that $(\partial U/\partial T)_V = T(\partial S/\partial T)_V$.
Equate expressions for $\partial^2 S/\partial T\,\partial V$ and hence show $\alpha = 1$. Integrate $(\partial S/\partial V)_T$
and $(\partial S/\partial T)_V$ to show that $S = 4T^3V/3 + \ln V + \beta\ln T + c$.

5.27 Find expressions for $(\partial S/\partial V)_T$ and $(\partial S/\partial T)_V$, and equate $\partial^2 S/\partial V\,\partial T$ with
$\partial^2 S/\partial T\,\partial V$. $U(V,T) = cT - aV^{-1}$.

5.28 Show that $dF = d(U - TS - HM) = -S\,dT - M\,dH$ and find two expressions for
$\partial^2 F/\partial H\,\partial T$. Establish that $(\partial S/\partial H)_T = -M_0\alpha HT^{-2}\exp(-\alpha H/T)$ and integrate
with respect to H.

5.29 $dI/dy = -\mathrm{Im}\left[\int_0^\infty \exp(-xy + ix)\,dx\right] = -1/(1 + y^2)$. Integrate $dI/dy$ from 0 to $\infty$.
$I(\infty) = 0$ and $I(0) = J$.

5.30 Differentiate both the integral and its value n times with respect to $\alpha$ and then
set $\alpha = 1$. Note that $1 \cdot 3 \cdot 5 \cdots (2n - 1) = (2n)!/(2^n n!)$. $J(n) = (2n)!\sqrt{\pi}/(4^n n!)$.

5.31 Integrate the RHS of the equation by parts before differentiating with respect
to y. Repeated application of the method establishes the result for all orders of
derivative.

5.32 Both sides of the equation equal $2e^{-x^2} + x^{-2}(e^{-x^2} - 1)$.

5.33 I(0) = 0; use Leibniz' rule.

5.34 $(6 - x^{-2})\exp(3x^2) - (2 - x^{-2})\exp x^2$.

5.35 Write $x(t) = -\cos t\int_0^t \sin\xi\,f(\xi)\,d\xi - \sin t\int_t^\pi \cos\xi\,f(\xi)\,d\xi$ and differentiate each
term as a product to obtain $dx/dt$. Obtain $d^2x/dt^2$ in a similar way. Note
that integrals that have equal lower and upper limits have value zero. $x(\pi) = \int_0^\pi \sin\xi\,f(\xi)\,d\xi$.

6


Multiple integrals 


For functions of several variables, just as we may consider derivatives with respect 
to two or more of them, so may the integral of the function with respect to more 
than one variable be formed. The formal definitions of such multiple integrals are 
extensions of that for a single variable, discussed in chapter 2. We first discuss 
double and triple integrals and illustrate some of their applications. We then 
consider changing the variables in multiple integrals and discuss some general 
properties of Jacobians. 


6.1 Double integrals 

For an integral involving two variables - a double integral - we have a function,
f(x,y) say, to be integrated with respect to x and y between certain limits. These
limits can usually be represented by a closed curve C bounding a region R in the
xy-plane. Following the discussion of single integrals given in chapter 2, let us
divide the region R into N subregions $\Delta R_p$ of area $\Delta A_p$, $p = 1, 2, \ldots, N$, and let
$(x_p, y_p)$ be any point in subregion $\Delta R_p$. Now consider the sum

\[ S = \sum_{p=1}^{N} f(x_p, y_p)\,\Delta A_p, \]

and let $N \to \infty$ as each of the areas $\Delta A_p \to 0$. If the sum S tends to a unique
limit, I, then this is called the double integral of f(x,y) over the region R and is
written

\[ I = \int_R f(x,y)\,dA, \tag{6.1} \]

where dA stands for the element of area in the xy-plane. By choosing the
subregions to be small rectangles each of area $\Delta A = \Delta x\,\Delta y$, and letting both $\Delta x$
and $\Delta y \to 0$, we can also write the integral as

\[ I = \iint_R f(x,y)\,dx\,dy, \tag{6.2} \]

where we have written out the element of area explicitly as the product of the
two coordinate differentials (see figure 6.1).

Figure 6.1 A simple curve C in the xy-plane, enclosing a region R.

Some authors use a single integration symbol whatever the dimension of the 
integral; others use as many symbols as the dimension. In different circumstances 
both have their advantages. We will adopt the convention used in (6.1) and (6.2), 
that as many integration symbols will be used as differentials explicitly written. 

The form (6.2) gives us a clue as to how we may proceed in the evaluation 
of a double integral. Referring to figure 6.1, the limits on the integration may 
be written as an equation c(x,y) = 0 giving the boundary curve C. However, an 
explicit statement of the limits can be written in two distinct ways. 

One way of evaluating the integral is first to sum up the contributions from 
the small rectangular elemental areas into horizontal strips of width dy (as shown 
in the figure) and then to combine these horizontal strips to cover the region R. 
In this case, we write 

\[ I = \int_{y=c}^{y=d} \left\{ \int_{x=x_1(y)}^{x=x_2(y)} f(x,y)\,dx \right\} dy, \tag{6.3} \]

where $x = x_1(y)$ and $x = x_2(y)$ are the equations of the curves TSV and TUV
respectively. This expression indicates that first f(x,y) is to be integrated with
respect to x (treating y as a constant) between the values $x = x_1(y)$ and $x = x_2(y)$
and then the result, considered as a function of y, is to be integrated between the
limits y = c and y = d. Thus the double integral is evaluated by expressing it in
terms of two single integrals called iterated (or repeated) integrals.

An alternative way of evaluating the integral, however, is first to sum up the
contributions from the elemental rectangles arranged into vertical strips and then
to combine these vertical strips to cover the region R. We then write

\[ I = \int_{x=a}^{x=b} \left\{ \int_{y=y_1(x)}^{y=y_2(x)} f(x,y)\,dy \right\} dx, \tag{6.4} \]

where $y = y_1(x)$ and $y = y_2(x)$ are the equations of the curves STU and SVU
respectively. In going to (6.4) from (6.3), we have essentially interchanged the
order of integration.

In the discussion above we assumed that the curve C was such that any 
line parallel to either the x- or y-axis intersected C at most twice. In general, 
provided f(x,y) is continuous everywhere in R and the boundary curve C has this 
simple shape, the same result is obtained irrespective of the order of integration. 
In cases where the region R has a more complicated shape, it can usually be 
subdivided into smaller simpler regions $R_1$, $R_2$ etc. that satisfy this criterion. The
double integral over R is then merely the sum of the double integrals over the 
subregions. 


► Evaluate the double integral

\[ I = \iint_R x^2y\,dx\,dy, \]

where R is the triangular area bounded by the lines x = 0, y = 0 and x + y = 1. Reverse
the order of integration and demonstrate that the same result is obtained.


The area of integration is shown in figure 6.2. Suppose we choose to carry out the
integration with respect to y first. With x fixed, the range of y is 0 to 1 − x. We can
therefore write

\[
I = \int_{x=0}^{x=1} \left\{ \int_{y=0}^{y=1-x} x^2y\,dy \right\} dx
= \int_{x=0}^{x=1} \left[ \frac{x^2y^2}{2} \right]_{y=0}^{y=1-x} dx
= \int_0^1 \frac{x^2(1 - x)^2}{2}\,dx = \frac{1}{60}.
\]

Alternatively, we may choose to perform the integration with respect to x first. With y
fixed, the range of x is 0 to 1 − y, so we have

\[
I = \int_{y=0}^{y=1} \left\{ \int_{x=0}^{x=1-y} x^2y\,dx \right\} dy
= \int_{y=0}^{y=1} \left[ \frac{x^3y}{3} \right]_{x=0}^{x=1-y} dy
= \int_0^1 \frac{(1 - y)^3y}{3}\,dy = \frac{1}{60}.
\]



As expected, we obtain the same result irrespective of the order of integration. ◄ 
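The agreement between the two orders of integration can also be checked numerically. The sketch below evaluates both iterated integrals of x²y over the triangle with a nested composite Simpson's rule; the function names and the number of subintervals are illustrative assumptions.

```python
def simpson(g, a, b, n=200):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def f(x, y):
    """Integrand of the worked example."""
    return x * x * y

def integrate_y_first():
    # Inner integral over y from 0 to 1 - x, then outer integral over x from 0 to 1.
    inner = lambda x: simpson(lambda y: f(x, y), 0.0, 1.0 - x)
    return simpson(inner, 0.0, 1.0)

def integrate_x_first():
    # Inner integral over x from 0 to 1 - y, then outer integral over y from 0 to 1.
    inner = lambda y: simpson(lambda x: f(x, y), 0.0, 1.0 - y)
    return simpson(inner, 0.0, 1.0)

print(integrate_y_first(), integrate_x_first(), 1 / 60)
```

Both numerical values should lie very close to 1/60, since the integrand is continuous and the region is a simple triangle.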

We may avoid the use of braces in expressions such as (6.3) and (6.4) by writing 
(6.4), for example, as 

\[ I = \int_a^b dx \int_{y_1(x)}^{y_2(x)} dy\, f(x,y), \]

where it is understood that each integral symbol acts on everything to its right,
and that the order of integration is from right to left. So, in this example, the
integrand f(x,y) is first to be integrated with respect to y and then with respect
to x. With the double integral expressed in this way, we will no longer write the
independent variables explicitly in the limits of integration, since the differential
of the variable with respect to which we are integrating is always adjacent to the
relevant integral sign.

Figure 6.2 The triangular region whose sides are the axes x = 0, y = 0 and
the line x + y = 1.

Using the order of integration in (6.3), we could also write the double integral as

\[ I = \int_c^d dy \int_{x_1(y)}^{x_2(y)} dx\, f(x,y). \]


Occasionally, however, the interchange of the order of integration in a double
integral is not permissible, as it yields a different result. For example, difficulties
might arise if the region R were unbounded, with some of the limits infinite,
though in many cases involving infinite limits the same result is obtained
whichever order of integration is used. Difficulties can also occur if the integrand
f(x,y) has any discontinuities in the region R or on its boundary C.


6.2 Triple integrals 

The above discussion for double integrals can easily be extended to triple integrals. 
Consider the function f(x,y,z) defined in a closed three-dimensional region R.
Proceeding as we did for double integrals, let us divide the region R into N
subregions $\Delta R_p$ of volume $\Delta V_p$, $p = 1, 2, \ldots, N$, and let $(x_p, y_p, z_p)$ be any point in
the subregion $\Delta R_p$. Now we form the sum

\[ S = \sum_{p=1}^{N} f(x_p, y_p, z_p)\,\Delta V_p, \]

and let $N \to \infty$ as each of the volumes $\Delta V_p \to 0$. If the sum S tends to a unique
limit, I, then this is called the triple integral of f(x,y,z) over the region R and is
written

\[ I = \int_R f(x,y,z)\,dV, \tag{6.5} \]

where dV stands for the element of volume. By choosing the subregions to be
small cuboids, each of volume $\Delta V = \Delta x\,\Delta y\,\Delta z$, and proceeding to the limit, we
can also write the integral as

\[ I = \iiint_R f(x,y,z)\,dx\,dy\,dz, \tag{6.6} \]

where we have written out the element of volume explicitly as the product of the
three coordinate differentials. Extending the discussion of double integrals, we
may write triple integrals as three iterated integrals, for example,

\[ I = \int_{x_1}^{x_2} dx \int_{y_1(x)}^{y_2(x)} dy \int_{z_1(x,y)}^{z_2(x,y)} dz\, f(x,y,z), \]

where the limits on each of the integrals describe the values that x, y and z take 
on the boundary of the region R. As for double integrals, in most cases the order 
of integration does not affect the value of the integral. 

We can extend these ideas to define multiple integrals of higher dimensionality 
in a similar way. 


6.3 Applications of multiple integrals 

Multiple integrals have many uses in the physical sciences, since there are
numerous physical quantities which can be written in terms of them. We now discuss a
few of the more common examples.


6.3.1 Areas and volumes 


Multiple integrals are often used in finding areas and volumes. For example, the
integral

\[ A = \int_R dA = \iint_R dx\,dy \]

is simply equal to the area of the region R. Similarly, if we consider the surface
z = f(x,y) in three-dimensional Cartesian coordinates then the volume under this
surface that stands vertically above the region R is given by the integral

\[ V = \int_R z\,dA = \iint_R f(x,y)\,dx\,dy, \]

where volumes above the xy-plane are counted as positive, and those below as
negative.

Figure 6.3 The tetrahedron bounded by the coordinate surfaces and the
plane x/a + y/b + z/c = 1 is divided up into vertical slabs, the slabs into 
columns and the columns into small boxes. 


► Find the volume of the tetrahedron bounded by the three coordinate surfaces x = 0, y = 0
and z = 0 and the plane x/a + y/b + z/c = 1.


Referring to figure 6.3, the elemental volume of the shaded region is given by dV = z dx dy,
and we must integrate over the triangular region R in the xy-plane whose sides are x = 0,
y = 0 and y = b − bx/a. The total volume of the tetrahedron is therefore given by

\[
\begin{aligned}
V = \iint_R z\,dx\,dy &= \int_0^a dx \int_0^{b - bx/a} dy\; c\left(1 - \frac{y}{b} - \frac{x}{a}\right) \\
&= c\int_0^a dx \left[ y - \frac{y^2}{2b} - \frac{xy}{a} \right]_{y=0}^{y=b-bx/a} \\
&= c\int_0^a dx \left( \frac{bx^2}{2a^2} - \frac{bx}{a} + \frac{b}{2} \right) = \frac{abc}{6}. \; ◄
\end{aligned}
\]


Alternatively, we can write the volume of a three-dimensional region R as

\[ V = \int_R dV = \iiint_R dx\,dy\,dz, \tag{6.7} \]

where the only difficulty occurs in setting the correct limits on each of the
integrals. For the above example, writing the volume in this way corresponds to
dividing the tetrahedron into elemental boxes of volume dx dy dz (as shown in
figure 6.3); integration over z then adds up the boxes to form the shaded column
in the figure. The limits of integration are z = 0 to z = c(1 − y/b − x/a), and
the total volume of the tetrahedron is given by

\[ V = \int_0^a dx \int_0^{b - bx/a} dy \int_0^{c(1 - y/b - x/a)} dz, \tag{6.8} \]

which clearly gives the same result as above. This method is illustrated further in 
the following example. 


► Find the volume of the region bounded by the paraboloid $z = x^2 + y^2$ and the plane
z = 2y.


The required region is shown in figure 6.4. In order to write the volume of the region in
the form (6.7), we must deduce the limits on each of the integrals. Since the integrations
can be performed in any order, let us first divide the region into vertical slabs of thickness
dy perpendicular to the y-axis, and then as shown in the figure we cut each slab into
horizontal strips of height dz, and each strip into elemental boxes of volume dV = dx dy dz.
Integrating first with respect to x (adding up the elemental boxes to get a horizontal strip),
the limits on x are $x = -\sqrt{z - y^2}$ to $x = \sqrt{z - y^2}$. Now integrating with respect to z
(adding up the strips to form a vertical slab) the limits on z are $z = y^2$ to $z = 2y$. Finally,
integrating with respect to y (adding up the slabs to obtain the required region), the limits
on y are y = 0 and y = 2, the solutions of the simultaneous equations $z = 0^2 + y^2$ and
z = 2y. So the volume of the region is

\[
\begin{aligned}
V &= \int_0^2 dy \int_{y^2}^{2y} dz \int_{-\sqrt{z - y^2}}^{\sqrt{z - y^2}} dx = \int_0^2 dy \int_{y^2}^{2y} dz\; 2\sqrt{z - y^2} \\
&= \int_0^2 dy \left[ \tfrac{4}{3}(z - y^2)^{3/2} \right]_{z=y^2}^{z=2y} = \int_0^2 dy\; \tfrac{4}{3}(2y - y^2)^{3/2}.
\end{aligned}
\]

The integral over y may be evaluated straightforwardly by making the substitution
$y = 1 + \sin u$, and gives $V = \pi/2$. ◄
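The final single integral, with integrand (4/3)(2y − y²)^{3/2} on 0 ≤ y ≤ 2, is easy to check numerically. The sketch below uses a composite Simpson's rule; the helper names and the number of subintervals are illustrative assumptions.

```python
import math

def simpson(g, a, b, n=20000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def slab_integrand(y):
    # (4/3) (2y - y^2)^(3/2), written as y (2 - y) and clipped at zero
    # so that rounding error can never produce a negative base.
    return (4.0 / 3.0) * max(y * (2.0 - y), 0.0) ** 1.5

def volume():
    """Volume between the paraboloid z = x^2 + y^2 and the plane z = 2y."""
    return simpson(slab_integrand, 0.0, 2.0)

print(volume(), math.pi / 2)
```

A fairly large number of subintervals is used because the integrand has unbounded higher derivatives at the end points y = 0 and y = 2, which slows the convergence of Simpson's rule.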

In general, when calculating the volume (area) of a region, the volume (area) 
elements need not be small boxes as in the previous example, but may be of any 
convenient shape. They are usually chosen to make the evaluation of the integral 
as simple as possible. 


6.3.2 Masses, centres of mass and centroids 


It is sometimes necessary to calculate the mass of a given object having a
non-uniform density. Symbolically, this mass is given simply by

\[ M = \int dM, \]

where dM is the element of mass and the integral is taken over the extent of the
object. For a solid three-dimensional body the element of mass is just $dM = \rho\,dV$,
where dV is an element of volume and $\rho$ is the variable density. For a laminar
body (i.e. a uniform sheet of material) the element of mass is $dM = \sigma\,dA$, where
$\sigma$ is the mass per unit area of the body and dA is an area element. Finally, for
a body in the form of a thin wire we have $dM = \lambda\,ds$, where $\lambda$ is the mass per
unit length and ds is an element of arc length along the wire. When evaluating
the required integral, we are free to divide up the body into mass elements in
the most convenient way, provided that over each mass element the density is
approximately constant.

Figure 6.4 The region bounded by the paraboloid $z = x^2 + y^2$ and the plane
z = 2y is divided into vertical slabs, the slabs into horizontal strips and the
strips into boxes.


► Find the mass of the tetrahedron bounded by the three coordinate surfaces and the plane
x/a + y/b + z/c = 1, if its density is given by $\rho(x,y,z) = \rho_0(1 + x/a)$.


From (6.8), we can immediately write down the mass of the tetrahedron as

\[ M = \int_0^a dx\,\rho_0\left(1 + \frac{x}{a}\right) \int_0^{b - bx/a} dy \int_0^{c(1 - y/b - x/a)} dz, \]

where we have taken the density outside the integrations with respect to z and y since it
depends only on x. Therefore the integrations with respect to z and y proceed exactly as
they did when finding the volume of the tetrahedron, and we have

\[ M = c\rho_0 \int_0^a dx \left(1 + \frac{x}{a}\right)\left( \frac{bx^2}{2a^2} - \frac{bx}{a} + \frac{b}{2} \right). \tag{6.9} \]

We could have arrived at (6.9) more directly by dividing the tetrahedron into triangular
slabs of thickness dx perpendicular to the x-axis (see figure 6.3), each of which is of
constant density, since $\rho$ depends on x alone. A slab at a position x has volume
$dV = \tfrac{1}{2}c(1 - x/a)(b - bx/a)\,dx$ and mass $dM = \rho\,dV = \rho_0(1 + x/a)\,dV$. Integrating over x we
again obtain (6.9). This integral is easily evaluated and gives $M = \tfrac{5}{24}abc\rho_0$. ◄
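The value M = (5/24)abcρ₀ can be confirmed by carrying out the three iterated integrations of (6.8) numerically, with the density ρ₀(1 + x/a) included. The sketch below nests a composite Simpson's rule three deep; the function names and the sample dimensions are illustrative assumptions.

```python
def simpson(g, lo, hi, n=20):
    """Composite Simpson's rule on [lo, hi] with an even number n of subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3

def tetrahedron_mass(a, b, c, rho0):
    """Mass of the tetrahedron x/a + y/b + z/c <= 1 with density rho0 (1 + x/a)."""
    def rho(x):
        return rho0 * (1.0 + x / a)

    def over_z(x, y):
        # Innermost integral: z runs from 0 to c (1 - y/b - x/a).
        return simpson(lambda z: rho(x), 0.0, c * (1.0 - y / b - x / a))

    def over_y(x):
        # Middle integral: y runs from 0 to b - b x / a.
        return simpson(lambda y: over_z(x, y), 0.0, b - b * x / a)

    # Outermost integral: x runs from 0 to a.
    return simpson(over_x := over_y, 0.0, a)

a, b, c, rho0 = 1.0, 2.0, 3.0, 1.5  # arbitrary sample values
print(tetrahedron_mass(a, b, c, rho0), 5.0 / 24.0 * a * b * c * rho0)
```

Because the integrand at each stage is a polynomial of degree at most three, Simpson's rule is exact here up to rounding, so a small number of subintervals suffices.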
The coordinates of the centre of mass of a solid or laminar body may also be
written as multiple integrals. The centre of mass of a body has coordinates $\bar{x}$, $\bar{y}$,
$\bar{z}$ given by the three equations

\[
\bar{x}\int dM = \int x\,dM, \qquad
\bar{y}\int dM = \int y\,dM, \qquad
\bar{z}\int dM = \int z\,dM,
\]

where again dM is an element of mass as described above, x, y, z are the
coordinates of the centre of mass of the element dM and the integrals are taken
over the entire body. Obviously, for any body that lies entirely in, or is symmetrical
about, the xy-plane (say), we immediately have $\bar{z} = 0$. For completeness, we note
that the three equations above can be written as the single vector equation (see
chapter 7)

\[ \bar{\mathbf{r}} = \frac{1}{M}\int \mathbf{r}\,dM, \]

where $\bar{\mathbf{r}}$ is the position vector of the body's centre of mass with respect to the
origin, $\mathbf{r}$ is the position vector of the centre of mass of the element dM and
$M = \int dM$ is the total mass of the body. As previously, we may divide the body
into the most convenient mass elements for evaluating the necessary integrals,
provided each mass element is of constant density.

We further note that the coordinates of the centroid of a body are defined as 
those that its centre of mass would have if the body had uniform density. 


► Find the centre of mass of the solid hemisphere bounded by the surfaces $x^2 + y^2 + z^2 = a^2$
and the xy-plane, assuming that it has a uniform density $\rho$.

Referring to figure 6.5, we know from symmetry that the centre of mass must lie on
the z-axis. Let us divide the hemisphere into volume elements that are circular slabs of
thickness dz parallel to the xy-plane. For a slab at a height z, the mass of the element is
$dM = \rho\,dV = \rho\pi(a^2 - z^2)\,dz$. Integrating over z, we find that the z-coordinate of the centre
of mass of the hemisphere is given by

\[ \bar{z}\int_0^a \rho\pi(a^2 - z^2)\,dz = \int_0^a z\rho\pi(a^2 - z^2)\,dz. \]

The integrals are easily evaluated and give $\bar{z} = 3a/8$. Since the hemisphere is of uniform
density, this is also the position of its centroid. ◄
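The slab decomposition used above translates directly into a short numerical check. In the sketch below the total mass and the first moment about the xy-plane are both integrated over z and their ratio compared with 3a/8; the function names and default arguments are illustrative assumptions.

```python
import math

def simpson(g, lo, hi, n=100):
    """Composite Simpson's rule on [lo, hi] with an even number n of subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3

def hemisphere_centroid_height(a, rho=1.0):
    """z-coordinate of the centre of mass of a uniform solid hemisphere of radius a."""
    # Mass per unit height of the circular slab at height z: rho * pi * (a^2 - z^2).
    slab = lambda z: rho * math.pi * (a * a - z * z)
    total_mass = simpson(slab, 0.0, a)
    first_moment = simpson(lambda z: z * slab(z), 0.0, a)
    return first_moment / total_mass

print(hemisphere_centroid_height(2.0), 3 * 2.0 / 8)
```

The uniform density cancels in the ratio, which is why the result is also the centroid.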


6.3.3 Pappus’ theorems 

The theorems of Pappus (which are about seventeen centuries old) relate centroids 
to volumes of revolution and areas of surfaces, discussed in chapter 2, and can be 
useful for finding one quantity given another that may be calculated more easily.

Figure 6.5 The solid hemisphere bounded by the surfaces $x^2 + y^2 + z^2 = a^2$
and the xy-plane.



Figure 6.6 An area A in the xy-plane, which may be rotated about the x-axis 
to form a volume of revolution. 


If a plane area is rotated about an axis that does not intersect it then the solid 
so generated is called a volume of revolution. Pappus’ first theorem states that the 
volume of such a solid is given by the plane area A multiplied by the distance 
moved by its centroid (see figure 6.6). This may be proved by considering the 
definition of the centroid of the plane area as the position of the centre of mass 
if the density is uniform, so that 

$$\bar{y} = \frac{1}{A}\int y\,dA.$$

Now the volume generated by rotating the plane area about the x-axis is given by 

$$V = \int 2\pi y\,dA = 2\pi\bar{y}A,$$

which is the area multiplied by the distance moved by the centroid.

Figure 6.7 A curve in the xy-plane, which may be rotated about the x-axis
to form a surface of revolution. 


Pappus’ second theorem states that if a plane curve is rotated about a coplanar 
axis that does not intersect it then the area of the surface of revolution so generated 
is given by the length of the curve L multiplied by the distance moved by its 
centroid (see figure 6.7). This may be proved in a similar manner to the first 
theorem by considering the definition of the centroid of a plane curve, 


$$\bar{y} = \frac{1}{L}\int y\,ds,$$

and noting that the surface area generated is given by

$$S = \int 2\pi y\,ds = 2\pi\bar{y}L,$$


which is equal to the length of the curve multiplied by the distance moved by its 
centroid. 
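Both theorems can be illustrated numerically with a torus, formed by rotating a circle of radius a, centred a distance c (> a) from the x-axis, about that axis. The sketch below is an illustration under these assumptions; the function names are hypothetical. It compares direct integration of $2\pi y\,dA$ and $2\pi y\,ds$ with the products given by Pappus' theorems.

```python
import math


def simpson(f, lo, hi, n=400):
    """Composite Simpson's rule on [lo, hi] using n subintervals (n even)."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def torus_by_integration(a, c):
    """Volume and surface area from V = int 2*pi*y dA and S = int 2*pi*y ds."""
    # Strips of the disc at height y = c + a*sin(t) have area 2*a^2*cos(t)^2 dt.
    volume = simpson(
        lambda t: 2 * math.pi * (c + a * math.sin(t)) * 2 * a * a * math.cos(t) ** 2,
        -math.pi / 2, math.pi / 2)
    # Points on the circle are at y = c + a*sin(t), with arc element ds = a dt.
    surface = simpson(
        lambda t: 2 * math.pi * (c + a * math.sin(t)) * a, 0.0, 2 * math.pi)
    return volume, surface


def torus_by_pappus(a, c):
    """Area (or length) multiplied by the distance 2*pi*c moved by the centroid."""
    return math.pi * a * a * 2 * math.pi * c, 2 * math.pi * a * 2 * math.pi * c


print(torus_by_integration(1.0, 3.0))
print(torus_by_pappus(1.0, 3.0))
```

The two printed pairs should agree to within the accuracy of the quadrature.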


► A semicircular uniform lamina is freely suspended from one of its corners. Show that its 
straight edge makes an angle of 23.0° with the vertical. 


Referring to figure 6.8, the suspended lamina will have its centre of gravity C vertically
below the suspension point and its straight edge will make an angle $\theta = \tan^{-1}(d/a)$ with
the vertical, where 2a is the diameter of the semicircle and d is the distance of its centre
of mass from the diameter.

Since rotating the lamina about the diameter generates a sphere of volume $\frac{4}{3}\pi a^3$, Pappus'
first theorem requires that

$$\tfrac{4}{3}\pi a^3 = 2\pi d \times \tfrac{1}{2}\pi a^2.$$

Hence $d = 4a/(3\pi)$ and $\theta = \tan^{-1}[4/(3\pi)] = 23.0°$. ◄

Figure 6.8 Suspending a semicircular lamina from one of its corners.

6.3.4 Moments of inertia 

For problems in rotational mechanics it is often necessary to calculate the moment 
of inertia of a body about a given axis. This is defined by the multiple integral 

$$I = \int l^2\,dM,$$


where $l$ is the distance of a mass element dM from the axis. We may again choose
mass elements convenient for evaluating the integral. In this case, however, in 
addition to elements of constant density we require all parts of each element to 
be at approximately the same distance from the axis about which the moment of 
inertia is required. 


► Find the moment of inertia of a uniform rectangular lamina of mass M with sides a and 
b about one of the sides of length b. 


Referring to figure 6.9, we wish to calculate the moment of inertia about the y-axis. 
We therefore divide the rectangular lamina into elemental strips parallel to the y-axis of 
width dx. The mass of such a strip is $dM = \sigma b\,dx$, where $\sigma$ is the mass per unit area of
the lamina. The moment of inertia of a strip at a distance x from the y-axis is simply
$dI = x^2\,dM = \sigma b x^2\,dx$. The total moment of inertia of the lamina about the y-axis is
therefore

$$I = \int_0^a \sigma b x^2\,dx = \frac{\sigma b a^3}{3}.$$

Since the total mass of the lamina is $M = \sigma ab$, we can write $I = \frac{1}{3}Ma^2$. ◄

Figure 6.9 A uniform rectangular lamina of mass M with sides a and b can
be divided into vertical strips. 
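The strip decomposition in this example translates directly into a sum. The sketch below is illustrative only; the function name and the number of strips are assumptions made here. It adds up $x^2\,dM$ over thin strips and compares the result with $\frac{1}{3}Ma^2$.

```python
def lamina_moment_about_edge(mass, a, b, strips=10_000):
    """Sum x^2 dM over thin strips parallel to the side of length b."""
    sigma = mass / (a * b)           # mass per unit area
    dx = a / strips
    total = 0.0
    for i in range(strips):
        x = (i + 0.5) * dx           # distance of the strip's mid-line from the axis
        d_mass = sigma * b * dx      # dM = sigma * b * dx
        total += x * x * d_mass      # dI = x^2 dM
    return total


mass, a, b = 2.0, 3.0, 5.0
print(lamina_moment_about_edge(mass, a, b))  # close to the value printed below
print(mass * a * a / 3)
```

Every part of a given strip lies at (approximately) the same distance from the axis, which is the requirement on mass elements stated above.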


6.3.5 Mean values of functions 

In chapter 2 we discussed average values for functions of a single variable. This 
is easily extended to functions of several variables. Let us consider, for example, 
a function f(x, y) defined in some region R of the xy-plane. Then the average
value $\bar{f}$ of the function is given by

$$\bar{f}\int_R dA = \int_R f(x,y)\,dA. \qquad (6.10)$$

This definition is easily extended to three (and higher) dimensions; if a function
f(x, y, z) is defined in some three-dimensional region of space R then the average
value $\bar{f}$ of the function is given by

$$\bar{f}\int_R dV = \int_R f(x,y,z)\,dV. \qquad (6.11)$$


►A tetrahedron is bounded by the three coordinate surfaces and the plane $x/a + y/b + z/c = 1$
and has density $\rho(x,y,z) = \rho_0(1 + x/a)$. Find the average value of the density.

From (6.11), the average value of the density is given by

$$\bar{\rho}\int_R dV = \int_R \rho(x,y,z)\,dV.$$

Now the integral on the LHS is just the volume of the tetrahedron, which we found in
subsection 6.3.1 to be $V = \frac{1}{6}abc$, and the integral on the RHS is its mass $M = \frac{5}{24}abc\rho_0$,
calculated in subsection 6.3.2. Therefore $\bar{\rho} = M/V = \frac{5}{4}\rho_0$. ◄
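The average density can be cross-checked numerically. The sketch below is an illustration, not the method of the text: it assumes the tetrahedron is cut into slabs of constant x, each a right-angled triangle of area $\frac{1}{2}bc(1 - x/a)^2$, and the function names are hypothetical.

```python
def simpson(f, lo, hi, n=200):
    """Composite Simpson's rule on [lo, hi] using n subintervals (n even)."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def tetrahedron_average_density(a, b, c, rho0):
    """Mean of rho0*(1 + x/a) over the tetrahedron x/a + y/b + z/c <= 1."""

    def area(x):
        # Cross-section at fixed x: right-angled triangle with legs b(1-x/a), c(1-x/a).
        return 0.5 * b * c * (1 - x / a) ** 2

    volume = simpson(area, 0.0, a)
    mass = simpson(lambda x: rho0 * (1 + x / a) * area(x), 0.0, a)
    return mass / volume


print(tetrahedron_average_density(1.0, 2.0, 3.0, 4.0))  # approximately 5.0 = (5/4) * rho0
```

Because the density depends only on x, the y and z integrations reduce to the cross-sectional area, leaving one-dimensional integrals.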


6.4 Change of variables in multiple integrals 

It often happens that, either because of the form of the integrand involved or 
because of the boundary shape of the region of integration, it is desirable to
express a multiple integral in terms of a new set of variables. We now consider
how to do this.

Figure 6.10 A region of integration R overlaid with a grid formed by the
family of curves u = constant and v = constant. The parallelogram KLMN
defines the area element $dA_{uv}$.


6.4.1 Change of variables in double integrals 


Let us begin by examining the change of variables in a double integral. Suppose 
that we require to change an integral

$$I = \iint_R f(x,y)\,dx\,dy,$$

in terms of coordinates x and y, into one expressed in new coordinates u and v,
given in terms of x and y by differentiable equations u = u(x, y) and v = v(x, y)
with inverses x = x(u, v) and y = y(u, v). The region R in the xy-plane and the
curve C that bounds it will become a new region R' and a new boundary C' in
the uv-plane, and so we must change the limits of integration accordingly. Also,
the function f(x, y) becomes a new function g(u, v) of the new coordinates.

Now the part of the integral that requires most consideration is the area element. 
In the xy-plane the element is the rectangular area $dA_{xy} = dx\,dy$ generated by
constructing a grid of straight lines parallel to the x- and y-axes respectively.
Our task is to determine the corresponding area element in the uv-coordinates. In
general the corresponding element $dA_{uv}$ will not be the same shape as $dA_{xy}$, but
this does not matter since all elements are infinitesimally small and the value of
the integrand is considered constant over them. Since the sides of the area element
are infinitesimal, $dA_{uv}$ will in general have the shape of a parallelogram. We can
find the connection between $dA_{xy}$ and $dA_{uv}$ by considering the grid formed by the
family of curves u = constant and v = constant, as shown in figure 6.10. Since v
is constant along the line element KL, the latter has components $(\partial x/\partial u)\,du$ and
$(\partial y/\partial u)\,du$ in the directions of the x- and y-axes respectively. Similarly, since u
is constant along the line element KN, the latter has corresponding components
$(\partial x/\partial v)\,dv$ and $(\partial y/\partial v)\,dv$. Using the result for the area of a parallelogram given
in chapter 7, we find that the area of the parallelogram KLMN is given by


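$$dA_{uv} = \left|\frac{\partial x}{\partial u}\,du\,\frac{\partial y}{\partial v}\,dv - \frac{\partial x}{\partial v}\,dv\,\frac{\partial y}{\partial u}\,du\right| = \left|\frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial x}{\partial v}\frac{\partial y}{\partial u}\right| du\,dv.$$

Defining the Jacobian of x, y with respect to u, v as

$$\frac{\partial(x,y)}{\partial(u,v)} \equiv \frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial x}{\partial v}\frac{\partial y}{\partial u},$$

we have

$$dA_{uv} = \left|\frac{\partial(x,y)}{\partial(u,v)}\right| du\,dv.$$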


The reader acquainted with determinants will notice that the Jacobian can also 
be written as the 2x2 determinant 


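$$\frac{\partial(x,y)}{\partial(u,v)} = \begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial y}{\partial u} \\[2ex] \dfrac{\partial x}{\partial v} & \dfrac{\partial y}{\partial v} \end{vmatrix}.$$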


Such determinants can in general be evaluated using the methods of chapter 8. 
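For a given transformation the 2 × 2 Jacobian can also be estimated numerically, which is a convenient way of checking a hand calculation. The sketch below is illustrative: `jacobian_2d` is a hypothetical helper that approximates the four partial derivatives by central differences, and the two test transformations (a linear map and plane polar coordinates) are choices made here.

```python
import math


def jacobian_2d(x_of, y_of, u, v, h=1e-6):
    """Central-difference estimate of d(x, y)/d(u, v) at the point (u, v)."""
    dx_du = (x_of(u + h, v) - x_of(u - h, v)) / (2 * h)
    dx_dv = (x_of(u, v + h) - x_of(u, v - h)) / (2 * h)
    dy_du = (y_of(u + h, v) - y_of(u - h, v)) / (2 * h)
    dy_dv = (y_of(u, v + h) - y_of(u, v - h)) / (2 * h)
    return dx_du * dy_dv - dx_dv * dy_du


# Linear map x = 2u + v, y = u - 3v: the Jacobian is 2*(-3) - 1*1 = -7 everywhere.
print(jacobian_2d(lambda u, v: 2 * u + v, lambda u, v: u - 3 * v, 0.3, 1.2))

# Plane polar coordinates x = rho*cos(phi), y = rho*sin(phi): the Jacobian is rho.
print(jacobian_2d(lambda rho, phi: rho * math.cos(phi),
                  lambda rho, phi: rho * math.sin(phi), 2.0, 0.7))
```

The negative value for the linear map is a reminder that it is the modulus of the Jacobian that relates the area elements.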

So, in summary, the relationship between the size of the area element generated 
by dx, dy and the size of the corresponding area element generated by du, dv is 


$$dx\,dy = \left|\frac{\partial(x,y)}{\partial(u,v)}\right| du\,dv.$$


This equality should be taken as meaning that when transforming from coordinates
x, y to coordinates u, v, the area element dx dy should be replaced by the
expression on the RHS of the above equality. Of course, the Jacobian can, and
in general will, vary over the region of integration. We may express the double
integral in either coordinate system as

$$I = \iint_R f(x,y)\,dx\,dy = \iint_{R'} g(u,v)\left|\frac{\partial(x,y)}{\partial(u,v)}\right| du\,dv. \qquad (6.12)$$

When evaluating the integral in the new coordinate system, it is usually advisable
to sketch the region of integration R' in the uv-plane.

►Evaluate the double integral

$$I = \iint_R \left(a + \sqrt{x^2 + y^2}\right) dx\,dy,$$

where R is the region bounded by the circle $x^2 + y^2 = a^2$.


In Cartesian coordinates, the integral may be written 


$$I = \int_{-a}^{a} dx \int_{-\sqrt{a^2 - x^2}}^{\sqrt{a^2 - x^2}} dy \left(a + \sqrt{x^2 + y^2}\right),$$

and can be calculated directly. However, because of the circular boundary of the integration
region, a change of variables to plane polar coordinates $\rho$, $\phi$ is indicated. The relationship
between Cartesian and plane polar coordinates is given by $x = \rho\cos\phi$ and $y = \rho\sin\phi$.
Using (6.12) we can therefore write

$$I = \iint_{R'} (a + \rho)\left|\frac{\partial(x,y)}{\partial(\rho,\phi)}\right| d\rho\,d\phi,$$

where R' is the rectangular region in the $\rho\phi$-plane whose sides are $\rho = 0$, $\rho = a$, $\phi = 0$
and $\phi = 2\pi$. The Jacobian is easily calculated, and we obtain

$$\frac{\partial(x,y)}{\partial(\rho,\phi)} = \begin{vmatrix} \cos\phi & \sin\phi \\ -\rho\sin\phi & \rho\cos\phi \end{vmatrix} = \rho(\cos^2\phi + \sin^2\phi) = \rho.$$

So the relationship between the area elements in Cartesian and in plane polar coordinates is

$$dx\,dy = \rho\,d\rho\,d\phi.$$

Therefore, when expressed in plane polar coordinates, the integral is given by

$$I = \iint_{R'} (a + \rho)\rho\,d\rho\,d\phi = \int_0^{2\pi} d\phi \int_0^a d\rho\,(a + \rho)\rho = 2\pi\left[\frac{a\rho^2}{2} + \frac{\rho^3}{3}\right]_0^a = \frac{5\pi a^3}{3}.$$ ◄
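The equivalence of the Cartesian and polar forms of this integral can be checked numerically. The sketch below is an illustration with hypothetical function names: the polar form is evaluated with Simpson's rule (the $\phi$ integration simply contributes $2\pi$), while the Cartesian form is approximated crudely by summing over small squares whose centres lie inside the circle.

```python
import math


def simpson(f, lo, hi, n=200):
    """Composite Simpson's rule on [lo, hi] using n subintervals (n even)."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def integral_polar(a):
    """2*pi times the rho-integral of (a + rho)*rho, Jacobian rho included."""
    return 2 * math.pi * simpson(lambda rho: (a + rho) * rho, 0.0, a)


def integral_cartesian(a, n=400):
    """Mid-point sum of a + sqrt(x^2 + y^2) over grid squares inside the circle."""
    h = 2 * a / n
    total = 0.0
    for i in range(n):
        x = -a + (i + 0.5) * h
        for j in range(n):
            y = -a + (j + 0.5) * h
            if x * x + y * y <= a * a:
                total += (a + math.sqrt(x * x + y * y)) * h * h
    return total


a = 1.0
print(integral_polar(a), integral_cartesian(a), 5 * math.pi * a ** 3 / 3)
```

The Cartesian estimate is only as good as the grid's resolution of the circular boundary, which illustrates why the polar form is the more natural choice here.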


6.4.2 Evaluation of the integral $I = \int_{-\infty}^{\infty} e^{-x^2}\,dx$

By making a judicious change of variables, it is sometimes possible to evaluate
an integral that would be intractable otherwise. An important example of this
method is provided by the evaluation of the integral

$$I = \int_{-\infty}^{\infty} e^{-x^2}\,dx.$$

Its value may be found by first constructing $I^2$, as follows:

$$I^2 = \int_{-\infty}^{\infty} e^{-x^2}\,dx \int_{-\infty}^{\infty} e^{-y^2}\,dy = \iint_R e^{-(x^2 + y^2)}\,dx\,dy,$$

where the region R is the whole xy-plane. Then, transforming to plane polar
coordinates, we find

$$I^2 = \iint_{R'} e^{-\rho^2}\rho\,d\rho\,d\phi = \int_0^{2\pi} d\phi \int_0^{\infty} d\rho\,\rho e^{-\rho^2} = 2\pi\left[-\tfrac{1}{2}e^{-\rho^2}\right]_0^{\infty} = \pi.$$

Therefore the original integral is given by $I = \sqrt{\pi}$. Because the integrand is an
even function of x, it follows that the value of the integral from 0 to $\infty$ is simply
$\sqrt{\pi}/2$.

Figure 6.11 The regions used to illustrate the convergence properties of the
integral $I(a) = \int_{-a}^{a} e^{-x^2}\,dx$ as $a \to \infty$.

We note, however, that unlike in all the previous examples, the regions of 
integration R and R' are both infinite in extent (i.e. unbounded). It is therefore 
prudent to derive this result more rigorously; this we do by considering the 
integral 



$$I(a) = \int_{-a}^{a} e^{-x^2}\,dx.$$

We then have

$$I^2(a) = \iint_R e^{-(x^2 + y^2)}\,dx\,dy,$$

where R is the square of side 2a centred on the origin. Referring to figure 6.11,
since the integrand is always positive the value of the integral taken over the
square lies between the value of the integral taken over the region bounded by
the inner circle of radius a and the value of the integral taken over the outer
circle of radius $\sqrt{2}a$. Transforming to plane polar coordinates as above, we may
evaluate the integrals over the inner and outer circles respectively, and we find

$$\pi\left(1 - e^{-a^2}\right) < I^2(a) < \pi\left(1 - e^{-2a^2}\right).$$

Taking the limit $a \to \infty$, we find $I^2(a) \to \pi$. Therefore $I = \sqrt{\pi}$, as we found
previously. We use this result in the discussion of the normal distribution in
chapter 26.

Figure 6.12 A three-dimensional region of integration R, showing an element
of volume in u, v, w coordinates formed by the coordinate surfaces
u = constant, v = constant, w = constant.
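The squeeze between the inner and outer circles can be observed numerically. The following sketch is an illustration only: I(a) is computed by Simpson's rule, the bounds $\pi(1 - e^{-a^2})$ and $\pi(1 - e^{-2a^2})$ are evaluated directly, and the function names are hypothetical.

```python
import math


def simpson(f, lo, hi, n=2000):
    """Composite Simpson's rule on [lo, hi] using n subintervals (n even)."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def i_squared(a):
    """Square of I(a), the integral of exp(-x^2) from -a to a."""
    return simpson(lambda x: math.exp(-x * x), -a, a) ** 2


def circle_bounds(a):
    """Integrals of exp(-rho^2) over the circles of radius a and sqrt(2)*a."""
    return math.pi * (1 - math.exp(-a * a)), math.pi * (1 - math.exp(-2 * a * a))


for a in (0.5, 1.0, 2.0, 3.0):
    lower, upper = circle_bounds(a)
    print(a, lower, i_squared(a), upper)
print(math.pi)
```

As a increases, all three columns close in on $\pi$, in line with the limit taken in the text.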


6.4.3 Change of variables in triple integrals 

A change of variable in a triple integral follows the same general lines as that for 
a double integral. Suppose we wish to change variables from x, y, z to u, v, w. 
In the x, y, z coordinates the element of volume is a cuboid of sides dx, dy, dz 
and volume dV xyz = dxdydz. If, however, we divide up the total volume into 
infinitesimal elements by constructing a grid formed from the coordinate surfaces 
u = constant, v = constant and w = constant, then the element of volume dV uvw 
in the new coordinates will have the shape of a parallelepiped whose faces are the 
coordinate surfaces and whose edges are the curves formed by the intersections 
of these surfaces (see figure 6.12). Along the line element PQ the coordinates v
and w are constant, and so PQ has components $(\partial x/\partial u)\,du$, $(\partial y/\partial u)\,du$ and
$(\partial z/\partial u)\,du$ in the directions of the x-, y- and z-axes respectively. The components
of the line elements PS and ST are found by replacing u by v and w respectively.

The expression for the volume of a parallelepiped in terms of the components
of its edges with respect to the x-, y- and z-axes is given in chapter 7. Using this,
we find that the element of volume in u, v, w coordinates is given by

$$dV_{uvw} = \left|\frac{\partial(x,y,z)}{\partial(u,v,w)}\right| du\,dv\,dw,$$

where the Jacobian of x, y, z with respect to u, v, w is a short-hand for a 3 × 3
determinant:

$$\frac{\partial(x,y,z)}{\partial(u,v,w)} \equiv \begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial y}{\partial u} & \dfrac{\partial z}{\partial u} \\[2ex] \dfrac{\partial x}{\partial v} & \dfrac{\partial y}{\partial v} & \dfrac{\partial z}{\partial v} \\[2ex] \dfrac{\partial x}{\partial w} & \dfrac{\partial y}{\partial w} & \dfrac{\partial z}{\partial w} \end{vmatrix}.$$

So, in summary, the relationship between the elemental volumes in multiple
integrals formulated in the two coordinate systems is given in Jacobian form by

$$dx\,dy\,dz = \left|\frac{\partial(x,y,z)}{\partial(u,v,w)}\right| du\,dv\,dw,$$

and we can write a triple integral in either set of coordinates as

$$I = \iiint_R f(x,y,z)\,dx\,dy\,dz = \iiint_{R'} g(u,v,w)\left|\frac{\partial(x,y,z)}{\partial(u,v,w)}\right| du\,dv\,dw.$$

►Find an expression for a volume element in spherical polar coordinates, and hence calculate
the moment of inertia about a diameter of a uniform sphere of radius a and mass M.

Spherical polar coordinates $r$, $\theta$, $\phi$ are defined by

$$x = r\sin\theta\cos\phi, \qquad y = r\sin\theta\sin\phi, \qquad z = r\cos\theta$$

(and are discussed fully in chapter 10). The required Jacobian is therefore

$$J = \frac{\partial(x,y,z)}{\partial(r,\theta,\phi)} = \begin{vmatrix} \sin\theta\cos\phi & \sin\theta\sin\phi & \cos\theta \\ r\cos\theta\cos\phi & r\cos\theta\sin\phi & -r\sin\theta \\ -r\sin\theta\sin\phi & r\sin\theta\cos\phi & 0 \end{vmatrix}.$$

The determinant is most easily evaluated by expanding it with respect to the last column
(see chapter 8), which gives

$$J = \cos\theta\,(r^2\sin\theta\cos\theta) + r\sin\theta\,(r\sin^2\theta) = r^2\sin\theta\,(\cos^2\theta + \sin^2\theta) = r^2\sin\theta.$$

Therefore the volume element in spherical polar coordinates is given by

$$dV = |J|\,dr\,d\theta\,d\phi = r^2\sin\theta\,dr\,d\theta\,d\phi,$$

which agrees with the result given in chapter 10.

If we place the sphere with its centre at the origin of an x, y, z coordinate system then
its moment of inertia about the z-axis (which is, of course, a diameter of the sphere) is

$$I = \int (x^2 + y^2)\,dM = \rho\int (x^2 + y^2)\,dV,$$

where the integral is taken over the sphere, and $\rho$ is the density. Using spherical polar
coordinates, we can write this as

$$I = \rho\iiint_V (r^2\sin^2\theta)\,r^2\sin\theta\,dr\,d\theta\,d\phi = \rho\int_0^{2\pi} d\phi \int_0^{\pi} d\theta\,\sin^3\theta \int_0^a dr\,r^4 = \rho \times 2\pi \times \tfrac{4}{3} \times \tfrac{1}{5}a^5 = \tfrac{8}{15}\pi a^5\rho.$$

Since the mass of the sphere is $M = \frac{4}{3}\pi a^3\rho$, the moment of inertia can also be written as
$I = \frac{2}{5}Ma^2$. ◄
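Because the integrand separates in spherical polar coordinates, the result is easy to confirm numerically. The sketch below is illustrative, with hypothetical function names; it evaluates the three one-dimensional factors and forms the ratio $I/(Ma^2)$.

```python
import math


def simpson(f, lo, hi, n=200):
    """Composite Simpson's rule on [lo, hi] using n subintervals (n even)."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def sphere_inertia_ratio(a=1.0, rho=1.0):
    """I / (M a^2) for a uniform sphere, with dV = r^2 sin(theta) dr dtheta dphi."""
    phi_part = 2 * math.pi                                          # integral over phi
    theta_part = simpson(lambda t: math.sin(t) ** 3, 0.0, math.pi)  # equals 4/3
    r_part = simpson(lambda r: r ** 4, 0.0, a)                      # equals a^5 / 5
    inertia = rho * phi_part * theta_part * r_part
    mass = rho * 4.0 / 3.0 * math.pi * a ** 3
    return inertia / (mass * a * a)


print(sphere_inertia_ratio(2.5))  # approximately 0.4, i.e. 2/5
```

The factor $\sin^3\theta$ combines $\sin^2\theta$ from $x^2 + y^2$ with $\sin\theta$ from the Jacobian.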


6.4.4 General properties of Jacobians 


Although we will not prove it, the general result for a change of coordinates in 
an n-dimensional integral from a set $x_i$ to a set $y_j$ (where i and j both run from
1 to n) is

$$dx_1\,dx_2 \cdots dx_n = \left|\frac{\partial(x_1, x_2, \ldots, x_n)}{\partial(y_1, y_2, \ldots, y_n)}\right| dy_1\,dy_2 \cdots dy_n,$$


where the n-dimensional Jacobian can be written as an n x n determinant (see 
chapter 8) in an analogous way to the two- and three-dimensional cases. 

For readers who already have sufficient familiarity with matrices (see chapter 8) 
and their properties, a fairly compact proof of some useful general properties 
of Jacobians can be given as follows. Other readers should turn straight to the 
results (6.16) and (6.17) and return to the proof at some later time. 

Consider three sets of variables $x_i$, $y_i$ and $z_i$, with i running from 1 to n for
each set. From the chain rule in partial differentiation (see (5.17)), we know that

$$\frac{\partial x_i}{\partial z_j} = \sum_{k=1}^{n} \frac{\partial x_i}{\partial y_k}\frac{\partial y_k}{\partial z_j}. \qquad (6.13)$$

Now let A, B and C be the matrices whose ij-th elements are $\partial x_i/\partial y_j$, $\partial y_i/\partial z_j$ and
$\partial x_i/\partial z_j$ respectively. We can then write (6.13) as the matrix product

$$c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj} \quad\text{or}\quad \mathsf{C} = \mathsf{A}\mathsf{B}. \qquad (6.14)$$

We may now use the general result for the determinant of the product of two
matrices, namely $|\mathsf{A}\mathsf{B}| = |\mathsf{A}||\mathsf{B}|$, and recall that the Jacobian

$$J_{xy} = \frac{\partial(x_1, \ldots, x_n)}{\partial(y_1, \ldots, y_n)} = |\mathsf{A}|, \qquad (6.15)$$

and similarly for $J_{yz}$ and $J_{xz}$. On taking the determinant of (6.14), we therefore
obtain

$$J_{xz} = J_{xy}J_{yz}$$

or, in the usual notation,

$$\frac{\partial(x_1, \ldots, x_n)}{\partial(z_1, \ldots, z_n)} = \frac{\partial(x_1, \ldots, x_n)}{\partial(y_1, \ldots, y_n)}\,\frac{\partial(y_1, \ldots, y_n)}{\partial(z_1, \ldots, z_n)}. \qquad (6.16)$$

As a special case, if the set $z_i$ is taken to be identical to the set $x_i$, and the
obvious result $J_{xx} = 1$ is used, we obtain

$$J_{xy}J_{yx} = 1$$

or, in the usual notation,

$$\frac{\partial(x_1, \ldots, x_n)}{\partial(y_1, \ldots, y_n)} = \left[\frac{\partial(y_1, \ldots, y_n)}{\partial(x_1, \ldots, x_n)}\right]^{-1}. \qquad (6.17)$$

The similarity between the properties of Jacobians and those of derivatives is
apparent, and to some extent is suggested by the notation. We further note from
(6.15) that since $|\mathsf{A}| = |\mathsf{A}^{\mathrm{T}}|$, where $\mathsf{A}^{\mathrm{T}}$ is the transpose of A, we can interchange the
rows and columns in the determinantal form of the Jacobian without changing 
its value. 
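The reciprocal property (6.17) can be illustrated numerically for the pair of coordinate systems (x, y) and plane polars $(\rho, \phi)$. The sketch below is an illustration only; `jacobian_2d` is a hypothetical finite-difference helper, and the evaluation point is an arbitrary choice.

```python
import math


def jacobian_2d(f, g, u, v, h=1e-6):
    """Central-difference estimate of d(f, g)/d(u, v) at the point (u, v)."""
    df_du = (f(u + h, v) - f(u - h, v)) / (2 * h)
    df_dv = (f(u, v + h) - f(u, v - h)) / (2 * h)
    dg_du = (g(u + h, v) - g(u - h, v)) / (2 * h)
    dg_dv = (g(u, v + h) - g(u, v - h)) / (2 * h)
    return df_du * dg_dv - df_dv * dg_du


def reciprocal_check(rho, phi):
    """Return (J_xy, J_yx), with x-set (x, y) and y-set (rho, phi), at one point."""
    # d(x, y)/d(rho, phi), expected to equal rho
    j_xy = jacobian_2d(lambda r, p: r * math.cos(p),
                       lambda r, p: r * math.sin(p), rho, phi)
    # d(rho, phi)/d(x, y) at the corresponding Cartesian point, expected 1/rho
    x, y = rho * math.cos(phi), rho * math.sin(phi)
    j_yx = jacobian_2d(lambda a, b: math.hypot(a, b),
                       lambda a, b: math.atan2(b, a), x, y)
    return j_xy, j_yx


j_xy, j_yx = reciprocal_check(2.0, 0.7)
print(j_xy, j_yx, j_xy * j_yx)  # the product should be close to 1
```

The point is chosen away from the origin and from the branch cut of `atan2`, where the polar coordinates are not differentiable.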


6.5 Exercises

6.1 Sketch the curved wedge bounded by the surfaces $y^2 = 4ax$, $x + z = a$ and $z = 0$,
and hence calculate its volume V.

6.2 Evaluate the volume integral of $x^2 + y^2 + z^2$ over the rectangular parallelepiped
bounded by the six surfaces $x = \pm a$, $y = \pm b$, $z = \pm c$.

6.3 Find the volume integral of $x^2 y$ over the tetrahedral volume bounded by the
planes x = 0, y = 0, z = 0, and x + y + z = 1.

6.4 Evaluate the surface integral of f(x, y) over the rectangle $0 \le x \le a$, $0 \le y \le b$
for the functions

(a) $f(x,y) = \dfrac{x}{x^2 + y^2}$, (b) $f(x,y) = (b - y + x)^{-3/2}$.

6.5 (a) Prove that the area of the ellipse

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$$

is $\pi ab$.

(b) Use this result to obtain an expression for the volume of a slice of thickness
dz of the ellipsoid

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1.$$

Hence show that the volume of the ellipsoid is $4\pi abc/3$.

6.6 The function

$$\Psi(r) = A\left(2 - \frac{Zr}{a}\right)e^{-Zr/2a}$$

gives the form of the quantum mechanical wavefunction representing the electron
in a hydrogen-like atom of atomic number Z when the electron is in its first
allowed spherically symmetric excited state. Here r is the usual spherical polar
coordinate, but, because of the spherical symmetry, the coordinates $\theta$ and $\phi$ do
not appear explicitly in $\Psi$. Determine the value that A (assumed real) must have
if the wavefunction is to be correctly normalised, i.e. the volume integral of $|\Psi|^2$
over all space is equal to unity.

6.7 In quantum mechanics the electron in a hydrogen atom in some particular state
is described by a wavefunction $\Psi$, which is such that $|\Psi|^2\,dV$ is the probability of
finding the electron in the infinitesimal volume dV. In spherical polar coordinates
$\Psi = \Psi(r, \theta, \phi)$ and $dV = r^2\sin\theta\,dr\,d\theta\,d\phi$. Two such states are described by

$$\Psi_1 = \left(\frac{1}{4\pi}\right)^{1/2}\left(\frac{1}{a_0}\right)^{3/2} 2e^{-r/a_0},$$

$$\Psi_2 = -\left(\frac{3}{8\pi}\right)^{1/2}\sin\theta\,e^{i\phi}\left(\frac{1}{2a_0}\right)^{3/2}\frac{r e^{-r/2a_0}}{a_0\sqrt{3}}.$$

(a) Show that each $\Psi_i$ is normalised, i.e. the integral over all space $\int |\Psi|^2\,dV$ is
equal to unity - physically, this means that the electron must be somewhere.

(b) The (so-called) dipole matrix element between the states 1 and 2 is given by
the integral

$$p_x = \int \Psi_1^{*}\,q r\sin\theta\cos\phi\,\Psi_2\,dV,$$

where q is the charge on the electron. Prove that $p_x$ has the value $-2^7 q a_0/3^5$.

6.8 A planar figure is formed from uniform wire and consists of two semicircular
arcs, each with its own closing diameter, joined so as to form a letter 'B'. The
figure is freely suspended from its top left-hand corner. Show that the straight
edge of the figure makes an angle $\theta$ with the vertical given by $\tan\theta = (2 + \pi)^{-1}$.

6.9 A certain torus has a circular vertical cross-section of radius a centred on a
horizontal circle of radius c (> a).

(a) Find the volume V and surface area A of the torus, and show that they can
be written as

$$V = \frac{\pi^2}{4}(r_o^2 - r_i^2)(r_o - r_i), \qquad A = \pi^2(r_o^2 - r_i^2),$$

where $r_o$ and $r_i$ are respectively the outer and inner radii of the torus.

(b) Show that a vertical circular cylinder of radius c, coaxial with the torus,
divides A in the ratio

$$\pi c + 2a : \pi c - 2a.$$

6.10 A thin uniform circular disc has mass M and radius a.

(a) Prove that its moment of inertia about an axis perpendicular to its plane
and passing through its centre is $\frac{1}{2}Ma^2$.

(b) Prove that the moment of inertia of the same disc about a diameter is $\frac{1}{4}Ma^2$.

This is an example of the general result for planar bodies that the moment of
inertia of the body about an axis perpendicular to the plane is equal to the sum
of the moments of inertia about two perpendicular axes lying in the plane: in an
obvious notation

$$I_z = \int r^2\,dm = \int (x^2 + y^2)\,dm = \int x^2\,dm + \int y^2\,dm = I_y + I_x.$$

6.11 In some applications in mechanics the moment of inertia of a body about a
single point (as opposed to about an axis) is needed. The moment of inertia I
about the origin of a uniform solid body of density $\rho$ is given by the volume
integral

$$I = \int_V (x^2 + y^2 + z^2)\rho\,dV.$$

Show that the moment of inertia of a right circular cylinder of radius a, length
2b, and mass M about its centre is

$$M\left(\frac{a^2}{2} + \frac{b^2}{3}\right).$$

6.12 The shape of an axially symmetric hard-boiled egg, of uniform density $\rho_0$, is
given in spherical polar coordinates by $r = a(2 - \cos\theta)$, where $\theta$ is measured
from the axis of symmetry.

(a) Prove that the mass M of the egg is $M = \frac{40}{3}\pi\rho_0 a^3$.

(b) Prove that the egg's moment of inertia about its axis of symmetry is $\frac{342}{175}Ma^2$.

6.13 In spherical polar coordinates $r$, $\theta$, $\phi$ the element of volume for a body that
is symmetrical about the polar axis is $dV = 2\pi r^2\sin\theta\,dr\,d\theta$, whilst its element
of surface area is $2\pi r\sin\theta\,[(dr)^2 + r^2(d\theta)^2]^{1/2}$. A particular surface is defined by
$r = 2a\cos\theta$, where a is a constant, and $0 \le \theta \le \pi/2$. Find its total surface area
and the volume it encloses, and hence identify the surface.

6.14 By expressing both the integrand and the surface element in spherical polar
coordinates, show that the surface integral

$$\int \frac{x^2}{x^2 + y^2}\,dS$$

over the surface $x^2 + y^2 = z^2$, $0 \le z \le 1$, has the value $\pi/\sqrt{2}$.

6.15 By transforming to cylindrical polar coordinates, evaluate the integral

$$I = \iiint \ln(x^2 + y^2)\,dx\,dy\,dz$$

over the interior of the conical region $x^2 + y^2 \le z^2$, $0 \le z \le 1$.

6.16 Sketch the two families of curves

$$y^2 = 4u(u - x), \qquad y^2 = 4v(v + x),$$

where u and v are parameters.

By transforming to the uv-plane, evaluate the integral of $y/(x^2 + y^2)^{1/2}$ over
that part of the quadrant x > 0, y > 0 bounded by the lines x = 0, y = 0 and
the curve $y^2 = 4a(a - x)$.

6.17 By making two successive simple changes of variables, evaluate

$$I = \iiint x^2\,dx\,dy\,dz$$

over the ellipsoidal region

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} \le 1.$$

6.18 Sketch the domain of integration for the integral

$$I = \int_0^1 \int_{x=y}^{1/y} \frac{y^3}{x}\exp[y^2(x^2 + x^{-2})]\,dx\,dy$$

and characterise its boundaries in terms of new variables u = xy and v = y/x.
Show that the Jacobian for the change from (x, y) to (u, v) is equal to $(2v)^{-1}$, and
hence evaluate I.

6.19 Sketch that part of the region $0 \le x$, $0 \le y \le \pi/2$ which is bounded by the
curves x = 0, y = 0, sinh x cos y = 1 and cosh x sin y = 1. By making a suitable
change of variables, evaluate the integral

$$I = \iint (\sinh^2 x + \cos^2 y)\sinh 2x\,\sin 2y\,dx\,dy$$

over the bounded sub-region.

6.20 Define a coordinate system u, v whose origin coincides with that of the usual
x, y system and whose u-axis coincides with the x-axis, whilst the v-axis makes
an angle $\alpha$ with it. By considering the integral $I = \int \exp(-r^2)\,dA$, where r is the
radial distance from the origin, over the area defined by $0 \le u < \infty$, $0 \le v < \infty$,
prove that

$$\int_0^{\infty}\int_0^{\infty} \exp(-u^2 - v^2 - 2uv\cos\alpha)\,du\,dv = \frac{\alpha}{2\sin\alpha}.$$

6.21 As stated in section 5.11, the first law of thermodynamics can be expressed as

$$dU = T\,dS - P\,dV.$$

By calculating and equating $\partial^2 U/\partial Y\partial X$ and $\partial^2 U/\partial X\partial Y$, where X and Y are an
unspecified pair of variables (drawn from P, V, T and S), prove that

$$\frac{\partial(S,T)}{\partial(X,Y)} = \frac{\partial(V,P)}{\partial(X,Y)}.$$

Using the properties of Jacobians, deduce that

$$\frac{\partial(S,T)}{\partial(V,P)} = 1.$$

6.22 The distances of the variable point P, which has coordinates x, y, z, from the fixed
points (0, 0, 1) and (0, 0, -1) are denoted by u and v respectively. New variables
$\xi$, $\eta$, $\phi$ are defined by

$$\xi = \tfrac{1}{2}(u + v), \qquad \eta = \tfrac{1}{2}(u - v),$$

and $\phi$ is the angle between the plane y = 0 and the plane containing the three
points. Prove that the Jacobian $\partial(\xi,\eta,\phi)/\partial(x,y,z)$ has the value $(\xi^2 - \eta^2)^{-1}$ and
that

$$\iiint_{\text{all space}} \frac{(u - v)^2}{uv}\exp\left(-\frac{u + v}{2}\right) dx\,dy\,dz = \frac{16\pi}{3e}.$$

6.23 This is a more difficult question about 'volumes' in an increasing number of
dimensions.

(a) Let R be a real positive number and define $K_m$ by

$$K_m = \int_{-R}^{R} (R^2 - x^2)^m\,dx.$$

Show, using integration by parts, that $K_m$ satisfies the recurrence relation

$$(2m + 1)K_m = 2mR^2 K_{m-1}.$$

(b) For integer n, define $I_n = K_n$ and $J_n = K_{n+1/2}$. Evaluate $I_0$ and $J_0$ directly
and hence prove that

$$I_n = \frac{2^{2n+1}(n!)^2 R^{2n+1}}{(2n+1)!} \quad\text{and}\quad J_n = \frac{\pi(2n+1)!\,R^{2n+2}}{2^{2n+1}\,n!\,(n+1)!}.$$

(c) A sequence of functions $V_n(R)$ is defined by

$$V_0(R) = 1, \qquad V_n(R) = \int_{-R}^{R} V_{n-1}\left(\sqrt{R^2 - x^2}\right) dx, \quad n \ge 1.$$

Prove by induction that

$$V_{2n}(R) = \frac{\pi^n R^{2n}}{n!}, \qquad V_{2n+1}(R) = \frac{\pi^n 2^{2n+1}\,n!\,R^{2n+1}}{(2n+1)!}.$$

(d) For interest,

(i) show that $V_{2n+2}(1) < V_{2n}(1)$ and $V_{2n+1}(1) < V_{2n-1}(1)$ for all $n \ge 3$;

(ii) hence, by explicitly writing out $V_k(R)$ for $1 \le k \le 8$ (say), show that the
'volume' of the totally symmetric solid of unit radius is a maximum in
five dimensions.


6.6 Hints and answers 

6.1 For integration in the order z, y, x the limits are $(0, a - x)$, $(-\sqrt{4ax}, \sqrt{4ax})$, $(0, a)$.
For integration in the order y, x, z the limits are $(-\sqrt{4ax}, \sqrt{4ax})$, $(0, a - z)$, $(0, a)$.
$V = 16a^3/15$.

6.2 $8abc(a^2 + b^2 + c^2)/3$.

6.3 1/360.

6.4 (a) Integrate by parts to obtain $(b/2)\ln[1 + (a/b)^2] + a\tan^{-1}(b/a)$;

(b) $4[a^{1/2} + b^{1/2} - (a + b)^{1/2}]$.

6.5 (a) Evaluate $\int 2b[1 - (x/a)^2]^{1/2}\,dx$ by setting $x = a\cos\phi$;

(b) $dV = \pi \times a[1 - (z/c)^2]^{1/2} \times b[1 - (z/c)^2]^{1/2}\,dz$.

6.6 $A = \pm(Z/a)^{3/2}/\sqrt{32\pi}$.

6.8 If one of the semicircles has radius a, Pappus' second theorem shows that its
centre of gravity $\langle x\rangle$ is $2a/\pi$ from the centre of the circle of which it is half. For
the whole figure, $\langle x\rangle = 4a/(2\pi + 4)$.

6.9 (a) $V = 2\pi c \times \pi a^2$ and $A = 2\pi a \times 2\pi c$. Setting $r_o = c + a$ and $r_i = c - a$ gives the
stated results. (b) See hint for previous exercise.

6.10 (b) Evaluate $\int 2(a^2 - x^2)^{1/2} x^2 (M/\pi a^2)\,dx$ by setting $x = a\cos\phi$.

6.11 Transform to cylindrical polar coordinates.

6.12 (a) Show that $dz = 2a\sin\theta(\cos\theta - 1)\,d\theta$. Writing $\cos\theta$ as c to save space, the
integrand is $2\pi\rho_0 a^3(1 - c^2)(1 - c)(2 - c)^2\,dc$ over the range $-1 \le c \le 1$.

(b) The integrand is $\pi\rho_0 a^5(1 - c^2)^2(1 - c)(2 - c)^4\,dc$.

6.13 $4\pi a^2$, $4\pi a^3/3$, a sphere.

6.14 The coordinate ranges are $0 \le r \le \sqrt{2}$ and $0 \le \phi \le 2\pi$, with $\theta = \pi/4$. The
integrand for the r and $\phi$ integrations is $(r\cos^2\phi)/\sqrt{2}$.

6.15 The volume element is $\rho\,d\phi\,d\rho\,dz$. The integrand for the final z-integration is
given by $2\pi[(z^2\ln z) - (z^2/2)]$; $I = -5\pi/9$.

6.16 Jacobian $= (u/v)^{1/2} + (v/u)^{1/2}$; area in uv-plane is the triangle bounded by v = 0,
u = v, u = a; integral $= a^2$.

6.17 Set $\xi = x/a$, $\eta = y/b$, $\zeta = z/c$ to map the ellipsoid onto the unit sphere, and then
change from $(\xi, \eta, \zeta)$ coordinates to spherical polar coordinates; $I = 4\pi a^3 bc/15$.

6.18 The boundaries of the three-sided region are u = v = 0, v = 1 and u = 1.
$I = (e - 1)^2/8$.

6.19 Set $u = \sinh x\cos y$, $v = \cosh x\sin y$; $J_{xy,uv} = (\sinh^2 x + \cos^2 y)^{-1}$ and the integrand
reduces to 4uv over the region $0 \le u \le 1$, $0 \le v \le 1$; I = 1.

6.20 $x = v\cos\alpha + u$, $y = v\sin\alpha$. Jacobian $= \sin\alpha$.
$I = (\alpha/2\pi)\int \exp(-r^2)\,dA$ over all space.

6.21 Terms such as $T\,\partial^2 S/\partial Y\partial X$ cancel in pairs. Use equations (6.17) and (6.16).

6.22 Note that $uv = (\xi^2 - \eta^2)$. The ranges for the new variables are $1 \le \xi < \infty$,
$-1 \le \eta \le 1$, $0 \le \phi \le 2\pi$.

6.23 (d)(ii) $2$, $\pi$, $4\pi/3$, $\pi^2/2$, $8\pi^2/15$, $\pi^3/6$, $16\pi^3/105$, $\pi^4/24$.

7


Vector algebra 


This chapter introduces space vectors and their manipulation. Firstly we deal 
with the description and algebra of vectors and then we consider how vectors 
may be used to describe lines and planes and finally we look at the practical use 
of vectors in finding distances. Much use of vectors will be made in subsequent 
chapters; this chapter gives only some basic rules. 


7.1 Scalars and vectors 

The simplest kind of physical quantity is one that can be completely specified by 
its magnitude, a single number, together with the units in which it is measured. 
Such a quantity is called a scalar and examples include temperature, time and 
density. 

A vector is a quantity that requires both a magnitude (≥ 0) and a direction in
space to specify it completely; we may think of it as an arrow in space. A familiar 
example is force, which has a magnitude (strength) measured in newtons and a 
direction of application. The large number of vectors that are used to describe 
the physical world include velocity, displacement, momentum and electric field. 
Vectors are also used to describe quantities such as angular momentum and 
surface elements (a surface element has an area and a direction defined by the 
normal to its tangent plane); in such cases their definitions may seem somewhat 
arbitrary (though in fact they are standard) and not as physically intuitive as for 
vectors such as force. A vector is denoted by bold type, the convention of this 
book, or by underlining, the latter being much used in handwritten work. 

This chapter considers basic vector algebra and illustrates just how powerful 
vector analysis can be. All the techniques are presented for three-dimensional 
space but most can be readily extended to more dimensions. 

Throughout the book we will represent vectors in diagrams as a line together 
with an arrowhead. We will make no distinction between an arrowhead at the
end of the line or one along the line's length but, rather, use that which gives the
clearer diagram. Furthermore, even though we are considering three-dimensional
vectors, we have to draw them in the plane of the paper. It should not be assumed
that vectors drawn thus are coplanar, unless this is explicitly stated.

Figure 7.1 Addition of two vectors showing the commutation relation. We
make no distinction between an arrowhead at the end of the line and one
along the line's length, but rather use that which gives the clearer diagram.


7.2 Addition and subtraction of vectors 

The resultant or vector sum of two displacement vectors is the displacement vector 
that results from performing first one and then the other displacement, as shown 
in figure 7.1; this process is known as vector addition. However, the principle 
of addition has physical meaning for vector quantities other than displacements; 
for example, if two forces act on the same body then the resultant force acting 
on the body is the vector sum of the two. The addition of vectors only makes 
physical sense if they are of a like kind, for example if they are both forces 
acting in three dimensions. It may be seen from figure 7.1 that vector addition is 
commutative, i.e. 


a + b = b + a. (7.1) 

The generalisation of this procedure to the addition of three (or more) vectors is 
clear and leads to the associativity property of addition (see figure 7.2), e.g. 

a + (b + c) = (a + b) + c. (7.2) 


Thus, it is immaterial in what order any number of vectors are added. 
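The commutative and associative properties (7.1) and (7.2) are easily illustrated once vectors are represented by their components. The sketch below is an illustration only: it assumes vectors are stored as tuples of integer components (components are not introduced until later in the chapter), and the function name `add` is hypothetical.

```python
def add(a, b):
    """Vector sum of a and b, formed component by component."""
    return tuple(x + y for x, y in zip(a, b))


a = (1, -2, 3)
b = (4, 0, -5)
c = (-7, 6, 2)

print(add(a, b) == add(b, a))                    # commutativity, equation (7.1)
print(add(a, add(b, c)) == add(add(a, b), c))    # associativity, equation (7.2)
```

Integer components are used so that the comparisons are exact; with floating-point components the two sides of (7.2) may differ by rounding error.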

The subtraction of two vectors is very similar to their addition (see figure 7.3), 
that is, 

a − b = a + (−b),


where −b is a vector of equal magnitude but exactly opposite direction to vector b.

Figure 7.3 Subtraction of two vectors.


The subtraction of two equal vectors yields the zero vector, 0, which has zero 
magnitude and no associated direction. 


7.3 Multiplication by a scalar 

Multiplication of a vector by a scalar (not to be confused with the ‘scalar 
product’, to be discussed in subsection 7.6.1) gives a vector in the same direction 
as the original but of a proportional magnitude. This can be seen in figure 7.4. 
The scalar may be positive, negative or zero. It can also be complex in some 
applications. Clearly, when the scalar is negative we obtain a vector pointing 
in the opposite direction to the original vector. Multiplication by a scalar is 
associative, commutative and distributive over addition. These properties may be 
summarised for arbitrary vectors a and b and arbitrary scalars $\lambda$ and $\mu$ by

    (\lambda\mu)a = \lambda(\mu a) = \mu(\lambda a),    (7.3)
    \lambda(a + b) = \lambda a + \lambda b,    (7.4)
    (\lambda + \mu)a = \lambda a + \mu a.    (7.5)

Figure 7.5 An illustration of the ratio theorem. The point P divides the line
segment AB in the ratio $\lambda : \mu$.


Having defined the operations of addition, subtraction and multiplication by a 
scalar, we can now use vectors to solve simple problems in geometry. 


►A point P divides a line segment AB in the ratio $\lambda : \mu$ (see figure 7.5). If the position
vectors of the points A and B are a and b respectively, find the position vector of the
point P.


As is conventional for vector geometry problems, we denote the vector from the point A
to the point B by AB. If the position vectors of the points A and B, relative to some origin
O, are a and b, it should be clear that AB = b - a.

Now, from figure 7.5 we see that one possible way of reaching the point P from O is
first to go from O to A and then to go along the line AB for a distance equal to the fraction
$\lambda/(\lambda + \mu)$ of its total length. We may express this in terms of vectors as

    OP = p = a + \frac{\lambda}{\lambda + \mu} AB
           = a + \frac{\lambda}{\lambda + \mu} (b - a)
           = \left(1 - \frac{\lambda}{\lambda + \mu}\right) a + \frac{\lambda}{\lambda + \mu} b
           = \frac{\mu}{\lambda + \mu} a + \frac{\lambda}{\lambda + \mu} b,    (7.6)

which expresses the position vector of the point P in terms of those of A and B. We would,
of course, obtain the same result by considering the path from O to B and then to P. ◄

Figure 7.6 The centroid of a triangle. The triangle is defined by the points A,
B and C that have position vectors a, b and c. The broken lines CD, BE, AF
connect the vertices of the triangle to the mid-points of the opposite sides;
these lines intersect at the centroid G of the triangle.


The result (7.6) is a version of the ratio theorem and we may use it in solving 
more complicated problems. 


► The vertices of triangle ABC have position vectors a, b and c relative to some origin O
(see figure 7.6). Find the position vector of the centroid G of the triangle.


From figure 7.6, the points D and E bisect the lines AB and AC respectively. Thus from
the ratio theorem (7.6), with $\lambda = \mu = 1/2$, the position vectors of D and E relative to the
origin are

    d = \frac{1}{2}a + \frac{1}{2}b,
    e = \frac{1}{2}a + \frac{1}{2}c.

Using the ratio theorem again, we may write the position vector of a general point on the
line CD that divides the line in the ratio $\lambda : (1 - \lambda)$ as

    r = (1 - \lambda)c + \lambda d
      = (1 - \lambda)c + \frac{1}{2}\lambda(a + b),    (7.7)

where we have expressed d in terms of a and b. Similarly, the position vector of a general
point on the line BE can be expressed as

    r = (1 - \mu)b + \mu e
      = (1 - \mu)b + \frac{1}{2}\mu(a + c).    (7.8)

Thus, at the intersection of the lines CD and BE we require, from (7.7), (7.8),

    (1 - \lambda)c + \frac{1}{2}\lambda(a + b) = (1 - \mu)b + \frac{1}{2}\mu(a + c).

By equating the coefficients of the vectors a, b, c we find

    \lambda = \mu,    \frac{1}{2}\lambda = 1 - \mu,    1 - \lambda = \frac{1}{2}\mu.

These equations are consistent and have the solution $\lambda = \mu = 2/3$. Substituting these
values into either (7.7) or (7.8) we find that the position vector of the centroid G is given
by

    g = \frac{1}{3}(a + b + c). ◄
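
As a numerical check on the ratio theorem (7.6) and on the centroid result, the following Python sketch may be helpful. The function name divide_segment and the triangle coordinates are illustrative assumptions, not part of the text; exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction


def divide_segment(a, b, lam, mu):
    """Position vector of the point dividing AB in the ratio lam : mu, as in (7.6)."""
    total = lam + mu
    return tuple((mu * ai + lam * bi) / total for ai, bi in zip(a, b))


# Hypothetical triangle vertices, chosen only for illustration.
a = (Fraction(0), Fraction(0), Fraction(0))
b = (Fraction(6), Fraction(0), Fraction(0))
c = (Fraction(0), Fraction(3), Fraction(9))

d = divide_segment(a, b, 1, 1)        # mid-point D of AB
g_on_cd = divide_segment(c, d, 2, 1)  # point two-thirds of the way from C to D
centroid = tuple((ai + bi + ci) / 3 for ai, bi, ci in zip(a, b, c))

print(g_on_cd == centroid)
```

The point on the median CD with CG : GD = 2 : 1 corresponds to $\lambda = 2/3$ in (7.7), so agreement with (a + b + c)/3 is what the worked example predicts.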


7.4 Basis vectors and components 

Given any three different vectors $e_1$, $e_2$ and $e_3$, which do not all lie in a plane,
it is possible, in three-dimensional space, to write any other vector in terms of
scalar multiples of them:


    a = a_1 e_1 + a_2 e_2 + a_3 e_3.    (7.9)

The three vectors $e_1$, $e_2$ and $e_3$ are said to form a basis (for the three-dimensional
space); the scalars $a_1$, $a_2$ and $a_3$, which may be positive, negative or zero, are
called the components of the vector a with respect to this basis. We say that the
vector has been resolved into components.

Most often we shall use basis vectors that are mutually perpendicular, for ease 
of manipulation, though this is not necessary. In general, a basis set must 

(i) have as many basis vectors as the number of dimensions (in more formal 
language, the basis vectors must span the space) and 

(ii) be such that no basis vector may be described as a sum of the others, or,
more formally, the basis vectors must be linearly independent. Putting this
mathematically, in N dimensions, we require

    c_1 e_1 + c_2 e_2 + \cdots + c_N e_N \neq 0,

for any set of coefficients $c_1, c_2, \ldots, c_N$ except $c_1 = c_2 = \cdots = c_N = 0$.

In this chapter we will only consider vectors in three dimensions; higher
dimensionality can be achieved by simple extension.

If we wish to label points in space using a Cartesian coordinate system (x,y,z), 
we may introduce the unit vectors i, j and k, which point along the positive x-, 
y- and z- axes respectively. A vector a may then be written as a sum of three 
vectors, each parallel to a different coordinate axis: 

    a = a_x i + a_y j + a_z k.    (7.10)

A vector in three-dimensional space thus requires three components to describe 
fully both its direction and its magnitude. A displacement in space may be 
thought of as the sum of displacements along the x-, y- and z- directions (see 
figure 7.7). For brevity, the components of a vector a with respect to a particular
coordinate system are sometimes written in the form $(a_x, a_y, a_z)$. Note that the
basis vectors i, j and k may themselves be represented by (1,0,0), (0,1,0) and
(0,0,1) respectively.

Figure 7.7 A Cartesian basis set. The vector a is the sum of $a_x i$, $a_y j$ and $a_z k$.

We can consider the addition and subtraction of vectors in terms of their 
components. The sum of two vectors a and b is found by simply adding their 
components, i.e. 

    a + b = a_x i + a_y j + a_z k + b_x i + b_y j + b_z k
          = (a_x + b_x) i + (a_y + b_y) j + (a_z + b_z) k,    (7.11)

and their difference by subtracting them,

    a - b = a_x i + a_y j + a_z k - (b_x i + b_y j + b_z k)
          = (a_x - b_x) i + (a_y - b_y) j + (a_z - b_z) k.    (7.12)


► Two particles have velocities $v_1$ = i + 3j + 6k and $v_2$ = i - 2k respectively. Find the
velocity u of the second particle relative to the first.


The required relative velocity is given by

    u = v_2 - v_1 = (1 - 1)i + (0 - 3)j + (-2 - 6)k
      = -3j - 8k. ◄


7.5 Magnitude of a vector 

The magnitude of the vector a is denoted by |a| or a. In terms of its components 
in three-dimensional Cartesian coordinates, the magnitude of a is given by 

    a = |a| = \sqrt{a_x^2 + a_y^2 + a_z^2}.    (7.13)

Hence, the magnitude of a vector is a measure of its length. Such an analogy is
useful for displacement vectors but magnitude is better described, for example, by
‘strength’ for vectors such as force or by ‘speed’ for velocity vectors. For instance,
in the previous example, the speed of the second particle relative to the first is
given by

    u = |u| = \sqrt{(-3)^2 + (-8)^2} = \sqrt{73}.


A vector whose magnitude equals unity is called a unit vector. The unit vector
in the direction a is usually denoted $\hat{a}$ and may be evaluated as

    \hat{a} = \frac{a}{|a|}.    (7.14)

The unit vector is a useful concept because a vector written as $\lambda\hat{a}$ then has
magnitude $\lambda$ and direction $\hat{a}$. Thus magnitude and direction are explicitly separated.
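
The component formulae (7.12), (7.13) and (7.14) translate directly into a few lines of Python. The sketch below is only illustrative: the helper names magnitude and unit are assumptions, and vectors are represented as plain tuples of Cartesian components. It reworks the relative-velocity example above.

```python
import math


def magnitude(v):
    """Length of a vector from its Cartesian components, as in (7.13)."""
    return math.sqrt(sum(x * x for x in v))


def unit(v):
    """Unit vector in the direction of v, as in (7.14); v must be non-zero."""
    m = magnitude(v)
    if m == 0:
        raise ValueError("the zero vector has no associated direction")
    return tuple(x / m for x in v)


v1 = (1, 3, 6)
v2 = (1, 0, -2)

# Velocity of the second particle relative to the first, component by component.
u = tuple(q - p for p, q in zip(v1, v2))
speed = magnitude(u)

print(u, speed, unit(u))
```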


7.6 Multiplication of vectors 

We have already considered multiplying a vector by a scalar. Now we consider 
the concept of multiplying one vector by another vector. It is not immediately 
obvious what the product of two vectors represents and in fact two products 
are commonly defined, the scalar product and the vector product. As their names 
imply, the scalar product of two vectors is just a number, whereas the vector 
product is itself a vector. Although neither the scalar nor the vector product 
is what we might normally think of as a product, their use is widespread and 
numerous examples will be described elsewhere in this book. 


7.6.1 Scalar product 

The scalar product (or dot product) of two vectors a and b is denoted by a • b
and is given by

    a \cdot b = |a||b| \cos\theta,    0 \le \theta \le \pi,    (7.15)

where $\theta$ is the angle between the two vectors, placed ‘tail to tail’ or ‘head to head’.
Thus, the value of the scalar product a • b equals the magnitude of a multiplied
by the projection of b onto a (see figure 7.8).

From (7.15) we see that the scalar product has the particularly useful property 
that 


a • b = 0 (7.16) 

is a necessary and sufficient condition for a to be perpendicular to b (unless either 
of them is zero). It should be noted in particular that the Cartesian basis vectors 
i, j and k, being mutually orthogonal unit vectors, satisfy the equations 

    i \cdot i = j \cdot j = k \cdot k = 1,    (7.17)
    i \cdot j = j \cdot k = k \cdot i = 0.    (7.18)

Figure 7.8 The projection of b onto the direction of a is $b \cos\theta$. The scalar
product of a and b is $ab \cos\theta$.

Examples of scalar products arise naturally throughout physics and in particular
in connection with energy. Perhaps the simplest is the work done F • r in
moving the point of application of a constant force F through a displacement r;
notice that, as expected, if the displacement is perpendicular to the direction of
the force then F • r = 0 and no work is done. A second simple example is afforded
by the potential energy -m • B of a magnetic dipole, represented in strength and
orientation by a vector m, placed in an external magnetic field B.

As the name implies, the scalar product has a magnitude but no direction. The 
scalar product is commutative and distributive over addition: 

    a \cdot b = b \cdot a,    (7.19)

    a \cdot (b + c) = a \cdot b + a \cdot c.    (7.20)


►Four points A, B, C, D are positioned such that the line AD is perpendicular to BC and
BD is perpendicular to AC. Show that CD is perpendicular to AB.


Let us denote the position vectors of the points A, B, C, D by a, b, c, d respectively. As
the four points are not coplanar it is difficult to draw a helpful diagram of the situation,
but this is not a drawback when vector methods are used. We start by noting that, since
AD is perpendicular to BC, we have from (7.16) that

    (d - a) \cdot (c - b) = 0.

Similarly, since BD is perpendicular to AC,

    (d - b) \cdot (c - a) = 0.

Combining these two equations we find

    (d - a) \cdot (c - b) = (d - b) \cdot (c - a),

which, on multiplying out the parentheses, gives

    d \cdot c - a \cdot c - d \cdot b + a \cdot b = d \cdot c - b \cdot c - d \cdot a + b \cdot a.

Cancelling terms that appear on both sides and rearranging yields

    d \cdot b - d \cdot a - c \cdot b + c \cdot a = 0,

which simplifies to give

    (d - c) \cdot (b - a) = 0.

From (7.16), we see that this implies that CD is perpendicular to AB. ◄

If we introduce a set of basis vectors that are mutually orthogonal, such as i, j,
k, we can write the components of a vector a, with respect to that basis, in terms
of the scalar product of a with each of the basis vectors, i.e. $a_x$ = a • i, $a_y$ = a • j and
$a_z$ = a • k. In terms of the components $a_x$, $a_y$ and $a_z$ the scalar product is given by

    a \cdot b = (a_x i + a_y j + a_z k) \cdot (b_x i + b_y j + b_z k) = a_x b_x + a_y b_y + a_z b_z,    (7.21)

where the cross terms such as $a_x i \cdot b_y j$ are zero because the basis vectors are
mutually perpendicular; see equation (7.18). It should be clear from (7.15) that
the value of a • b has a geometrical definition and that this value is independent
of the actual basis vectors used.

► Find the angle between the vectors a = i + 2j + 3k and b = 2i + 3j + 4k.

From (7.15) the cosine of the angle $\theta$ between a and b is given by

    \cos\theta = \frac{a \cdot b}{|a||b|}.

From (7.21) the scalar product a • b has the value

    a \cdot b = 1 \times 2 + 2 \times 3 + 3 \times 4 = 20,

and from (7.13) the lengths of the vectors are

    |a| = \sqrt{1^2 + 2^2 + 3^2} = \sqrt{14}    and    |b| = \sqrt{2^2 + 3^2 + 4^2} = \sqrt{29}.

Thus,

    \cos\theta = \frac{20}{\sqrt{14}\sqrt{29}} \approx 0.9926    \Rightarrow    \theta = 0.12 rad. ◄
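
The calculation in this example can be reproduced with a short Python sketch based on (7.15) and (7.21). The function names dot and angle_between are illustrative assumptions; the clamp before acos is a numerical precaution that is not needed in exact arithmetic.

```python
import math


def dot(a, b):
    """Scalar product from Cartesian components, as in (7.21)."""
    return sum(x * y for x, y in zip(a, b))


def angle_between(a, b):
    """Angle in radians between two non-zero vectors, from (7.15)."""
    cos_theta = dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))
    # Clamp against rounding so that acos never sees a value just outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_theta)))


a = (1, 2, 3)
b = (2, 3, 4)
theta = angle_between(a, b)

print(dot(a, b), round(theta, 2))
```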


We can see from the expressions (7.15), (7.21) for the scalar product that if $\theta$
is the angle between a and b then

    \cos\theta = \frac{a_x}{a}\frac{b_x}{b} + \frac{a_y}{a}\frac{b_y}{b} + \frac{a_z}{a}\frac{b_z}{b},

where $a_x/a$, $a_y/a$ and $a_z/a$ are called the direction cosines of a, since they give the
cosine of the angle made by a with each of the basis vectors. Similarly $b_x/b$, $b_y/b$
and $b_z/b$ are the direction cosines of b.

If we take the scalar product of any vector a with itself then clearly $\theta = 0$ and
from (7.15) we have

    a \cdot a = |a|^2.

Thus the magnitude of a can be written in a coordinate-independent form as
$|a| = \sqrt{a \cdot a}$.

Finally, we note that the scalar product may be extended to vectors with
complex components if it is redefined as

    a \cdot b = a_x^* b_x + a_y^* b_y + a_z^* b_z,

where the asterisk represents the operation of complex conjugation. To accommodate
this extension the commutation property (7.19) must be modified to
read

    a \cdot b = (b \cdot a)^*.    (7.22)

In particular it should be noted that $(\lambda a) \cdot b = \lambda^* a \cdot b$, whereas $a \cdot (\lambda b) = \lambda a \cdot b$.
However, the magnitude of a complex vector is still given by $|a| = \sqrt{a \cdot a}$, since
a • a is always real.

Figure 7.9 The vector product. The vectors a, b and a × b form a right-handed
set.

7.6.2 Vector product 

The vector product (or cross product) of two vectors a and b is denoted by a × b
and is defined to be a vector of magnitude $|a||b| \sin\theta$ in a direction perpendicular
to both a and b;

    |a \times b| = |a||b| \sin\theta.

The direction is found by ‘rotating’ a into b through the smallest possible angle.
The sense of rotation is that of a right-handed screw which moves forward in
the direction a × b (see figure 7.9). Again, $\theta$ is the angle between the two vectors
placed ‘tail to tail’ or ‘head to head’. With this definition a, b and a × b form a
right-handed set. A more directly usable description of the relative directions in
a vector product is provided by a right hand whose first two fingers and thumb
are held to be as nearly mutually perpendicular as possible. If the first finger is
pointed in the direction of the first vector and the second finger in the direction
of the second vector, then the thumb gives the direction of the vector product.

The vector product is distributive over addition, but anticommutative and
non-associative:

    (a + b) \times c = (a \times c) + (b \times c),    (7.23)
    b \times a = -(a \times b),    (7.24)
    (a \times b) \times c \neq a \times (b \times c).    (7.25)

Figure 7.10 The moment of the force F about O is r × F. The cross represents
the direction of r × F, which is perpendicularly into the plane of the paper.


From its definition, we see that the vector product has the very useful property 
that if a x b = 0 then a is parallel or antiparallel to b (unless either of them is 
zero). We also note that 

a x a = 0. (7.26) 


► Show that if $a = b + \lambda c$, for some scalar $\lambda$, then a × c = b × c.

From (7.23) we have

    a \times c = (b + \lambda c) \times c = b \times c + \lambda c \times c.

However, from (7.26), c × c = 0 and so

    a \times c = b \times c.    (7.27)

We note in passing that the fact that (7.27) is satisfied does not imply that a = b. ◄

An example of the use of the vector product is that of finding the area, A, of
a parallelogram with sides a and b, using the formula

A = |a x b|. (7.28) 

Another example is afforded by considering a force F acting through a point R,
whose vector position relative to the origin O is r (see figure 7.10). Its moment
or torque about O is the strength of the force times the perpendicular distance
OP, which numerically is just $Fr \sin\theta$, i.e. the magnitude of r × F. Furthermore,
the sense of the moment is clockwise about an axis through O that points
perpendicularly into the plane of the paper (the axis is represented by a cross
in the figure). Thus the moment is completely represented by the vector r × F,
in both magnitude and spatial sense. It should be noted that the same vector
product is obtained wherever the point R is chosen, so long as it lies on the line
of action of F.

Similarly, if a solid body is rotating about some axis that passes through the
origin, with an angular velocity $\omega$ then we can describe this rotation by a vector
$\boldsymbol{\omega}$ that has magnitude $\omega$ and points along the axis of rotation. The direction of $\boldsymbol{\omega}$
is the forward direction of a right-handed screw rotating in the same sense as the
body. The velocity of any point in the body with position vector r is then given
by $v = \boldsymbol{\omega} \times r$.

Since the basis vectors i, j, k are mutually perpendicular unit vectors, forming 
a right-handed set, their vector products are easily seen to be 


    i \times i = j \times j = k \times k = 0,    (7.29)
    i \times j = -j \times i = k,    (7.30)
    j \times k = -k \times j = i,    (7.31)
    k \times i = -i \times k = j.    (7.32)


Using these relations, it is straightforward to show that the vector product of two 
general vectors a and b is given in terms of their components with respect to the 
basis set i, j, k, by 


    a \times b = (a_y b_z - a_z b_y) i + (a_z b_x - a_x b_z) j + (a_x b_y - a_y b_x) k.    (7.33)


For the reader who is familiar with determinants (see chapter 8), we record that 
this can also be written as 


    a \times b = \begin{vmatrix} i & j & k \\ a_x & a_y & a_z \\ b_x & b_y & b_z \end{vmatrix}.


That the cross product a x b is perpendicular to both a and b can be verified 
in component form by forming its dot products with each of the two vectors and 
showing that it is zero in both cases. 


►Find the area A of the parallelogram with sides a = i + 2j + 3k and b = 4i + 5j + 6k.


The vector product a × b is given in component form by

    a \times b = (2 \times 6 - 3 \times 5)i + (3 \times 4 - 1 \times 6)j + (1 \times 5 - 2 \times 4)k
               = -3i + 6j - 3k.

Thus the area of the parallelogram is

    A = |a \times b| = \sqrt{(-3)^2 + 6^2 + (-3)^2} = \sqrt{54}. ◄
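
The component formula (7.33) and the area formula (7.28) can be checked with the following Python sketch. The helper names dot and cross are illustrative assumptions; the vectors are those of the example above, and the perpendicularity of a × b to both a and b is confirmed through its dot products.

```python
import math


def dot(a, b):
    """Scalar product from Cartesian components, as in (7.21)."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Vector product from Cartesian components, as in (7.33)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


a = (1, 2, 3)
b = (4, 5, 6)

n = cross(a, b)
area = math.sqrt(dot(n, n))  # area of the parallelogram, as in (7.28)

print(n, dot(n, a), dot(n, b), area)
```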


7.6.3 Scalar triple product 

Now that we have defined the scalar and vector products, we can extend our
discussion to define products of three vectors. Again, there are two possibilities,
the scalar triple product and the vector triple product.

The scalar triple product is denoted by

    [a, b, c] = a \cdot (b \times c)

and, as its name suggests, it is just a number. It is most simply interpreted as the
volume of a parallelepiped whose edges are given by a, b and c (see figure 7.11).
The vector v = a × b is perpendicular to the base of the solid and has magnitude
$v = ab \sin\theta$, i.e. the area of the base. Further, $v \cdot c = vc \cos\phi$. Thus, since $c \cos\phi$
= OP is the vertical height of the parallelepiped, it is clear that (a × b) • c = area
of the base × perpendicular height = volume. It follows that, if the vectors a, b
and c are coplanar, a • (b × c) = 0.

Figure 7.11 The triple scalar product gives the volume of a parallelepiped.

Expressed in terms of the components of each vector with respect to the 
Cartesian basis set i, j, k the scalar triple product is 


    a \cdot (b \times c) = a_x(b_y c_z - b_z c_y) + a_y(b_z c_x - b_x c_z) + a_z(b_x c_y - b_y c_x),    (7.34)

which can also be written as a determinant:

    a \cdot (b \times c) = \begin{vmatrix} a_x & a_y & a_z \\ b_x & b_y & b_z \\ c_x & c_y & c_z \end{vmatrix}.

By writing the vectors in component form, it can be shown that 


a • (b x c) = (a x b) • c, 


so that the dot and cross symbols can be interchanged without changing the result. 
More generally, the triple scalar product is unchanged under cyclic permutation 
of the vectors a, b, c. Other permutations simply give the negative of the original 
triple scalar product. These results can be summarised by 

    [a, b, c] = [b, c, a] = [c, a, b] = -[a, c, b] = -[b, a, c] = -[c, b, a].    (7.35)

►Find the volume V of the parallelepiped with sides a = i + 2j + 3k, b = 4i + 5j + 6k and
c = 7i + 8j + 10k.


We have already found that a × b = -3i + 6j - 3k, in subsection 7.6.2. Hence the volume
of the parallelepiped is given by

    V = |a \cdot (b \times c)| = |(a \times b) \cdot c|
      = |(-3i + 6j - 3k) \cdot (7i + 8j + 10k)|
      = |(-3)(7) + (6)(8) + (-3)(10)| = 3. ◄

Another useful formula involving both the scalar and vector products is
Lagrange’s identity (see exercise 7.9), i.e.

    (a \times b) \cdot (c \times d) = (a \cdot c)(b \cdot d) - (a \cdot d)(b \cdot c).    (7.36)
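
The permutation rules (7.35) and Lagrange's identity (7.36) lend themselves to a quick numerical check. In the Python sketch below the function name triple and the fourth vector d are illustrative assumptions; a, b and c are taken from the volume example.

```python
def dot(a, b):
    """Scalar product from Cartesian components."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Vector product from Cartesian components."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def triple(a, b, c):
    """Scalar triple product [a, b, c] = a . (b x c)."""
    return dot(a, cross(b, c))


a, b, c = (1, 2, 3), (4, 5, 6), (7, 8, 10)
volume = abs(triple(a, b, c))

# d is a hypothetical fourth vector, used only to exercise identity (7.36).
d = (1, 0, -1)
lhs = dot(cross(a, b), cross(c, d))
rhs = dot(a, c) * dot(b, d) - dot(a, d) * dot(b, c)

print(volume, triple(b, c, a), triple(a, c, b), lhs == rhs)
```

Because all components are integers, the comparisons are exact and no tolerance is needed.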

7.6.4 Vector triple product 

By the vector triple product of three vectors a, b, c we mean the vector a × (b × c).
Clearly, a × (b × c) is perpendicular to a and lies in the plane of b and c and so
can be expressed in terms of them (see (7.37) below). We note, from (7.25), that
the vector triple product is not associative, i.e. $a \times (b \times c) \neq (a \times b) \times c$.

Two useful formulae involving the vector triple product are

    a \times (b \times c) = (a \cdot c)b - (a \cdot b)c,    (7.37)

    (a \times b) \times c = (a \cdot c)b - (b \cdot c)a,    (7.38)

which may be derived by writing each vector in component form (see exercise 7.8). 
It can also be shown that for any three vectors a, b, c, 

a x (b x c) + b x (c x a) + c x (a x b) = 0. 
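
Before attempting the component-form derivation, the formulae (7.37), (7.38) and the cyclic-sum result above can be tested on particular vectors. The following Python sketch does this; the helper names and the choice of a, b and c are illustrative assumptions, and a numerical check of this kind is of course not a proof.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def scale(s, v):
    return tuple(s * x for x in v)


def sub(u, v):
    return tuple(x - y for x, y in zip(u, v))


def add(*vectors):
    return tuple(sum(xs) for xs in zip(*vectors))


# Hypothetical test vectors with integer components, so all checks are exact.
a, b, c = (1, 2, 3), (4, 5, 6), (7, 8, 10)

lhs_37 = cross(a, cross(b, c))
rhs_37 = sub(scale(dot(a, c), b), scale(dot(a, b), c))

lhs_38 = cross(cross(a, b), c)
rhs_38 = sub(scale(dot(a, c), b), scale(dot(b, c), a))

cyclic_sum = add(cross(a, cross(b, c)), cross(b, cross(c, a)), cross(c, cross(a, b)))

print(lhs_37 == rhs_37, lhs_38 == rhs_38, cyclic_sum)
```

For these vectors the two bracketings give different results, which illustrates the non-associativity noted in (7.25).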


7.7 Equations of lines, planes and spheres 

Now that we have described the basic algebra of vectors, we can apply the results 
to a variety of problems, the first of which is to find the equation of a line in 
vector form. 


7.7.1 Equation of a line 

Consider the line passing through the fixed point A with position vector a and 
having a direction b (see figure 7.12). It is clear that the position vector r of a 
general point R on the line can be written as 

    r = a + \lambda b,    (7.39)

since R can be reached by starting from O, going along the translation vector
a to the point A on the line and then adding some multiple $\lambda b$ of the vector b.
Different values of $\lambda$ give different points R on the line.

Figure 7.12 The equation of a line. The vector b is in the direction AR and
$\lambda b$ is the vector from A to R.

Taking the components of (7.39), we see that the equation of the line can also 
be written in the form 


    \frac{x - a_x}{b_x} = \frac{y - a_y}{b_y} = \frac{z - a_z}{b_z} = constant.    (7.40)


Taking the vector product of (7.39) with b and remembering that b × b = 0, gives
an alternative equation for the line

    (r - a) \times b = 0.

We may also find the equation of the line that passes through two fixed points
A and C with position vectors a and c. Since AC is given by c - a, the position
vector of a general point on the line is

    r = a + \lambda(c - a).


7.7.2 Equation of a plane 

The equation of a plane through a point A with position vector a and perpendicular
to a unit position vector $\hat{n}$ (see figure 7.13) is

    (r - a) \cdot \hat{n} = 0;    (7.41)

this follows since the vector joining A to a general point R with position vector r
is r - a; r will lie in the plane if this vector is perpendicular to the normal to the
plane. Rewriting (7.41) as $r \cdot \hat{n} = a \cdot \hat{n}$, we see that the equation of the plane may
also be expressed in the form $r \cdot \hat{n} = d$, or in component form as

    lx + my + nz = d,    (7.42)

where the unit normal to the plane is $\hat{n} = li + mj + nk$ and $d = a \cdot \hat{n}$ is the
perpendicular distance of the plane from the origin.

Figure 7.13 The equation of the plane is $(r - a) \cdot \hat{n} = 0$.

The equation of a plane containing points a, b and c is

    r = a + \lambda(b - a) + \mu(c - a).

This is apparent because starting from the point a in the plane, all other points
may be reached by moving a distance along each of two (non-parallel) directions
in the plane. Two such directions are given by b - a and c - a. It can be shown
that the equation of this plane may also be written in the more symmetrical form

    r = \alpha a + \beta b + \gamma c,

where $\alpha + \beta + \gamma = 1$.

► Find the direction of the line of intersection of the planes x + 3y — z = 5 and 
2x - 2y + 4z = 3. 

The two planes have normal vectors $n_1$ = i + 3j - k and $n_2$ = 2i - 2j + 4k. It is clear
that these are not parallel vectors and so the planes must intersect along some line. The
direction p of this line must be parallel to both planes and hence perpendicular to both
normals. Therefore

    p = n_1 \times n_2
      = [(3)(4) - (-2)(-1)] i + [(-1)(2) - (1)(4)] j + [(1)(-2) - (3)(2)] k
      = 10i - 6j - 8k. ◄
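
The same direction can be obtained mechanically by reading the normals off the plane equations and taking their vector product. A minimal Python sketch follows; the helper names are illustrative assumptions.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


n1 = (1, 3, -1)   # normal to x + 3y - z = 5
n2 = (2, -2, 4)   # normal to 2x - 2y + 4z = 3

p = cross(n1, n2)  # direction of the line of intersection

# p should be perpendicular to both normals.
print(p, dot(p, n1), dot(p, n2))
```

A result of (0, 0, 0) would signal parallel normals, in which case the planes either coincide or do not intersect.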


7.7.3 Equation of a sphere 

Clearly, the defining property of a sphere is that all points on it are equidistant
from a fixed point in space and that the common distance is equal to the radius
of the sphere. This is easily expressed in vector notation as

    |r - c|^2 = (r - c) \cdot (r - c) = a^2,    (7.43)

where c is the position vector of the centre of the sphere and a is its radius.


► Find the radius $\rho$ of the circle that is the intersection of the plane $\hat{n} \cdot r = p$ and the sphere
of radius a centred on the point with position vector c.


The equation of the sphere is

    |r - c|^2 = a^2,    (7.44)

and that of the circle of intersection is

    |r - b|^2 = \rho^2,    (7.45)

where r is restricted to lie in the plane and b is the position of the circle’s centre.

As b lies on the plane whose normal is $\hat{n}$, the vector b - c must be parallel to $\hat{n}$, i.e.
$b - c = \lambda\hat{n}$ for some $\lambda$. Further, by Pythagoras, we must have $\rho^2 + |b - c|^2 = a^2$. Thus
$\lambda^2 = a^2 - \rho^2$.

Writing $b = c + \sqrt{a^2 - \rho^2}\,\hat{n}$ and substituting in (7.45) gives

    r^2 - 2r \cdot (c + \sqrt{a^2 - \rho^2}\,\hat{n}) + c^2 + 2(c \cdot \hat{n})\sqrt{a^2 - \rho^2} + a^2 - \rho^2 = \rho^2,

whilst, on expansion, (7.44) becomes

    r^2 - 2r \cdot c + c^2 = a^2.

Subtracting these last two equations, using $\hat{n} \cdot r = p$ and simplifying yields

    p - c \cdot \hat{n} = \sqrt{a^2 - \rho^2}.

On rearrangement, this gives $\rho$ as $\sqrt{a^2 - (p - c \cdot \hat{n})^2}$, which places obvious geometrical
constraints on the values a, c, $\hat{n}$ and p can take if a real intersection between the sphere
and the plane is to occur. ◄
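
The final formula of this example, together with the constraint it implies, might be coded as follows. The function name circle_radius, its argument order and the decision to return None when the plane misses the sphere are all assumptions made for this sketch.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def circle_radius(n_hat, p, c, a):
    """Radius of the circle in which the plane n_hat . r = p cuts the sphere
    of radius a centred on c; n_hat is assumed to be a unit vector.

    Returns None if the plane does not meet the sphere.
    """
    offset = p - dot(c, n_hat)  # signed distance from the centre to the plane
    if abs(offset) > a:
        return None
    return math.sqrt(a * a - offset * offset)


# Hypothetical data: sphere of radius 5 at the origin, cut by the plane z = 3.
print(circle_radius((0, 0, 1), 3, (0, 0, 0), 5))
```

The geometrical constraint mentioned in the text appears here as the test on abs(offset): a real circle exists only when the centre lies no further than a from the plane.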


7.8 Using vectors to find distances 

This section deals with the practical application of vectors to finding distances. 
Some of these problems are extremely cumbersome in component form, but they 
all reduce to neat solutions when general vectors, with no explicit basis set, 
are used. These examples show the power of vectors in simplifying geometrical 
problems. 


7.8.1 Distance from a point to a line 

Figure 7.14 shows a line having direction b that passes through a point A whose 
position vector is a. To find the minimum distance d of the line from a point P 
whose position vector is p, we must solve the right-angled triangle shown. We see
that $d = |p - a| \sin\theta$; so, from the definition of the vector product, it follows that

    d = |(p - a) \times \hat{b}|.

Figure 7.14 The minimum distance from a point to a line.


►Find the minimum distance from the point P with coordinates (1,2,1) to the line $r = a + \lambda b$,
where a = i + j + k and b = 2i - j + 3k.


Comparison with (7.39) shows that the line passes through the point (1,1,1) and has
direction 2i - j + 3k. The unit vector in this direction is

    \hat{b} = \frac{1}{\sqrt{14}}(2i - j + 3k).

The position vector of P is p = i + 2j + k and we find

    (p - a) \times \hat{b} = \frac{1}{\sqrt{14}} [j \times (2i - j + 3k)]
                           = \frac{1}{\sqrt{14}} (3i - 2k).

Thus the minimum distance from the line to the point P is $d = \sqrt{13/14}$. ◄
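
The point-to-line distance can be computed without first normalising b, by dividing the magnitude of (p - a) × b by |b|. The Python sketch below takes that route; the function name point_line_distance is an illustrative assumption, and the data are those of the example.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def point_line_distance(p, a, b):
    """Distance from the point p to the line r = a + lam * b.

    b need not be a unit vector but must be non-zero.
    """
    diff = tuple(pi - ai for pi, ai in zip(p, a))
    return norm(cross(diff, b)) / norm(b)


d = point_line_distance((1, 2, 1), (1, 1, 1), (2, -1, 3))
print(d)
```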


7.8.2 Distance from a point to a plane 

The minimum distance d from a point P whose position vector is p to the plane
defined by $(r - a) \cdot \hat{n} = 0$ may be deduced by finding any vector from P to the
plane and then determining its component in the normal direction. This is shown
in figure 7.15. Consider the vector a - p, which is a particular vector from P to
the plane. Its component normal to the plane, and hence its distance from the
plane, is given by

    d = (a - p) \cdot \hat{n},    (7.46)

where the sign of d depends on which side of the plane P is situated.

Figure 7.15 The minimum distance d from a point to a plane.


► Find the distance from the point P with coordinates (1,2,3) to the plane that contains the 
points A, B and C having coordinates (0,1,0), (2,3,1) and (5,7,2). 


Let us denote the position vectors of the points A, B, C by a, b, c. Two vectors in the
plane are

    b - a = 2i + 2j + k    and    c - a = 5i + 6j + 2k,

and hence a vector normal to the plane is

    n = (2i + 2j + k) \times (5i + 6j + 2k) = -2i + j + 2k,

and its unit normal is

    \hat{n} = \frac{n}{|n|} = \frac{1}{3}(-2i + j + 2k).

Denoting the position vector of P by p, the minimum distance from the plane to P is
given by

    d = (a - p) \cdot \hat{n}
      = (-i - j - 3k) \cdot \frac{1}{3}(-2i + j + 2k)
      = \frac{2}{3} - \frac{1}{3} - 2 = -\frac{5}{3}.

If we take P to be the origin O, then we find d = 1/3, i.e. a positive quantity. It follows from
this that the original point P with coordinates (1,2,3), for which d was negative, is on the
opposite side of the plane from the origin. ◄
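
A short Python sketch of the signed distance (7.46) for a plane specified by three points is given below. The function name signed_distance_to_plane is an illustrative assumption. Note that the sign of the result depends on the orientation of the normal, which here is fixed by taking (b - a) × (c - a) as in the example.

```python
import math


def sub(u, v):
    return tuple(x - y for x, y in zip(u, v))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def signed_distance_to_plane(p, a, b, c):
    """Signed distance (a - p) . n_hat of the point p from the plane through
    the points a, b and c, with the normal taken as (b - a) x (c - a)."""
    n = cross(sub(b, a), sub(c, a))
    length = math.sqrt(dot(n, n))
    if length == 0:
        raise ValueError("the three points are collinear and define no plane")
    return dot(sub(a, p), n) / length


A, B, C = (0, 1, 0), (2, 3, 1), (5, 7, 2)
d_P = signed_distance_to_plane((1, 2, 3), A, B, C)
d_O = signed_distance_to_plane((0, 0, 0), A, B, C)
print(d_P, d_O)
```

The opposite signs of d_P and d_O reproduce the conclusion that P and the origin lie on opposite sides of the plane.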


7.8.3 Distance from a line to a line 

Consider two lines in the directions a and b, as shown in figure 7.16. Since a × b
is by definition perpendicular to both a and b, the unit vector normal to both
these lines is

    \hat{n} = \frac{a \times b}{|a \times b|}.

If p and q are the position vectors of any two points P and Q on different lines
then the vector connecting them is p - q. Thus, the minimum distance d between
the lines is this vector’s component along the unit normal, i.e.

    d = |(p - q) \cdot \hat{n}|.

Figure 7.16 The minimum distance from one line to another.


►A line is inclined at equal angles to the x-, y- and z- axes and passes through the origin. 
Another line passes through the points (1,2,4) and (0,0,1). Find the minimum distance 
between the two lines. 


The first line is given by

    r_1 = \lambda(i + j + k),

and the second by

    r_2 = k + \mu(i + 2j + 3k).

Hence a vector normal to both lines is

    n = (i + j + k) \times (i + 2j + 3k) = i - 2j + k,

and the unit normal is

    \hat{n} = \frac{1}{\sqrt{6}}(i - 2j + k).

A vector between the two lines is, for example, the one connecting the points (0,0,0)
and (0,0,1), which is simply k. Thus it follows that the minimum distance between the
two lines is

    d = \frac{1}{\sqrt{6}} |k \cdot (i - 2j + k)| = \frac{1}{\sqrt{6}}. ◄
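
The line-to-line distance of this subsection may be sketched in Python as follows. The function name line_line_distance is an illustrative assumption, and the treatment of parallel lines (raising an error because a × b vanishes and the formula no longer applies) is a design choice of this sketch rather than something taken from the text.

```python
import math


def sub(u, v):
    return tuple(x - y for x, y in zip(u, v))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_line_distance(p, a, q, b):
    """Minimum distance between the lines r = p + lam * a and r = q + mu * b."""
    n = cross(a, b)
    length = math.sqrt(dot(n, n))
    if length == 0:
        raise ValueError("the lines are parallel, so a x b gives no normal")
    return abs(dot(sub(p, q), n)) / length


# First line through the origin along i + j + k; second through (0,0,1) and (1,2,4).
d = line_line_distance((0, 0, 0), (1, 1, 1), (0, 0, 1), (1, 2, 3))
print(d)
```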


7.8.4 Distance from a line to a plane 

Let us consider the line $r = a + \lambda b$. This line will intersect any plane to which it
is not parallel. Thus, if a plane has a normal $\hat{n}$ then the minimum distance from
the line to the plane is zero unless

    b \cdot \hat{n} = 0,

in which case the distance, d, will be

    d = |(a - r) \cdot \hat{n}|,

where r is any point in the plane.


►A line is given by $r = a + \lambda b$, where a = i + 2j + 3k and b = 4i + 5j + 6k. Find the
coordinates of the point P at which the line intersects the plane

    x + 2y + 3z = 6.


A vector normal to the plane is

    n = i + 2j + 3k,

from which we find that $b \cdot n \neq 0$. Thus the line does indeed intersect the plane. To find
the point of intersection we merely substitute the x-, y- and z- values of a general point
on the line into the equation of the plane obtaining

    1 + 4\lambda + 2(2 + 5\lambda) + 3(3 + 6\lambda) = 6    \Rightarrow    14 + 32\lambda = 6.

This gives $\lambda = -\frac{1}{4}$, which we may substitute into the equation for the line to obtain
$x = 1 - \frac{1}{4}(4) = 0$, $y = 2 - \frac{1}{4}(5) = \frac{3}{4}$ and $z = 3 - \frac{1}{4}(6) = \frac{3}{2}$. Thus the point of intersection is
$(0, \frac{3}{4}, \frac{3}{2})$. ◄
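
The substitution carried out in this example amounts to solving $(a + \lambda b) \cdot n = d$ for $\lambda$. A Python sketch using exact rational arithmetic is given below; the function name line_plane_intersection is an illustrative assumption, and integer components are assumed so that Fraction can be used directly.

```python
from fractions import Fraction


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def line_plane_intersection(a, b, n, d):
    """Point where the line r = a + lam * b meets the plane n . r = d.

    Integer components are assumed. Returns None when b . n = 0, i.e. when
    the line is parallel to the plane.
    """
    b_dot_n = dot(b, n)
    if b_dot_n == 0:
        return None
    lam = Fraction(d - dot(a, n), b_dot_n)
    return tuple(ai + lam * bi for ai, bi in zip(a, b))


point = line_plane_intersection((1, 2, 3), (4, 5, 6), (1, 2, 3), 6)
print(point)
```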


7.9 Reciprocal vectors 

The final section of this chapter introduces the concept of reciprocal vectors, 
which have particular uses in crystallography. 

The two sets of vectors a, b, c and a', b', c' are called reciprocal sets if 

    a \cdot a' = b \cdot b' = c \cdot c' = 1    (7.47)

and

    a' \cdot b = a' \cdot c = b' \cdot a = b' \cdot c = c' \cdot a = c' \cdot b = 0.    (7.48)

It can be verified (see exercise 7.19) that the reciprocal vectors of a, b and c are
given by

    a' = \frac{b \times c}{a \cdot (b \times c)},    (7.49)

    b' = \frac{c \times a}{a \cdot (b \times c)},    (7.50)

    c' = \frac{a \times b}{a \cdot (b \times c)},    (7.51)

where $a \cdot (b \times c) \neq 0$. In other words, reciprocal vectors only exist if a, b and c are
not coplanar. Moreover, if a, b and c are mutually orthogonal unit vectors then
a' = a, b' = b and c' = c, so that the two systems of vectors are identical.

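►Construct the reciprocal vectors of a = 2i, b = j + k, c = i + k.

First we evaluate the triple scalar product:

\[ \mathbf{a}\cdot(\mathbf{b}\times\mathbf{c}) = 2\mathbf{i}\cdot[(\mathbf{j} + \mathbf{k})\times(\mathbf{i} + \mathbf{k})] = 2\mathbf{i}\cdot(\mathbf{i} + \mathbf{j} - \mathbf{k}) = 2. \]

Now we find the reciprocal vectors:

\[ \mathbf{a}' = \tfrac{1}{2}(\mathbf{j} + \mathbf{k})\times(\mathbf{i} + \mathbf{k}) = \tfrac{1}{2}(\mathbf{i} + \mathbf{j} - \mathbf{k}), \]

\[ \mathbf{b}' = \tfrac{1}{2}(\mathbf{i} + \mathbf{k})\times 2\mathbf{i} = \mathbf{j}, \]

\[ \mathbf{c}' = \tfrac{1}{2}(2\mathbf{i})\times(\mathbf{j} + \mathbf{k}) = -\mathbf{j} + \mathbf{k}. \]

It is easily verified that these reciprocal vectors satisfy their defining properties (7.47),
(7.48). ◄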
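As a check on (7.49)–(7.51), the construction can be carried out with exact arithmetic. This is only a sketch, assuming integer components; the helper names are illustrative.

```python
from fractions import Fraction

def cross(u, v):
    """Vector product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    """Scalar product of two vectors of equal length."""
    return sum(x * y for x, y in zip(u, v))

def reciprocal_vectors(a, b, c):
    """Return (a', b', c') as defined in (7.49)-(7.51)."""
    vol = dot(a, cross(b, c))              # triple scalar product a.(b x c)
    if vol == 0:
        raise ValueError("a, b and c are coplanar; no reciprocal set exists")
    def scaled(w):
        return tuple(Fraction(x, vol) for x in w)
    return scaled(cross(b, c)), scaled(cross(c, a)), scaled(cross(a, b))

a, b, c = (2, 0, 0), (0, 1, 1), (1, 0, 1)
ar, br, cr = reciprocal_vectors(a, b, c)
print(ar, br, cr)
```

The defining properties (7.47) and (7.48) can then be verified by forming the nine scalar products between the two sets.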

We may also use the concept of reciprocal vectors to define the components of a
vector a with respect to basis vectors e₁, e₂, e₃ that are not mutually orthogonal.
If the basis vectors are of unit length and mutually orthogonal, such as the
Cartesian basis vectors i, j, k, then (see the text preceding (7.21)) a can be
written in the form

\[ \mathbf{a} = (\mathbf{a}\cdot\mathbf{i})\mathbf{i} + (\mathbf{a}\cdot\mathbf{j})\mathbf{j} + (\mathbf{a}\cdot\mathbf{k})\mathbf{k}. \]

If the basis is not orthonormal, however, then this is no longer true. Nevertheless,
we may write the components of a with respect to a non-orthonormal basis
e₁, e₂, e₃ in terms of its reciprocal basis vectors e₁', e₂', e₃', which are defined as in
(7.49)–(7.51). If we let

\[ \mathbf{a} = a_1\mathbf{e}_1 + a_2\mathbf{e}_2 + a_3\mathbf{e}_3, \]

then the scalar product a · e₁' is given by

\[ \mathbf{a}\cdot\mathbf{e}_1' = a_1\,\mathbf{e}_1\cdot\mathbf{e}_1' + a_2\,\mathbf{e}_2\cdot\mathbf{e}_1' + a_3\,\mathbf{e}_3\cdot\mathbf{e}_1' = a_1, \]

where we have used the relations (7.47) and (7.48). Similarly, a₂ = a · e₂' and a₃ = a · e₃'; so
now

\[ \mathbf{a} = (\mathbf{a}\cdot\mathbf{e}_1')\mathbf{e}_1 + (\mathbf{a}\cdot\mathbf{e}_2')\mathbf{e}_2 + (\mathbf{a}\cdot\mathbf{e}_3')\mathbf{e}_3. \qquad (7.52) \]
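Equation (7.52) can be illustrated with a short sketch. It assumes, purely for illustration, the non-orthogonal basis e₁ = 2i, e₂ = j + k, e₃ = i + k and the vector 3i + 2j + k, all with integer components; none of the function names comes from the text.

```python
from fractions import Fraction

def cross(u, v):
    """Vector product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    """Scalar product of two vectors of equal length."""
    return sum(x * y for x, y in zip(u, v))

def reciprocal_basis(e1, e2, e3):
    """Reciprocal basis (e1', e2', e3') built as in (7.49)-(7.51)."""
    vol = dot(e1, cross(e2, e3))
    if vol == 0:
        raise ValueError("basis vectors are coplanar")
    return tuple(tuple(Fraction(x, vol) for x in w)
                 for w in (cross(e2, e3), cross(e3, e1), cross(e1, e2)))

def components(v, basis):
    """Components a_i = v . e_i' of v with respect to a general basis."""
    return tuple(dot(v, r) for r in reciprocal_basis(*basis))

basis = ((2, 0, 0), (0, 1, 1), (1, 0, 1))   # illustrative, not orthonormal
v = (3, 2, 1)
coeffs = components(v, basis)
rebuilt = tuple(sum(c * e[k] for c, e in zip(coeffs, basis)) for k in range(3))
naive = tuple(dot(v, e) for e in basis)     # v . e_i, valid only if orthonormal
print(coeffs, rebuilt, naive)
```

Here the reciprocal-basis components rebuild v exactly, whereas the scalar products v · eᵢ do not give the components because the basis is not orthonormal.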


7.10 Exercises 

7.1 Which of the following statements about general vectors a, b and c are true?

(a) c · (a × b) = (b × a) · c.

(b) a × (b × c) = (a × b) × c.

(c) a × (b × c) = (a · c)b − (a · b)c.

(d) d = λa + μb implies (a × b) · d = 0.

(e) a × c = b × c implies c · a − c · b = c|a − b|.

(f) (a × b) × (c × b) = b[b · (c × a)].

7.2 A unit cell of diamond is a cube of side A with carbon atoms at each corner, at
the centre of each face and, in addition, displaced by (1/4)A(i + j + k) from each of
the previously mentioned ones, where i, j, k are unit vectors along the cube axes.
One corner of the cube is taken as the origin of coordinates. What are the vectors
joining the atom at (1/4)A(i + j + k) to its four nearest neighbours? Determine the
angle between the carbon bonds in diamond.

7.3 Identify the following surfaces: 

(a) |r| = k; (b) r · u = l; (c) r · u = m|r| for −1 ≤ m ≤ +1;

(d) |r − (r · u)u| = n.

Here k, l, m and n are fixed scalars and u is a fixed unit vector.

7.4 Find the angle between the position vectors to the points (3, −4, 0) and (−2, 1, 0)
and find the direction cosines of a vector perpendicular to both.

7.5 A, B, C and D are the four corners, in order, of one face of a cube of side 2
units. The opposite face has corners E, F, G and H, with AE, BF, CG and DH as
parallel edges of the cube. The centre O of the cube is taken as the origin and the
x-, y- and z-axes are parallel to AD, AE and AB respectively. Find the following:

(a) the angle between the face diagonal AF and the body diagonal AG; 

(b) the equation of the plane through B that is parallel to the plane CGE; 

(c) the perpendicular distance from the centre J of the face BCGF to the plane 
OCG ; 

(d) the volume of the tetrahedron JOCG. 

7.6 Use vector methods to prove that the lines joining the mid-points of the opposite 
edges of a tetrahedron OABC meet at a point and that this point bisects each of 
the lines. 

7.7 The edges OP, OQ and OR of a tetrahedron OPQR are vectors p, q and r
respectively, where p = 2i + 4j, q = 2i − j + 3k and r = 4i − 2j + 5k. Show that
OP is perpendicular to the plane containing OQR. Express the volume of the
tetrahedron in terms of p, q and r and hence calculate the volume.

7.8 Prove, by writing it out in component form, that

\[ (\mathbf{a}\times\mathbf{b})\times\mathbf{c} = (\mathbf{a}\cdot\mathbf{c})\mathbf{b} - (\mathbf{b}\cdot\mathbf{c})\mathbf{a}, \]

and deduce the result, stated in (7.25), that the operation of forming the vector
product is non-associative.

7.9 Prove Lagrange's identity, i.e.

\[ (\mathbf{a}\times\mathbf{b})\cdot(\mathbf{c}\times\mathbf{d}) = (\mathbf{a}\cdot\mathbf{c})(\mathbf{b}\cdot\mathbf{d}) - (\mathbf{a}\cdot\mathbf{d})(\mathbf{b}\cdot\mathbf{c}). \]

7.10 For four arbitrary vectors a, b, c and d, evaluate

\[ (\mathbf{a}\times\mathbf{b})\times(\mathbf{c}\times\mathbf{d}) \]

in two different ways and so prove that

\[ \mathbf{a}[\mathbf{b},\mathbf{c},\mathbf{d}] - \mathbf{b}[\mathbf{c},\mathbf{d},\mathbf{a}] + \mathbf{c}[\mathbf{d},\mathbf{a},\mathbf{b}] - \mathbf{d}[\mathbf{a},\mathbf{b},\mathbf{c}] = 0. \]

Show that this reduces to the normal Cartesian representation of the vector d,
i.e. d_x i + d_y j + d_z k, if a, b and c are taken as i, j and k, the Cartesian base vectors.

7.11 Show that the points (1, 0, 1), (1, 1, 0) and (1, −3, 4) lie on a straight line. Give the
equation of the line in the form

\[ \mathbf{r} = \mathbf{a} + \lambda\mathbf{b}. \]

7.12 The plane P₁ contains the points A, B and C, which have position vectors
a = −3i + 2j, b = 7i + 2j and c = 2i + 3j + 2k respectively. Plane P₂ passes through
A and is orthogonal to the line BC, whilst plane P₃ passes through B and is
orthogonal to the line AC. Find the coordinates of r, the point of intersection of
the three planes.

7.13 Two planes have non-parallel unit normals n̂ and m̂ and their closest distances
from the origin are λ and μ respectively. Find the vector equation of their line of
intersection in the form r = νp + a.

7.14 Two fixed points, A and B, in three-dimensional space have position vectors a
and b. Identify the plane P given by

\[ (\mathbf{a} - \mathbf{b})\cdot\mathbf{r} = \tfrac{1}{2}(a^2 - b^2), \]

where a and b are the magnitudes of a and b.

Show also that the equation

\[ (\mathbf{a} - \mathbf{r})\cdot(\mathbf{b} - \mathbf{r}) = 0 \]

describes a sphere S of radius |a − b|/2. Deduce that the intersection of P and
S is also the intersection of two spheres, centred on A and B and each of radius
|a − b|/√2.

7.15 Let O, A, B and C be four points with position vectors 0, a, b and c, and denote
by g = λa + μb + νc the position of the centre of the sphere on which they all lie.

(a) Prove that λ, μ and ν simultaneously satisfy

\[ (\mathbf{a}\cdot\mathbf{a})\lambda + (\mathbf{a}\cdot\mathbf{b})\mu + (\mathbf{a}\cdot\mathbf{c})\nu = \tfrac{1}{2}a^2 \]

and two other similar equations.

(b) By making a change of origin, find the centre and radius of the sphere on
which the points p = 3i + j − 2k, q = 4i + 3j − 3k, r = 7i − 3k and s = 6i + j − k
all lie.

7.16 The vectors a, b and c are coplanar and related by

\[ \lambda\mathbf{a} + \mu\mathbf{b} + \nu\mathbf{c} = 0, \]

where λ, μ, ν are not all zero. Show that the condition for the points with position
vectors αa, βb and γc to be collinear is

\[ \frac{\lambda}{\alpha} + \frac{\mu}{\beta} + \frac{\nu}{\gamma} = 0. \]

7.17 (a) Show that the line of intersection of the planes x + 2y + 3z = 0 and
3x + 2y + z = 0 is equally inclined to the x- and z-axes and makes an angle
cos⁻¹(−2/√6) with the y-axis.

(b) Find the perpendicular distance between one corner of a unit cube and the
major diagonal not passing through it.

7.18 Four points X_i (i = 1, 2, 3, 4), taken for simplicity as all lying within the octant
x, y, z > 0, have position vectors x_i. Convince yourself that vector x_n lies within
the sector of space defined by the other three vectors if

\[ \max_{i}\left\{ \min_{j\neq i}\left[ \frac{\mathbf{x}_i\cdot\mathbf{x}_j}{|\mathbf{x}_i|\,|\mathbf{x}_j|} \right] \right\} \ \text{is attained for}\ i = n, \]

i.e. if n equals that value of i for which the largest of the set of angles which x_i
makes with the other vectors is the lowest. Determine whether any of the four
points with coordinates

\[ X_1 = (3, 2, 2), \quad X_2 = (2, 3, 1), \quad X_3 = (2, 1, 3), \quad X_4 = (3, 0, 3) \]

lies within the tetrahedron defined by the origin and the other three points.

7.19 The vectors a, b and c are not coplanar. The vectors a', b' and c' are the
associated reciprocal vectors. Verify that the expressions (7.49)–(7.51) define a set
of reciprocal vectors a', b' and c' with the following properties:

(a) a' · a = b' · b = c' · c = 1;

(b) a' · b = a' · c = b' · a etc = 0;

(c) [a', b', c'] = 1/[a, b, c];

(d) a = (b' × c')/[a', b', c'].

Figure 7.17 A face-centred cubic crystal.

7.20 Three non-coplanar vectors a, b and c, have as their respective reciprocal vectors
the set a', b' and c'. Show that the normal to the plane containing the points
k⁻¹a, l⁻¹b and m⁻¹c is in the direction of the vector ka' + lb' + mc'.

7.21 In a crystal with a face-centred cubic structure, the basic cell can be taken as a 
cube of edge a with its centre at the origin of coordinates and its edges parallel 
to the Cartesian coordinate axes; atoms are sited at the eight corners and at the 
centre of each face. However, other basic cells are possible. One is the rhomboid 
shown in figure 7.17, which has the three vectors b, c and d as edges. 

(a) Show that the volume of the rhomboid is one-quarter that of the cube. 

(b) Show that the angles between pairs of edges of the rhomboid are 60° and that 
the corresponding angles between pairs of edges of the rhomboid defined by 
the reciprocal vectors to b, c, d are each 109.5°. (This rhomboid can be used 
as the basic cell of a body-centred cubic structure, more easily visualised as 
a cube with an atom at each corner and one at its centre.) 

(c) In order to use the Bragg formula, 2d sin θ = nλ, for the scattering of X-rays
by a crystal, it is necessary to know the perpendicular distance d between 
successive planes of atoms; for a given crystal structure, d has a particular 
value for each set of planes considered. For the face-centred cubic structure 
find the distance between successive planes with normals in the k, i + j and 
i + j + k directions. 

7.22 In subsection 7.6.2 we showed how the moment or torque of a force about an axis 
could be represented by a vector in the direction of the axis. The magnitude of 
the vector gives the size of the moment and the sign of the vector gives the sense. 
Similar representations can be used for angular velocities and angular momenta. 

(a) The magnitude of the angular momentum about the origin of a particle of
mass m moving with velocity v on a path that is a perpendicular distance d
from the origin is given by m|v|d. Show that if r is the position of the particle
then the vector J = r × mv represents the angular momentum.

(b) Now consider a rigid collection of particles (or a solid body) rotating about
an axis through the origin, the angular velocity of the collection being
represented by ω.

(i) Show that the velocity of the ith particle is

\[ \mathbf{v}_i = \boldsymbol{\omega}\times\mathbf{r}_i \]

and that the total angular momentum J is

\[ \mathbf{J} = \sum_i m_i\left[ r_i^2\,\boldsymbol{\omega} - (\mathbf{r}_i\cdot\boldsymbol{\omega})\,\mathbf{r}_i \right]. \]

(ii) Show further that the component of J along the axis of rotation can
be written as Iω, where I, the moment of inertia of the collection
about the axis of rotation, is given by

\[ I = \sum_i m_i\rho_i^2. \]

Interpret ρ_i geometrically.

(iii) Prove that the total kinetic energy of the particles is (1/2)Iω².

7.23 By proceeding as indicated below, prove the parallel axis theorem, which states
that, for a body of mass M, the moment of inertia I about any axis is related to
the corresponding moment of inertia I₀ about a parallel axis that passes through
the centre of mass of the body by

\[ I = I_0 + M a_\perp^2, \]

where a⊥ is the perpendicular distance between the two axes. Note that I₀ can
be written as

\[ \int (\hat{\mathbf{n}}\times\mathbf{r})\cdot(\hat{\mathbf{n}}\times\mathbf{r})\,dm, \]

where r is the vector position, relative to the centre of mass, of the infinitesimal
mass dm and n̂ is a unit vector in the direction of the axis of rotation. Write a
similar expression for I in which r is replaced by r' = r − a, where a is the vector
position of any point on the axis to which I refers. Use Lagrange's identity and
the fact that ∫ r dm = 0 (by the definition of the centre of mass) to establish the
result.

7.24 Without carrying out any further integration, use the results of the previous
exercise, the worked example in subsection 6.3.4 and exercise 6.10 to prove that
the moment of inertia of a uniform rectangular lamina, of mass M and sides a
and b, about an axis perpendicular to its plane and passing through the point
(αa/2, βb/2), with −1 ≤ α, β ≤ 1, is

\[ \frac{M}{12}\left[ a^2(1 + 3\alpha^2) + b^2(1 + 3\beta^2) \right]. \]

7.25 Define a set of (non-orthogonal) base vectors a = j + k, b = i + k and c = i + j.

(a) Establish their reciprocal vectors and hence express the vectors p = 3i − 2j + k,
q = i + 4j and r = −2i + j + k in terms of the base vectors a, b and c.

(b) Verify that the scalar product p · q has the same value, −5, when evaluated
using either set of components.

Figure 7.18 An oscillatory electric circuit. The power supply has angular
frequency ω = 2πf = 400π s⁻¹.

7.26 Systems that can be modelled as damped harmonic oscillators are widespread;
pendulum clocks, car shock absorbers, tuning circuits in television sets and radios,
and collective electron motions in plasmas and metals are just a few examples.

In all these cases, one or more variables describing the system obey(s) an
equation of the form

\[ \ddot{x} + 2\gamma\dot{x} + \omega_0^2 x = P\cos\omega t, \]

where ẋ = dx/dt, etc. and the inclusion of the factor 2 is conventional. In the
steady state (i.e. after the effects of any initial displacement or velocity have been
damped out) the solution of the equation takes the form

\[ x(t) = A\cos(\omega t + \phi). \]

By expressing each term in the form B cos(ωt + ε) and representing it by a vector
of magnitude B making an angle ε with the x-axis, draw a closed vector diagram,
at t = 0, say, that is equivalent to the equation.

(a) Convince yourself that whatever the value of ω (> 0) φ must be negative
(−π < φ < 0) and that

\[ \phi = \tan^{-1}\left( \frac{-2\gamma\omega}{\omega_0^2 - \omega^2} \right). \]

(b) Obtain an expression for A in terms of P, ω₀ and ω.

7.27 According to alternating current theory, the currents and voltages in the
components of the circuit shown in figure 7.18 are determined by Kirchhoff's laws and
the relationships

\[ I_1 = \frac{V_1}{R_1}, \qquad I_2 = \frac{V_2}{R_2}, \qquad I_3 = i\omega C V_3, \qquad V_4 = i\omega L I_2. \]

The factor i = √−1 in the expression for I₃ indicates that the phase of I₃ is 90°
ahead of V₃. Similarly the phase of V₄ is 90° ahead of I₂.

Measurement shows that V₃ has an amplitude of 0.661V₀ and a phase of
+13.4° relative to that of the power supply. Taking V₀ = 1 V and using a series
of vector plots for voltages and currents (they could all be on the same plot if
suitable scales were chosen), determine all unknown currents and voltages and
find values for the inductance of L and the resistance of R₂. (Scales of 1 cm =
0.1 V for voltages and 1 cm = 1 mA for currents are convenient.)

7.11 Hints and answers


7.1 (c), (d) and (e). 

7.2 In units of (1/4)A the vectors are −i − j − k, i + j − k, i − j + k, −i + j + k;
cos⁻¹(−1/3) = 109.5°.

7.3 (a) A sphere of radius k centred on the origin; (b) a plane with its normal in the
direction of u and a distance l from the origin; (c) a cone with its axis parallel to
u and semiangle cos⁻¹ m; (d) a circular cylinder of radius n with its axis parallel
to u.

7.4 cos⁻¹(−2/√5) = 153.4°; 0, 0, 1.

7.5 (a) cos⁻¹ √(2/3); (b) z − x = 2; (c) 1/√2; (d) (1/3)(1/2)(c × g) · j = 1/3.

7.6 With an obvious notation, the mid-points of OA and BC are a/2 and (b + c)/2;
the mid-point of the line joining them is (a + b + c)/4. The same result is obtained
for OB and AC, and for OC and AB.

7.7 Show that q × r is parallel to p; volume = (1/3)[(1/2)(q × r) · p] = 5/3.

7.9 Note that (a × b) · (c × d) = d · [(a × b) × c] and use the result from the previous
question.

7.10 Consider (a × b) × [(c × d)] as λa + μb and [(a × b)] × (c × d) as λ'c + μ'd using
the result of exercise 7.8.

7.11 Show that the position vectors of the points are linearly dependent; r = a + λb
where a = i + k and b = −j + k.

7.12 The conditions are (r − a) · [(b − a) × (c − a)] = 0, (r − a) · (b − c) = 0 and
(r − b) · (c − a) = 0; the point of intersection is r = 2i + 7j + 10k.

7.13 Show that p must have the direction n̂ × m̂ and write a as xn̂ + ym̂. By obtaining a
pair of simultaneous equations for x and y, prove that x = (λ − μ n̂ · m̂)/[1 − (n̂ · m̂)²]
and that y = (μ − λ n̂ · m̂)/[1 − (n̂ · m̂)²].

7.14 P is the plane orthogonal to the line joining A and B and equidistant from them.
S is |r − c|² = (|a − b|/2)², where c = (a + b)/2. Add and subtract the equations
for P and S and arrange the resulting equations in the form |r − d|² = R².

7.15 (a) Note that |a − g|² = R² = |0 − g|², leading to a · a = 2a · g.

(b) Make p the new origin and solve the three simultaneous linear equations to
obtain λ = 5/18, μ = 10/18, ν = −3/18, giving g = 2i − k and a sphere of
radius √5 centred on (5, 1, −3).

7.16 For collinearity, γc = θαa + (1 − θ)βb for some θ.

7.17 (a) Find two points on both planes, say (0, 0, 0) and (1, −2, 1), and hence determine
the direction cosines of the line of intersection; (b) (2/3)^{1/2}.

7.18 The scalar products s_ij (i = 1, 2, 3; j > i) between pairs of unit vectors are 0.907,
0.907, 0.857; 0.714, 0.567; 0.945. Thus i = 1 has the highest minimum (s₁₄ = 0.857)
and so only X₁ could meet the condition. The plane containing X₂, X₃ and X₄
is x + y + z − 6 = 0; since 3 + 2 + 2 − 6 > 0, X₁ lies outside the tetrahedron
OX₂X₃X₄. None of the points meets the condition.

7.19 For (c) and (d), use the result of exercise 7.8 to evaluate (c × a) × (a × b).

7.20 The normal is in the direction (l⁻¹b − k⁻¹a) × (m⁻¹c − k⁻¹a).

7.21 (b) b' = a⁻¹(−i + j + k), c' = a⁻¹(i − j + k), d' = a⁻¹(i + j − k); (c) a/2 for direction
k; successive planes through (0, 0, 0) and (a/2, 0, 0) give a spacing of a/√8 for
direction i + j; successive planes through (−a/2, 0, 0) and (a/2, 0, 0) give a spacing
of a/√3 for direction i + j + k.

7.22 (a) Check both magnitude and rotational sense. (b)(i) Use the result of exercise
7.8 to evaluate r_i × m_i(ω × r_i). (ii) Form (J · ω)/ω; ρ_i is the distance of the
ith particle from the axis of rotation. (iii) Use Lagrange's identity to evaluate
(ω × r_i) · (ω × r_i).

7.23 Note that a² − (n̂ · a)² = a⊥².

7.24 The moment of inertia about an axis through the centre of the rectangle and
perpendicular to its plane is (1/12)M(a² + b²).

7.25 p = −2a + 3b, q = (3/2)a − (3/2)b + (5/2)c and r = 2a − b − c. Remember that a · a = b · b =
c · c = 2 and a · b = a · c = b · c = 1.

Figure 7.19 The vector diagram for the equation in exercise 7.26.

7.26 See figure 7.19 and recall that −cos θ = cos(θ + π) and −sin θ = cos(θ + π/2).

(a) With φ₁ > 0, no matter what value ω takes, the possible resultants (broken
arrows) can never equal P. With φ₂ < 0, closure of the quadrilateral is possible.

(b) A = P[(ω₀² − ω²)² + 4γ²ω²]^{−1/2}.

7.27 With currents in units of mA/|V₀| and voltages in units of V₀:
I₁ = (7.76, −23.2°), I₂ = (14.36, −50.8°), I₃ = (8.30, 103.4°);
V₁ = (0.388, −23.2°), V₂ = (0.287, −50.8°), V₄ = (0.596, 39.2°);
L = 33 mH, R₂ = 20 Ω.

Matrices and vector spaces


In the previous chapter we defined a vector as a geometrical object which has 
both a magnitude and a direction and which may be thought of as an arrow fixed 
in our familiar three-dimensional space, a space which, if we need to, we define 
by reference to, say, the fixed stars. This geometrical definition of a vector is both 
useful and important since it is independent of any coordinate system with which 
we choose to label points in space. 

In most specific applications, however, it is necessary at some stage to choose 
a coordinate system and to break down a vector into its component vectors in 
the directions of increasing coordinate values. Thus for a particular Cartesian 
coordinate system (for example) the component vectors of a vector a will be a_x i,
a_y j and a_z k and the complete vector will be

\[ \mathbf{a} = a_x\mathbf{i} + a_y\mathbf{j} + a_z\mathbf{k}. \qquad (8.1) \]

Although we have so far considered only real three-dimensional space, we may 
extend our notion of a vector to more abstract spaces, which in general can 
have an arbitrary number of dimensions N. We may still think of such a vector 
as an ‘arrow’ in this abstract space, so that it is again independent of any (N- 
dimensional) coordinate system with which we choose to label the space. As an 
example of such a space, which, though abstract, has very practical applications, 
we may consider the description of a mechanical or electrical system. If the state 
of a system is uniquely specified by assigning values to a set of N variables, 
which may be angles or currents, for example, then that state can be represented 
by a vector in an N-dimensional space, the vector having those values as its
components. 

In this chapter we first discuss general vector spaces and their properties. We
then go on to discuss the transformation of one vector into another by a linear
operator. This leads naturally to the concept of a matrix, a two-dimensional array
of numbers. The properties of matrices are then discussed and we conclude with
a discussion of how to use these properties to solve systems of linear equations.
The application of matrices to the study of oscillations in physical systems is
taken up in chapter 9.


8.1 Vector spaces 

A set of objects (vectors) a, b, c, ... is said to form a linear vector space V if: 

(i) the set is closed under commutative and associative addition, so that 

a + b = b + a, (8.2) 

(a + b) +c = a + (b + c); (8.3) 

(ii) the set is closed under multiplication by a scalar (any complex number) to
form a new vector λa, the operation being both distributive and associative
so that

\[ \lambda(\mathbf{a} + \mathbf{b}) = \lambda\mathbf{a} + \lambda\mathbf{b}, \qquad (8.4) \]

\[ (\lambda + \mu)\mathbf{a} = \lambda\mathbf{a} + \mu\mathbf{a}, \qquad (8.5) \]

\[ \lambda(\mu\mathbf{a}) = (\lambda\mu)\mathbf{a}, \qquad (8.6) \]

where λ and μ are arbitrary scalars;

(iii) there exists a null vector 0 such that a + 0 = a for all a;

(iv) multiplication by unity leaves any vector unchanged, i.e. 1 × a = a;

(v) all vectors have a corresponding negative vector −a such that a + (−a) = 0.
It follows from (8.5) with λ = 1 and μ = −1 that −a is the same vector as
(−1) × a.

We note that if we restrict all scalars to be real then we obtain a real vector 
space (an example of which is our familiar three-dimensional space); otherwise, 
in general, we obtain a complex vector space. We note that it is common to use the 
terms ‘vector space’ and ‘space’, instead of the more formal ‘linear vector space’. 

The span of a set of vectors a, b, ..., s is defined as the set of all vectors that
may be written as a linear sum of the original set, i.e. all vectors

\[ \mathbf{x} = \alpha\mathbf{a} + \beta\mathbf{b} + \cdots + \sigma\mathbf{s} \qquad (8.7) \]

that result from the infinite number of possible values of the (in general complex)
scalars α, β, ..., σ. If x in (8.7) is equal to 0 for some choice of α, β, ..., σ (not all
zero), i.e. if

\[ \alpha\mathbf{a} + \beta\mathbf{b} + \cdots + \sigma\mathbf{s} = \mathbf{0}, \qquad (8.8) \]

then the set of vectors a, b, ..., s is said to be linearly dependent. In such a set
at least one vector is redundant, since it can be expressed as a linear sum of
the others. If, however, (8.8) is not satisfied by any set of coefficients (other than
the trivial case in which all the coefficients are zero) then the vectors are linearly
independent, and no vector in the set can be expressed as a linear sum of the
others.

If, in a given vector space, there exist sets of N linearly independent vectors,
but no set of N + 1 linearly independent vectors, then the vector space is said to
be N-dimensional. (In this chapter we will limit our discussion to vector spaces of
finite dimensionality; spaces of infinite dimensionality are discussed in chapter 17.)
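For vectors specified by their components, the test for linear dependence based on (8.8) can be sketched as follows. This is only an illustration under stated assumptions: the components are real and rational, and the number of independent vectors is found by Gaussian elimination, a technique not introduced in this section. The function names are our own.

```python
from fractions import Fraction

def rank(vectors):
    """Number of linearly independent vectors in the list, found by
    Gaussian elimination on their components (exact rational arithmetic)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0                                   # pivot rows found so far
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def linearly_independent(vectors):
    """True if (8.8) has only the trivial solution for these vectors."""
    return rank(vectors) == len(vectors)

print(linearly_independent([(0, 1, 1), (1, 0, 1), (1, 1, 0)]))
print(linearly_independent([(1, 0, 1), (1, 1, 0), (1, -3, 4)]))
```

The second set is dependent because (1, −3, 4) = 4(1, 0, 1) − 3(1, 1, 0), so one of its vectors is redundant in the sense described above.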

8.1.1 Basis vectors 

If V is an N-dimensional vector space then any set of N linearly independent
vectors e₁, e₂, ..., e_N forms a basis for V. If x is an arbitrary vector lying in V then
the set of N + 1 vectors x, e₁, e₂, ..., e_N must be linearly dependent and therefore
such that

\[ \alpha\mathbf{e}_1 + \beta\mathbf{e}_2 + \cdots + \sigma\mathbf{e}_N + \chi\mathbf{x} = \mathbf{0}, \qquad (8.9) \]

where the coefficients α, β, ..., χ are not all equal to 0, and in particular χ ≠ 0.
Rearranging (8.9) we may write x as a linear sum of the vectors e_i as follows:

\[ \mathbf{x} = x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_N\mathbf{e}_N = \sum_{i=1}^{N} x_i\mathbf{e}_i, \qquad (8.10) \]

for some set of coefficients x_i that are simply related to the original coefficients,
e.g. x₁ = −α/χ, x₂ = −β/χ, etc. Since any x lying in the span of V can be
expressed in terms of the basis or base vectors e_i, the latter are said to form
a complete set. The coefficients x_i are the components of x with respect to the
e_i-basis. These components are unique, since if both

\[ \mathbf{x} = \sum_{i=1}^{N} x_i\mathbf{e}_i \quad\text{and}\quad \mathbf{x} = \sum_{i=1}^{N} y_i\mathbf{e}_i, \]

then

\[ \sum_{i=1}^{N} (x_i - y_i)\mathbf{e}_i = \mathbf{0}, \qquad (8.11) \]

which, since the e_i are linearly independent, has only the solution x_i = y_i for all
i = 1, 2, ..., N.

From the above discussion we see that any set of N linearly independent
vectors can form a basis for an N-dimensional space. If we choose a different set
e'_i, i = 1, ..., N then we can write x as

\[ \mathbf{x} = x'_1\mathbf{e}'_1 + x'_2\mathbf{e}'_2 + \cdots + x'_N\mathbf{e}'_N = \sum_{i=1}^{N} x'_i\mathbf{e}'_i. \qquad (8.12) \]

We reiterate that the vector x (a geometrical entity) is independent of the basis;
it is only the components of x that depend on the basis. We note, however,
that given a set of vectors u₁, u₂, ..., u_M, where M ≠ N, in an N-dimensional
vector space, then either there exists a vector that cannot be expressed as a
linear combination of the u_i or, for some vector that can be so expressed, the
components are not unique.


8.1.2 The inner product 

We may usefully add to the description of vectors in a vector space by defining
the inner product of two vectors, denoted in general by ⟨a|b⟩, which is a scalar
function of a and b. The scalar or dot product, a · b = |a||b| cos θ, of vectors
in real three-dimensional space (where θ is the angle between the vectors), was
introduced in the last chapter and is an example of an inner product. In effect the
notion of an inner product ⟨a|b⟩ is a generalisation of the dot product to more
abstract vector spaces. Alternative notations for ⟨a|b⟩ are (a, b), or simply a · b.

The inner product has the following properties:

(i) ⟨a|b⟩ = ⟨b|a⟩*,

(ii) ⟨a|λb + μc⟩ = λ⟨a|b⟩ + μ⟨a|c⟩.

We note that in general, for a complex vector space, (i) and (ii) imply that

\[ \langle \lambda\mathbf{a} + \mu\mathbf{b}|\mathbf{c}\rangle = \lambda^*\langle\mathbf{a}|\mathbf{c}\rangle + \mu^*\langle\mathbf{b}|\mathbf{c}\rangle, \qquad (8.13) \]

\[ \langle \lambda\mathbf{a}|\mu\mathbf{b}\rangle = \lambda^*\mu\,\langle\mathbf{a}|\mathbf{b}\rangle. \qquad (8.14) \]

Following the analogy with the dot product in three-dimensional real space,
two vectors in a general vector space are defined to be orthogonal if ⟨a|b⟩ = 0.
Similarly, the norm of a vector a is given by ‖a‖ = ⟨a|a⟩^{1/2} and is clearly a
generalisation of the length or modulus |a| of a vector a in three-dimensional
space. In a general vector space ⟨a|a⟩ can be positive or negative; however, we
shall be primarily concerned with spaces in which ⟨a|a⟩ ≥ 0 and which are thus
said to have a positive semi-definite norm. In such a space ⟨a|a⟩ = 0 implies a = 0.

Let us now introduce into our N-dimensional vector space a basis e₁, e₂, ..., e_N
that has the desirable property of being orthonormal (the basis vectors are mutually
orthogonal and each has unit norm), i.e. a basis that has the property

\[ \langle\mathbf{e}_i|\mathbf{e}_j\rangle = \delta_{ij}. \qquad (8.15) \]

Here δ_ij is the Kronecker delta symbol (of which we say more in chapter 21) and
has the properties

\[ \delta_{ij} = \begin{cases} 1 & \text{for } i = j, \\ 0 & \text{for } i \neq j. \end{cases} \]

In the above basis we may express any two vectors a and b as

\[ \mathbf{a} = \sum_{i=1}^{N} a_i\mathbf{e}_i \quad\text{and}\quad \mathbf{b} = \sum_{i=1}^{N} b_i\mathbf{e}_i. \]

Furthermore, in such an orthonormal basis we have, for any a,

\[ \langle\mathbf{e}_j|\mathbf{a}\rangle = \sum_{i=1}^{N} \langle\mathbf{e}_j|a_i\mathbf{e}_i\rangle = \sum_{i=1}^{N} a_i\langle\mathbf{e}_j|\mathbf{e}_i\rangle = a_j. \qquad (8.16) \]

Thus the components of a are given by a_i = ⟨e_i|a⟩. Note that this is not true
unless the basis is orthonormal. We can write the inner product of a and b in
terms of their components in an orthonormal basis as

\[ \langle\mathbf{a}|\mathbf{b}\rangle = \langle a_1\mathbf{e}_1 + a_2\mathbf{e}_2 + \cdots + a_N\mathbf{e}_N \,|\, b_1\mathbf{e}_1 + b_2\mathbf{e}_2 + \cdots + b_N\mathbf{e}_N\rangle \]

\[ = \sum_{i=1}^{N} a_i^* b_i\langle\mathbf{e}_i|\mathbf{e}_i\rangle + \sum_{i=1}^{N}\sum_{j\neq i} a_i^* b_j\langle\mathbf{e}_i|\mathbf{e}_j\rangle \]

\[ = \sum_{i=1}^{N} a_i^* b_i, \]

where the second equality follows from (8.14) and the third from (8.15). This is
clearly a generalisation of the expression (7.21) for the dot product of vectors in
three-dimensional space.

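We may generalise the above to the case where the base vectors e₁, e₂, ..., e_N
are not orthonormal (or orthogonal). In general we can define the N² numbers

\[ G_{ij} = \langle\mathbf{e}_i|\mathbf{e}_j\rangle. \qquad (8.17) \]

Then, if a = Σ_i a_i e_i and b = Σ_i b_i e_i (with the sums running from i = 1 to N),
the inner product of a and b is given by

\[ \langle\mathbf{a}|\mathbf{b}\rangle = \left\langle \sum_{i=1}^{N} a_i\mathbf{e}_i \,\Big|\, \sum_{j=1}^{N} b_j\mathbf{e}_j \right\rangle = \sum_{i=1}^{N}\sum_{j=1}^{N} a_i^* b_j\langle\mathbf{e}_i|\mathbf{e}_j\rangle = \sum_{i=1}^{N}\sum_{j=1}^{N} a_i^* G_{ij} b_j. \qquad (8.18) \]

We further note that from (8.17) and the properties of the inner product we
require G_ij = G_ji*. This in turn ensures that ‖a‖² = ⟨a|a⟩ is real, since then

\[ \langle\mathbf{a}|\mathbf{a}\rangle^* = \sum_{i=1}^{N}\sum_{j=1}^{N} a_i G_{ij}^* a_j^* = \sum_{j=1}^{N}\sum_{i=1}^{N} a_j^* G_{ji} a_i = \langle\mathbf{a}|\mathbf{a}\rangle. \]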
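The content of (8.17) and (8.18) can be illustrated numerically. The sketch below assumes the standard inner product on two-component complex vectors and an arbitrarily chosen non-orthogonal basis; the basis, the component values and the function names are all illustrative assumptions.

```python
def inner(u, v):
    """Standard inner product: conjugate-linear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

def gram_matrix(basis):
    """The N^2 numbers G_ij = <e_i|e_j> of (8.17)."""
    return [[inner(ei, ej) for ej in basis] for ei in basis]

def inner_from_components(a, b, G):
    """<a|b> = sum_ij a_i* G_ij b_j, as in (8.18)."""
    n = len(a)
    return sum(a[i].conjugate() * G[i][j] * b[j]
               for i in range(n) for j in range(n))

def combine(coeffs, basis):
    """Build the vector sum_i coeffs_i e_i from its components."""
    dim = len(basis[0])
    return [sum(c * e[k] for c, e in zip(coeffs, basis)) for k in range(dim)]

basis = [(1, 0), (1, 1j)]       # illustrative non-orthogonal basis
a = (2, 1j)                     # components of a in this basis
b = (1 - 1j, 3)                 # components of b in this basis
G = gram_matrix(basis)

lhs = inner_from_components(a, b, G)
rhs = inner(combine(a, basis), combine(b, basis))
print(G, lhs, rhs)
```

Both routes give the same inner product, and the numbers G_ij satisfy G_ij = G_ji*, so that ⟨a|a⟩ computed from the components is real.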
8.1.3 Some useful inequalities

For a set of objects (vectors) forming a linear vector space in which ⟨a|a⟩ ≥ 0 for
all a, the following inequalities are often useful.

(i) Schwarz's inequality is the most basic result and states that

\[ |\langle\mathbf{a}|\mathbf{b}\rangle| \le \|\mathbf{a}\|\,\|\mathbf{b}\|, \qquad (8.19) \]

where the equality holds when a is a scalar multiple of b, i.e. when a = λb.
It is important here to distinguish between the absolute value of a scalar,
|λ|, and the norm of a vector, ‖a‖. Schwarz's inequality may be proved by
considering

\[ \|\mathbf{a} + \lambda\mathbf{b}\|^2 = \langle\mathbf{a} + \lambda\mathbf{b}|\mathbf{a} + \lambda\mathbf{b}\rangle = \langle\mathbf{a}|\mathbf{a}\rangle + \lambda\langle\mathbf{a}|\mathbf{b}\rangle + \lambda^*\langle\mathbf{b}|\mathbf{a}\rangle + \lambda\lambda^*\langle\mathbf{b}|\mathbf{b}\rangle. \]

If we write ⟨a|b⟩ as |⟨a|b⟩|e^{iα} then

\[ \|\mathbf{a} + \lambda\mathbf{b}\|^2 = \|\mathbf{a}\|^2 + |\lambda|^2\|\mathbf{b}\|^2 + \lambda|\langle\mathbf{a}|\mathbf{b}\rangle|e^{i\alpha} + \lambda^*|\langle\mathbf{a}|\mathbf{b}\rangle|e^{-i\alpha}. \]

However, ‖a + λb‖² ≥ 0 for all λ, so we may choose λ = re^{−iα} and require
that, for all r,

\[ 0 \le \|\mathbf{a} + \lambda\mathbf{b}\|^2 = \|\mathbf{a}\|^2 + r^2\|\mathbf{b}\|^2 + 2r|\langle\mathbf{a}|\mathbf{b}\rangle|. \]

This means that the quadratic equation in r formed by setting the RHS
equal to zero must have no real roots. This, in turn, implies that

\[ 4|\langle\mathbf{a}|\mathbf{b}\rangle|^2 \le 4\|\mathbf{a}\|^2\|\mathbf{b}\|^2, \]

which, on taking the square root (all factors are necessarily positive) of
both sides, gives Schwarz's inequality.

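(ii) The triangle inequality states that

\[ \|\mathbf{a} + \mathbf{b}\| \le \|\mathbf{a}\| + \|\mathbf{b}\| \qquad (8.20) \]

and may be derived from the properties of the inner product and Schwarz's
inequality as follows. Let us first consider

\[ \|\mathbf{a} + \mathbf{b}\|^2 = \|\mathbf{a}\|^2 + \|\mathbf{b}\|^2 + 2\,\mathrm{Re}\,\langle\mathbf{a}|\mathbf{b}\rangle \le \|\mathbf{a}\|^2 + \|\mathbf{b}\|^2 + 2|\langle\mathbf{a}|\mathbf{b}\rangle|. \]

Using Schwarz's inequality we then have

\[ \|\mathbf{a} + \mathbf{b}\|^2 \le \|\mathbf{a}\|^2 + \|\mathbf{b}\|^2 + 2\|\mathbf{a}\|\,\|\mathbf{b}\| = (\|\mathbf{a}\| + \|\mathbf{b}\|)^2, \]

which, on taking the square root, gives the triangle inequality (8.20).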

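(iii) Bessel's inequality requires the introduction of an orthonormal basis e_i,
i = 1, 2, ..., N into the N-dimensional vector space; it states that

\[ \|\mathbf{a}\|^2 \ge \sum_i |\langle\mathbf{e}_i|\mathbf{a}\rangle|^2, \qquad (8.21) \]

where the equality holds if the sum includes all N basis vectors. If not
all the basis vectors are included in the sum then the inequality results
(though of course the equality remains if those basis vectors omitted all
have a_i = 0). Bessel's inequality can also be written

\[ \langle\mathbf{a}|\mathbf{a}\rangle \ge \sum_i |a_i|^2, \]

where the a_i are the components of a in the orthonormal basis. From (8.16)
these are given by a_i = ⟨e_i|a⟩. The above may be proved by considering

\[ \left\| \mathbf{a} - \sum_i \langle\mathbf{e}_i|\mathbf{a}\rangle\mathbf{e}_i \right\|^2 = \left\langle \mathbf{a} - \sum_i \langle\mathbf{e}_i|\mathbf{a}\rangle\mathbf{e}_i \,\Big|\, \mathbf{a} - \sum_j \langle\mathbf{e}_j|\mathbf{a}\rangle\mathbf{e}_j \right\rangle. \]

Expanding out the inner product and using ⟨e_i|a⟩* = ⟨a|e_i⟩, we obtain

\[ \left\| \mathbf{a} - \sum_i \langle\mathbf{e}_i|\mathbf{a}\rangle\mathbf{e}_i \right\|^2 = \langle\mathbf{a}|\mathbf{a}\rangle - 2\sum_i \langle\mathbf{a}|\mathbf{e}_i\rangle\langle\mathbf{e}_i|\mathbf{a}\rangle + \sum_i\sum_j \langle\mathbf{a}|\mathbf{e}_i\rangle\langle\mathbf{e}_j|\mathbf{a}\rangle\langle\mathbf{e}_i|\mathbf{e}_j\rangle. \]

Now ⟨e_i|e_j⟩ = δ_ij, since the basis is orthonormal, and so we find

\[ 0 \le \left\| \mathbf{a} - \sum_i \langle\mathbf{e}_i|\mathbf{a}\rangle\mathbf{e}_i \right\|^2 = \|\mathbf{a}\|^2 - \sum_i |\langle\mathbf{e}_i|\mathbf{a}\rangle|^2, \]

which is Bessel's inequality.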

We take this opportunity to mention also 

(iv) the parallelogram equality 

\[ \|\mathbf{a} + \mathbf{b}\|^2 + \|\mathbf{a} - \mathbf{b}\|^2 = 2\left( \|\mathbf{a}\|^2 + \|\mathbf{b}\|^2 \right), \qquad (8.22) \]

which may be proved straightforwardly from the properties of the inner 
product. 


8.2 Linear operators 

We now discuss the action of linear operators on vectors in a vector space. A 
linear operator A associates with every vector x another vector 

y = Ax, 

in such a way that, for two vectors a and b, 

\[ \mathcal{A}(\lambda\mathbf{a} + \mu\mathbf{b}) = \lambda\mathcal{A}\mathbf{a} + \mu\mathcal{A}\mathbf{b}, \]

where $\lambda$, $\mu$ are scalars. We say that $\mathcal{A}$ ‘operates’ on $\mathbf{x}$ to give the vector $\mathbf{y}$. We 
note that the action of $\mathcal{A}$ is independent of any basis or coordinate system and 
may be thought of as ‘transforming’ one geometrical entity (i.e. a vector) into 
another. 

If we now introduce a basis $\mathbf{e}_i$, $i = 1, 2, \ldots, N$, into our vector space then the 
action of $\mathcal{A}$ on each of the basis vectors is to produce a linear combination of 
the latter; this may be written as 

\[ \mathcal{A}\mathbf{e}_j = \sum_{i=1}^{N} A_{ij}\mathbf{e}_i, \tag{8.23} \]

where $A_{ij}$ is the $i$th component of the vector $\mathcal{A}\mathbf{e}_j$ in this basis; collectively the 
numbers $A_{ij}$ are called the components of the linear operator in the $\mathbf{e}_i$-basis. In 
this basis we can express the relation $\mathbf{y} = \mathcal{A}\mathbf{x}$ in component form as 

\[ \mathbf{y} = \sum_{i=1}^{N} y_i\mathbf{e}_i = \mathcal{A}\Bigl(\sum_{j=1}^{N} x_j\mathbf{e}_j\Bigr) = \sum_{j=1}^{N} x_j \sum_{i=1}^{N} A_{ij}\mathbf{e}_i, \]

and hence, in purely component form, in this basis we have 

\[ y_i = \sum_{j=1}^{N} A_{ij}x_j. \tag{8.24} \]

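If we had chosen a different basis $\mathbf{e}'_i$, in which the components of $\mathbf{x}$, $\mathbf{y}$ and $\mathcal{A}$ 
are $x'_i$, $y'_i$ and $A'_{ij}$ respectively then the geometrical relationship $\mathbf{y} = \mathcal{A}\mathbf{x}$ would be 
represented in this new basis by 

\[ y'_i = \sum_{j=1}^{N} A'_{ij}x'_j. \]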

We have so far assumed that the vector y is in the same vector space as 
x. If, however, y belongs to a different vector space, which may in general be 
$M$-dimensional ($M \neq N$) then the above analysis needs a slight modification. By 
introducing a basis set $\mathbf{f}_i$, $i = 1, 2, \ldots, M$, into the vector space to which $\mathbf{y}$ belongs 
we may generalise (8.23) as 

\[ \mathcal{A}\mathbf{e}_j = \sum_{i=1}^{M} A_{ij}\mathbf{f}_i, \]

where the components $A_{ij}$ of the linear operator $\mathcal{A}$ relate to both of the bases $\mathbf{e}_j$ 
and $\mathbf{f}_i$. 


8.2.1 Properties of linear operators 

If x is a vector and A and B are two linear operators then it follows that 

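\[ (\mathcal{A} + \mathcal{B})\mathbf{x} = \mathcal{A}\mathbf{x} + \mathcal{B}\mathbf{x}, \]

\[ (\lambda\mathcal{A})\mathbf{x} = \lambda(\mathcal{A}\mathbf{x}), \]

\[ (\mathcal{A}\mathcal{B})\mathbf{x} = \mathcal{A}(\mathcal{B}\mathbf{x}), \]

where in the last equality we see that the action of two linear operators in 
succession is associative. The product of two linear operators is not in general 
commutative, however, so that in general $\mathcal{A}\mathcal{B}\mathbf{x} \neq \mathcal{B}\mathcal{A}\mathbf{x}$. In an obvious way we 
define the null (or zero) and identity operators by 

\[ \mathcal{O}\mathbf{x} = \mathbf{0} \quad\text{and}\quad \mathcal{I}\mathbf{x} = \mathbf{x}, \]

for any vector $\mathbf{x}$ in our vector space. Two operators $\mathcal{A}$ and $\mathcal{B}$ are equal if 
$\mathcal{A}\mathbf{x} = \mathcal{B}\mathbf{x}$ for all vectors $\mathbf{x}$. Finally, if there exists an operator $\mathcal{A}^{-1}$ such that 

\[ \mathcal{A}\mathcal{A}^{-1} = \mathcal{A}^{-1}\mathcal{A} = \mathcal{I} \]

then $\mathcal{A}^{-1}$ is the inverse of $\mathcal{A}$. Some linear operators do not possess an inverse 
and are called singular, whilst those operators that do have an inverse are termed 
non-singular. 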


8.3 Matrices 


We have seen that in a particular basis $\mathbf{e}_i$ both vectors and linear operators 
can be described in terms of their components with respect to the basis. These 
components may be displayed as an array of numbers called a matrix. In general, 
if a linear operator $\mathcal{A}$ transforms vectors from an $N$-dimensional vector space, 
for which we choose a basis $\mathbf{e}_j$, $j = 1, 2, \ldots, N$, into vectors belonging to an 
$M$-dimensional vector space, with basis $\mathbf{f}_i$, $i = 1, 2, \ldots, M$, then we may represent 
the operator $\mathcal{A}$ by the matrix 

\[ \mathsf{A} = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1N} \\ A_{21} & A_{22} & \cdots & A_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ A_{M1} & A_{M2} & \cdots & A_{MN} \end{pmatrix}. \tag{8.25} \]



The matrix elements $A_{ij}$ are the components of the linear operator with respect 
to the bases $\mathbf{e}_j$ and $\mathbf{f}_i$; the component $A_{ij}$ of the linear operator appears in the 
$i$th row and $j$th column of the matrix. The array has $M$ rows and $N$ columns 
and is thus called an $M \times N$ matrix. If the dimensions of the two vector spaces 
are the same, i.e. $M = N$ (for example, if they are the same vector space) then we 
may represent $\mathcal{A}$ by an $N \times N$ or square matrix of order $N$. The component $A_{ij}$, 
which in general may be complex, is also denoted by $(\mathsf{A})_{ij}$. 

In a similar way we may denote a vector $\mathbf{x}$ in terms of its components $x_i$ in a 
basis $\mathbf{e}_i$, $i = 1, 2, \ldots, N$, by the array 

\[ \mathsf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{pmatrix}, \]

which is a special case of (8.25) and is called a column matrix (or conventionally, 
and slightly confusingly, a column vector or even just a vector - strictly speaking 
the term ‘vector’ refers to the geometrical entity $\mathbf{x}$). The column matrix $\mathsf{x}$ can also 
be written as 

\[ \mathsf{x} = (x_1 \quad x_2 \quad \cdots \quad x_N)^{\mathrm{T}}, \]

which is the transpose of a row matrix (see section 8.6). 

We note that in a different basis $\mathbf{e}'_i$ the vector $\mathbf{x}$ would be represented by a 
different column matrix containing the components $x'_i$ in the new basis, i.e. 

\[ \mathsf{x}' = \begin{pmatrix} x'_1 \\ x'_2 \\ \vdots \\ x'_N \end{pmatrix}. \]

Thus, we use $\mathsf{x}$ and $\mathsf{x}'$ to denote different column matrices which, in different bases 
$\mathbf{e}_i$ and $\mathbf{e}'_i$, represent the same vector $\mathbf{x}$. In many texts, however, this distinction is 
not made and the vector $\mathbf{x}$ (rather than the column matrix $\mathsf{x}$) is equated to the 
corresponding column matrix; if we regard $\mathbf{x}$ as the geometrical entity, however, this 
can be misleading and so we explicitly make the distinction. A similar argument 
follows for linear operators; the same linear operator $\mathcal{A}$ is described in different 
bases by different matrices $\mathsf{A}$ and $\mathsf{A}'$, containing different matrix elements. 

8.4 Basic matrix algebra 

The basic algebra of matrices may be deduced from the properties of the linear 
operators that they represent. In a given basis the action of two linear operators 
$\mathcal{A}$ and $\mathcal{B}$ on an arbitrary vector $\mathbf{x}$ (see the beginning of subsection 8.2.1), when 
written in terms of components using (8.24), is given by 

\[ \sum_j (\mathsf{A} + \mathsf{B})_{ij}x_j = \sum_j A_{ij}x_j + \sum_j B_{ij}x_j, \]

\[ \sum_j (\lambda\mathsf{A})_{ij}x_j = \lambda\sum_j A_{ij}x_j, \]

\[ \sum_j (\mathsf{A}\mathsf{B})_{ij}x_j = \sum_k A_{ik}(\mathsf{B}\mathsf{x})_k = \sum_j \sum_k A_{ik}B_{kj}x_j. \]

Now, since $\mathbf{x}$ is arbitrary, we can immediately deduce the way in which matrices 
are added or multiplied, i.e. 

\[ (\mathsf{A} + \mathsf{B})_{ij} = A_{ij} + B_{ij}, \tag{8.26} \]

\[ (\lambda\mathsf{A})_{ij} = \lambda A_{ij}, \tag{8.27} \]

\[ (\mathsf{A}\mathsf{B})_{ij} = \sum_k A_{ik}B_{kj}. \tag{8.28} \]

We note that a matrix element may, in general, be complex. We now discuss 
matrix addition and multiplication in more detail. 


8.4.1 Matrix addition and multiplication by a scalar 

From (8.26) we see that the sum of two matrices, S = A + B, is the matrix whose 
elements are given by 

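\[ S_{ij} = A_{ij} + B_{ij} \]

for every pair of subscripts $i, j$, with $i = 1, 2, \ldots, M$ and $j = 1, 2, \ldots, N$. For 
example, if $\mathsf{A}$ and $\mathsf{B}$ are $2 \times 3$ matrices then $\mathsf{S} = \mathsf{A} + \mathsf{B}$ is given by 

\[ \begin{pmatrix} S_{11} & S_{12} & S_{13} \\ S_{21} & S_{22} & S_{23} \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \end{pmatrix} + \begin{pmatrix} B_{11} & B_{12} & B_{13} \\ B_{21} & B_{22} & B_{23} \end{pmatrix} = \begin{pmatrix} A_{11} + B_{11} & A_{12} + B_{12} & A_{13} + B_{13} \\ A_{21} + B_{21} & A_{22} + B_{22} & A_{23} + B_{23} \end{pmatrix}. \tag{8.29} \]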

Clearly, for the sum of two matrices to have any meaning, the matrices must have 
the same dimensions, i.e. both be M x N matrices. 

From definition (8.29) it follows that A + B = B + A and that the sum of a 
number of matrices can be written unambiguously without bracketing, i.e. matrix 
addition is commutative and associative. 

The difference of two matrices is defined by direct analogy with addition. The 
matrix $\mathsf{D} = \mathsf{A} - \mathsf{B}$ has elements 

\[ D_{ij} = A_{ij} - B_{ij}, \quad \text{for } i = 1, 2, \ldots, M,\ j = 1, 2, \ldots, N. \tag{8.30} \]


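From (8.27) the product of a matrix $\mathsf{A}$ with a scalar $\lambda$ is the matrix with 
elements $\lambda A_{ij}$, for example 

\[ \lambda\begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \end{pmatrix} = \begin{pmatrix} \lambda A_{11} & \lambda A_{12} & \lambda A_{13} \\ \lambda A_{21} & \lambda A_{22} & \lambda A_{23} \end{pmatrix}. \tag{8.31} \]

Multiplication by a scalar is distributive and associative. 

► The matrices A, B and C are given by 

\[ \mathsf{A} = \begin{pmatrix} 2 & -1 \\ 3 & 1 \end{pmatrix}, \quad \mathsf{B} = \begin{pmatrix} 1 & 0 \\ 0 & -2 \end{pmatrix}, \quad \mathsf{C} = \begin{pmatrix} -2 & 1 \\ -1 & 1 \end{pmatrix}. \]

Find the matrix D = A + 2B − C. 

\[ \mathsf{D} = \begin{pmatrix} 2 & -1 \\ 3 & 1 \end{pmatrix} + 2\begin{pmatrix} 1 & 0 \\ 0 & -2 \end{pmatrix} - \begin{pmatrix} -2 & 1 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} 2 + 2 \times 1 - (-2) & -1 + 2 \times 0 - 1 \\ 3 + 2 \times 0 - (-1) & 1 + 2 \times (-2) - 1 \end{pmatrix} = \begin{pmatrix} 6 & -2 \\ 4 & -4 \end{pmatrix}. \] ◄ 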

From the above considerations we see that the set of all, in general complex, 
M x N matrices (with fixed M and N ) forms a linear vector space of dimension 
$MN$. One basis for the space is the set of $M \times N$ matrices $\mathsf{E}^{(p,q)}$ with the property 
that $E^{(p,q)}_{ij} = 1$ if $i = p$ and $j = q$ whilst $E^{(p,q)}_{ij} = 0$ for all other values of $i$ and 
$j$, i.e. each matrix has only one non-zero entry, which equals unity. Here the pair 
$(p, q)$ is simply a label that picks out a particular one of the matrices $\mathsf{E}^{(p,q)}$, the 
total number of which is $MN$. 


8.4.2 Multiplication of matrices 

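Let us consider again the ‘transformation’ of one vector into another, $\mathbf{y} = \mathcal{A}\mathbf{x}$, 
which, from (8.24), may be described in terms of components with respect to a 
particular basis as 

\[ y_i = \sum_{j=1}^{N} A_{ij}x_j \quad \text{for } i = 1, 2, \ldots, M. \tag{8.32} \]

Writing this in matrix form as $\mathsf{y} = \mathsf{A}\mathsf{x}$ we have 

\[ \begin{pmatrix} y_1 \\ \boxed{y_2} \\ \vdots \\ y_M \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1N} \\ \boxed{A_{21}} & \boxed{A_{22}} & \cdots & \boxed{A_{2N}} \\ \vdots & \vdots & \ddots & \vdots \\ A_{M1} & A_{M2} & \cdots & A_{MN} \end{pmatrix} \begin{pmatrix} \boxed{x_1} \\ \boxed{x_2} \\ \vdots \\ \boxed{x_N} \end{pmatrix}, \]

where we have highlighted with boxes the components used to calculate the 
element $y_2$: using (8.32) for $i = 2$, 

\[ y_2 = A_{21}x_1 + A_{22}x_2 + \cdots + A_{2N}x_N. \]

All the other components $y_i$ are calculated similarly. 

If instead we operate with $\mathsf{A}$ on a basis vector $\mathbf{e}_j$ having all components zero 
except for the $j$th, which equals unity, then we find 

\[ \mathsf{A}\mathsf{e}_j = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1N} \\ A_{21} & A_{22} & \cdots & A_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ A_{M1} & A_{M2} & \cdots & A_{MN} \end{pmatrix} \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} A_{1j} \\ A_{2j} \\ \vdots \\ A_{Mj} \end{pmatrix}, \]

and so confirm our identification of the matrix element $A_{ij}$ as the $i$th component 
of $\mathcal{A}\mathbf{e}_j$ in this basis. 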
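
The component rule (8.32) can be sketched in a few lines. The example below assumes matrices are stored as lists of rows; the function name `matvec` and the numerical entries are illustrative only. It also shows that acting on a basis column with a single unit entry picks out the corresponding column of the matrix.

```python
def matvec(A, x):
    """Return y with y_i = sum_j A_ij x_j, as in (8.32)."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("number of columns of A must equal length of x")
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

# An illustrative 2 x 3 matrix (M = 2, N = 3) and a 3-component column.
A = [[1, 2, 3],
     [4, 5, 6]]
x = [1, 0, -1]
y = matvec(A, x)        # y_1 = 1 - 3 = -2, y_2 = 4 - 6 = -2

# Basis column with unity in the second position and zeros elsewhere.
e2 = [0, 1, 0]
col2 = matvec(A, e2)    # reproduces the second column of A

print(y, col2)
```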

From (8.28) we can extend our discussion to the product of two matrices 
P = AB, where P is the matrix of the quantities formed by the operation of 
the rows of A on the columns of B, treating each column of B in turn as the 
vector x represented in component form in (8.32). It is clear that, for this to be 
a meaningful definition, the number of columns in A must equal the number of 
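rows in $\mathsf{B}$. Thus the product $\mathsf{A}\mathsf{B}$ of an $M \times N$ matrix $\mathsf{A}$ with an $N \times R$ matrix $\mathsf{B}$ 
is itself an $M \times R$ matrix $\mathsf{P}$, where 

\[ P_{ij} = \sum_{k=1}^{N} A_{ik}B_{kj} \quad \text{for } i = 1, 2, \ldots, M,\ j = 1, 2, \ldots, R. \]

For example, $\mathsf{P} = \mathsf{A}\mathsf{B}$ may be written in matrix form 

\[ \begin{pmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \end{pmatrix} \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \\ B_{31} & B_{32} \end{pmatrix}, \]

where 

\[ P_{11} = A_{11}B_{11} + A_{12}B_{21} + A_{13}B_{31}, \]
\[ P_{21} = A_{21}B_{11} + A_{22}B_{21} + A_{23}B_{31}, \]
\[ P_{12} = A_{11}B_{12} + A_{12}B_{22} + A_{13}B_{32}, \]
\[ P_{22} = A_{21}B_{12} + A_{22}B_{22} + A_{23}B_{32}. \]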

Multiplication of more than two matrices follows naturally and is associative. 
So, for example, 

A(BC) = (AB)C, (8.34) 

provided, of course, that all the products are defined. 

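As mentioned above, if $\mathsf{A}$ is an $M \times N$ matrix and $\mathsf{B}$ is an $N \times M$ matrix then 
two product matrices are possible, i.e. 

\[ \mathsf{P} = \mathsf{A}\mathsf{B} \quad\text{and}\quad \mathsf{Q} = \mathsf{B}\mathsf{A}. \]

These are clearly not the same, since $\mathsf{P}$ is an $M \times M$ matrix whilst $\mathsf{Q}$ is an 
$N \times N$ matrix. Thus, particular care must be taken to write matrix products in 
the intended order; $\mathsf{P} = \mathsf{A}\mathsf{B}$ but $\mathsf{Q} = \mathsf{B}\mathsf{A}$. We note in passing that $\mathsf{A}^2$ means $\mathsf{A}\mathsf{A}$, 
$\mathsf{A}^3$ means $\mathsf{A}(\mathsf{A}\mathsf{A}) = (\mathsf{A}\mathsf{A})\mathsf{A}$ etc. Even if both $\mathsf{A}$ and $\mathsf{B}$ are square, in general 

\[ \mathsf{A}\mathsf{B} \neq \mathsf{B}\mathsf{A}, \tag{8.35} \]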

i.e. the multiplication of matrices is not, in general, commutative. 


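► Evaluate P = AB and Q = BA where 

\[ \mathsf{A} = \begin{pmatrix} 3 & 2 & -1 \\ 0 & 3 & 2 \\ 1 & -3 & 4 \end{pmatrix}, \quad \mathsf{B} = \begin{pmatrix} 2 & -2 & 3 \\ 1 & 1 & 0 \\ 3 & 2 & 1 \end{pmatrix}. \]

As we saw for the $2 \times 2$ case above, the element $P_{ij}$ of the matrix $\mathsf{P} = \mathsf{A}\mathsf{B}$ is found by 
mentally taking the ‘scalar product’ of the $i$th row of $\mathsf{A}$ with the $j$th column of $\mathsf{B}$. For 
example, $P_{11} = 3 \times 2 + 2 \times 1 + (-1) \times 3 = 5$, $P_{12} = 3 \times (-2) + 2 \times 1 + (-1) \times 2 = -6$, etc. 
Thus 

\[ \mathsf{P} = \mathsf{A}\mathsf{B} = \begin{pmatrix} 3 & 2 & -1 \\ 0 & 3 & 2 \\ 1 & -3 & 4 \end{pmatrix} \begin{pmatrix} 2 & -2 & 3 \\ 1 & 1 & 0 \\ 3 & 2 & 1 \end{pmatrix} = \begin{pmatrix} 5 & -6 & 8 \\ 9 & 7 & 2 \\ 11 & 3 & 7 \end{pmatrix}, \]

and, similarly, 

\[ \mathsf{Q} = \mathsf{B}\mathsf{A} = \begin{pmatrix} 2 & -2 & 3 \\ 1 & 1 & 0 \\ 3 & 2 & 1 \end{pmatrix} \begin{pmatrix} 3 & 2 & -1 \\ 0 & 3 & 2 \\ 1 & -3 & 4 \end{pmatrix} = \begin{pmatrix} 9 & -11 & 6 \\ 3 & 5 & 1 \\ 10 & 9 & 5 \end{pmatrix}. \]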

These results illustrate that, in general, two matrices do not commute. ◄ 
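
The product rule (8.28) is easily coded as a triple loop over rows, columns and the summed index. The following sketch, with an illustrative function name `matmul`, recomputes both products for the 3 × 3 matrices of the example above (rows (3, 2, −1), (0, 3, 2), (1, −3, 4) for A and (2, −2, 3), (1, 1, 0), (3, 2, 1) for B) and confirms that they differ.

```python
def matmul(A, B):
    """Matrix product P_ij = sum_k A_ik B_kj for an M x N and an N x R matrix."""
    if any(len(row) != len(B) for row in A):
        raise ValueError("columns of A must equal rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[3, 2, -1], [0, 3, 2], [1, -3, 4]]
B = [[2, -2, 3], [1, 1, 0], [3, 2, 1]]

P = matmul(A, B)   # [[5, -6, 8], [9, 7, 2], [11, 3, 7]]
Q = matmul(B, A)   # [[9, -11, 6], [3, 5, 1], [10, 9, 5]]
print(P != Q)      # True: A and B do not commute
```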

The property that matrix multiplication is distributive over addition, i.e. that 

(A + B)C = AC + BC (8.36) 

and 

C(A + B) = CA + CB, (8.37) 

follows directly from its definition. 


8.4.3 The null and identity matrices 

Both the null matrix and the identity matrix are frequently encountered, and we 
take this opportunity to introduce them briefly, leaving their uses until later. The 
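null or zero matrix $\mathsf{0}$ has all elements equal to zero, and so its properties are 

\[ \mathsf{A}\mathsf{0} = \mathsf{0} = \mathsf{0}\mathsf{A}, \]

\[ \mathsf{A} + \mathsf{0} = \mathsf{0} + \mathsf{A} = \mathsf{A}. \]

The identity matrix $\mathsf{I}$ has the property 

\[ \mathsf{A}\mathsf{I} = \mathsf{I}\mathsf{A} = \mathsf{A}. \]

It is clear that, in order for the above products to be defined, the identity matrix 
must be square. The $N \times N$ identity matrix (often denoted by $\mathsf{I}_N$) has the form 

\[ \mathsf{I}_N = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & 1 \end{pmatrix}. \]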


8.5 Functions of matrices 

If a matrix $\mathsf{A}$ is square then, as mentioned above, one can define powers of $\mathsf{A}$ in a 
straightforward way. For example $\mathsf{A}^2 = \mathsf{A}\mathsf{A}$, $\mathsf{A}^3 = \mathsf{A}\mathsf{A}\mathsf{A}$, or in the general case 

\[ \mathsf{A}^n = \mathsf{A}\mathsf{A} \cdots \mathsf{A} \quad (n \text{ times}), \]

where $n$ is a positive integer. Having defined powers of a square matrix $\mathsf{A}$, we 
may construct functions of $\mathsf{A}$ of the form 

\[ \mathsf{S} = \sum_n a_n \mathsf{A}^n, \]

where the $a_n$ are simple scalars and the number of terms in the summation may 
be finite or infinite. In the case where the sum has an infinite number of terms, 
the sum has meaning only if it converges. A common example of such a function 
is the exponential of a matrix, which is defined by 

\[ \exp\mathsf{A} = \sum_{n=0}^{\infty} \frac{\mathsf{A}^n}{n!}. \tag{8.38} \]

This definition can, in turn, be used to define other functions such as sin A and 
cos A. 
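
As an illustration of (8.38), the series can be summed numerically by truncating it after a fixed number of terms. The sketch below is only that: a truncated power series, adequate for small matrices with modest entries but not a robust general algorithm. The function names, the cut-off of 30 terms and the test matrix are all assumptions made for the example. The test matrix is $\theta$ times the matrix with rows (0, −1) and (1, 0), whose square is minus the identity, so that the series sums to the rotation matrix with rows $(\cos\theta, -\sin\theta)$ and $(\sin\theta, \cos\theta)$.

```python
import math

def matmul(A, B):
    """Matrix product P_ij = sum_k A_ik B_kj."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def identity(n):
    """The n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def expm_series(A, terms=30):
    """Approximate exp A by the first `terms` terms of sum_n A^n / n!."""
    n = len(A)
    result = identity(n)   # the n = 0 term
    term = identity(n)     # holds A^k / k! for the current k
    for k in range(1, terms):
        # Build A^k / k! from A^(k-1) / (k-1)! by one product and one division.
        term = [[v / k for v in row] for row in matmul(term, A)]
        result = [[r + t for r, t in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    return result

theta = 0.5
E = expm_series([[0.0, -theta], [theta, 0.0]])
print(E)  # close to [[cos 0.5, -sin 0.5], [sin 0.5, cos 0.5]]
```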


8.6 The transpose of a matrix 

We have seen that the components of a linear operator in a given coordinate system 
can be written in the form of a matrix $\mathsf{A}$. We will also find it useful, however, 
to consider the different (but clearly related) matrix formed by interchanging the 
rows and columns of $\mathsf{A}$. The matrix is called the transpose of $\mathsf{A}$ and is denoted 
by $\mathsf{A}^{\mathrm{T}}$. 

► Find the transpose of the matrix 

\[ \mathsf{A} = \begin{pmatrix} 3 & 1 & 2 \\ 0 & 4 & 1 \end{pmatrix}. \]

By interchanging the rows and columns of $\mathsf{A}$ we immediately obtain 

\[ \mathsf{A}^{\mathrm{T}} = \begin{pmatrix} 3 & 0 \\ 1 & 4 \\ 2 & 1 \end{pmatrix}. \] ◄ 


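It is obvious that if $\mathsf{A}$ is an $M \times N$ matrix then its transpose $\mathsf{A}^{\mathrm{T}}$ is an $N \times M$ 
matrix. As mentioned in section 8.3, the transpose of a column matrix is a 
row matrix and vice versa. An important use of column and row matrices is 
in the representation of the inner product of two real vectors in terms of their 
components in a given basis. This notion is discussed fully in the next section, 
where it is extended to complex vectors. 

The transpose of the product of two matrices, $(\mathsf{A}\mathsf{B})^{\mathrm{T}}$, is given by the product 
of their transposes taken in the reverse order, i.e. 

\[ (\mathsf{A}\mathsf{B})^{\mathrm{T}} = \mathsf{B}^{\mathrm{T}}\mathsf{A}^{\mathrm{T}}. \tag{8.39} \]

This is proved as follows: 

\[ (\mathsf{A}\mathsf{B})^{\mathrm{T}}_{ij} = (\mathsf{A}\mathsf{B})_{ji} = \sum_k A_{jk}B_{ki} = \sum_k (\mathsf{A}^{\mathrm{T}})_{kj}(\mathsf{B}^{\mathrm{T}})_{ik} = \sum_k (\mathsf{B}^{\mathrm{T}})_{ik}(\mathsf{A}^{\mathrm{T}})_{kj} = (\mathsf{B}^{\mathrm{T}}\mathsf{A}^{\mathrm{T}})_{ij}, \]

and the proof can be extended to the product of several matrices to give 

\[ (\mathsf{A}\mathsf{B}\mathsf{C} \cdots \mathsf{G})^{\mathrm{T}} = \mathsf{G}^{\mathrm{T}} \cdots \mathsf{C}^{\mathrm{T}}\mathsf{B}^{\mathrm{T}}\mathsf{A}^{\mathrm{T}}. \]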
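
The reversal of order in (8.39) can be checked on a small case. In the sketch below the 2 × 3 matrix A has rows (3, 1, 2) and (0, 4, 1), while the 3 × 2 matrix B is an arbitrary illustrative choice; `transpose` and `matmul` are hypothetical helper names.

```python
def transpose(A):
    """Interchange rows and columns: (A^T)_ij = A_ji."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Matrix product P_ij = sum_k A_ik B_kj."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[3, 1, 2], [0, 4, 1]]        # 2 x 3
B = [[1, 0], [2, -1], [0, 3]]     # 3 x 2, illustrative values

lhs = transpose(matmul(A, B))               # (AB)^T
rhs = matmul(transpose(B), transpose(A))    # B^T A^T
print(lhs == rhs)  # True
```

Note that the product taken in the unreversed order, $\mathsf{A}^{\mathrm{T}}\mathsf{B}^{\mathrm{T}}$, is here a 3 × 3 matrix and so cannot even be compared with the 2 × 2 matrix $(\mathsf{A}\mathsf{B})^{\mathrm{T}}$.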


8.7 The complex and Hermitian conjugates of a matrix 

Two further matrices that can be derived from a given general $M \times N$ matrix 
are the complex conjugate, denoted by $\mathsf{A}^*$, and the Hermitian conjugate, denoted 
by $\mathsf{A}^\dagger$. 

The complex conjugate of a matrix $\mathsf{A}$ is the matrix obtained by taking the 
complex conjugate of each of the elements of $\mathsf{A}$, i.e. 

\[ (\mathsf{A}^*)_{ij} = (A_{ij})^*. \]

Obviously if a matrix is real (i.e. it contains only real elements) then $\mathsf{A}^* = \mathsf{A}$. 




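► Find the complex conjugate of the matrix 

\[ \mathsf{A} = \begin{pmatrix} 1 & 2 & 3i \\ 1 + i & 1 & 0 \end{pmatrix}. \]

By taking the complex conjugate of each element we obtain immediately 

\[ \mathsf{A}^* = \begin{pmatrix} 1 & 2 & -3i \\ 1 - i & 1 & 0 \end{pmatrix}. \] ◄ 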


The Hermitian conjugate, or adjoint, of a matrix A is the transpose of its 
complex conjugate, or equivalently, the complex conjugate of its transpose, i.e. 


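\[ \mathsf{A}^\dagger = (\mathsf{A}^*)^{\mathrm{T}} = (\mathsf{A}^{\mathrm{T}})^*. \]

We note that if $\mathsf{A}$ is real (and so $\mathsf{A}^* = \mathsf{A}$) then $\mathsf{A}^\dagger = \mathsf{A}^{\mathrm{T}}$, and taking the Hermitian 
conjugate is equivalent to taking the transpose. Following the previous line of 
argument for the transpose of the product of several matrices, the Hermitian 
conjugate of such a product can be shown to be given by 

\[ (\mathsf{A}\mathsf{B} \cdots \mathsf{G})^\dagger = \mathsf{G}^\dagger \cdots \mathsf{B}^\dagger\mathsf{A}^\dagger. \tag{8.40} \]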


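► Find the Hermitian conjugate of the matrix 

\[ \mathsf{A} = \begin{pmatrix} 1 & 2 & 3i \\ 1 + i & 1 & 0 \end{pmatrix}. \]

Taking the complex conjugate of $\mathsf{A}$ and then forming the transpose we find 

\[ \mathsf{A}^\dagger = \begin{pmatrix} 1 & 1 - i \\ 2 & 1 \\ -3i & 0 \end{pmatrix}. \]

We obtain the same result, of course, if we first take the transpose of $\mathsf{A}$ and then take the 
complex conjugate. ◄ 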
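
Both orders of operation can be checked with Python's built-in complex numbers, in which the imaginary unit is written `1j`. The sketch below uses the 2 × 3 matrix with rows (1, 2, 3i) and (1 + i, 1, 0); the helper names `conj`, `transpose` and `dagger` are illustrative.

```python
def conj(A):
    """Complex conjugate: (A*)_ij = (A_ij)*."""
    return [[z.conjugate() for z in row] for row in A]

def transpose(A):
    """Transpose: (A^T)_ij = A_ji."""
    return [list(col) for col in zip(*A)]

def dagger(A):
    """Hermitian conjugate: transpose of the complex conjugate."""
    return transpose(conj(A))

A = [[1, 2, 3j],
     [1 + 1j, 1, 0]]

A_dagger = dagger(A)
print(A_dagger)                          # rows (1, 1-i), (2, 1), (-3i, 0)
print(A_dagger == conj(transpose(A)))    # True: the two orders agree
```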


An important use of the Hermitian conjugate (or transpose in the real case) 
is in connection with the inner product of two vectors. Suppose that in a given 
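orthonormal basis the vectors $\mathbf{a}$ and $\mathbf{b}$ may be represented by the column matrices 

\[ \mathsf{a} = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_N \end{pmatrix} \quad\text{and}\quad \mathsf{b} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_N \end{pmatrix}. \]

Taking the Hermitian conjugate of $\mathsf{a}$, to give a row matrix, and multiplying (on 
the right) by $\mathsf{b}$ we obtain 

\[ \mathsf{a}^\dagger\mathsf{b} = (a_1^* \quad a_2^* \quad \cdots \quad a_N^*) \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_N \end{pmatrix} = \sum_{i=1}^{N} a_i^* b_i, \tag{8.42} \]

which is the expression for the inner product $\langle\mathbf{a}|\mathbf{b}\rangle$ in that basis. We note that for 
real vectors (8.42) reduces to $\mathsf{a}^{\mathrm{T}}\mathsf{b} = \sum_{i=1}^{N} a_i b_i$. 

If the basis $\mathbf{e}_i$ is not orthonormal, so that, in general, 

\[ \langle\mathbf{e}_i|\mathbf{e}_j\rangle = G_{ij} \neq \delta_{ij}, \]

then, from (8.18), the scalar product of $\mathbf{a}$ and $\mathbf{b}$ in terms of their components with 
respect to this basis is given by 

\[ \langle\mathbf{a}|\mathbf{b}\rangle = \sum_{i=1}^{N} \sum_{j=1}^{N} a_i^* G_{ij} b_j = \mathsf{a}^\dagger\mathsf{G}\mathsf{b}, \]

where $\mathsf{G}$ is the $N \times N$ matrix with elements $G_{ij}$. 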
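
A small numerical case makes the role of $\mathsf{G}$ concrete. The sketch below assumes a two-dimensional real space with the non-orthogonal basis $\mathbf{e}_1 = (1, 0)$, $\mathbf{e}_2 = (1, 1)$ expressed in Cartesian coordinates; these basis vectors, the component values and the function name `inner` are all illustrative choices. The inner product computed from the components and $\mathsf{G}$ is compared with the ordinary Cartesian dot product of the same two vectors.

```python
def inner(a, b, G=None):
    """<a|b> = sum_ij a_i^* G_ij b_j; G defaults to the identity,
    i.e. to an orthonormal basis, giving sum_i a_i^* b_i."""
    n = len(a)
    if G is None:
        G = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    return sum(a[i].conjugate() * G[i][j] * b[j]
               for i in range(n) for j in range(n))

# Non-orthogonal basis (Cartesian coordinates) and its matrix G_ij = e_i . e_j.
e = [(1, 0), (1, 1)]
G = [[sum(p * q for p, q in zip(e[i], e[j])) for j in range(2)]
     for i in range(2)]              # [[1, 1], [1, 2]]

a = [1, 2]     # components in the e-basis: a = e_1 + 2 e_2 = (3, 2)
b = [-1, 1]    # components in the e-basis: b = -e_1 + e_2 = (0, 1)

with_G = inner(a, b, G)              # a^T G b
cartesian = 3 * 0 + 2 * 1            # direct dot product of (3, 2) and (0, 1)
print(with_G, cartesian)             # 2 2
```

Omitting $\mathsf{G}$ here would give $a_1 b_1 + a_2 b_2 = 1$, which is not the inner product of the two vectors.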


8.8 The trace of a matrix 

For a given matrix $\mathsf{A}$, in the previous two sections we have considered various 
other matrices that can be derived from it. However, sometimes one wishes to 
derive a single number from a matrix. The simplest example is the trace (or spur) 
of a square matrix, which is denoted by $\mathrm{Tr}\,\mathsf{A}$. This quantity is defined as the sum 
of the diagonal elements of the matrix, 

\[ \mathrm{Tr}\,\mathsf{A} = A_{11} + A_{22} + \cdots + A_{NN} = \sum_{i=1}^{N} A_{ii}. \tag{8.43} \]

It is clear that the trace is a linear operation so that, for example, 

\[ \mathrm{Tr}(\mathsf{A} + \mathsf{B}) = \mathrm{Tr}\,\mathsf{A} + \mathrm{Tr}\,\mathsf{B}. \]

A very useful property of traces is that the trace of the product of two matrices 
is independent of the order of their multiplication; this result holds whether or 
not the matrices commute and is proved as follows: 

\[ \mathrm{Tr}\,\mathsf{A}\mathsf{B} = \sum_{i=1}^{N} (\mathsf{A}\mathsf{B})_{ii} = \sum_{i=1}^{N} \sum_{j=1}^{N} A_{ij}B_{ji} = \sum_{i=1}^{N} \sum_{j=1}^{N} B_{ji}A_{ij} = \sum_{j=1}^{N} (\mathsf{B}\mathsf{A})_{jj} = \mathrm{Tr}\,\mathsf{B}\mathsf{A}. \tag{8.44} \]

The result can be extended to the product of several matrices. For example, from 
(8.44), we immediately find 

\[ \mathrm{Tr}\,\mathsf{A}\mathsf{B}\mathsf{C} = \mathrm{Tr}\,\mathsf{B}\mathsf{C}\mathsf{A} = \mathrm{Tr}\,\mathsf{C}\mathsf{A}\mathsf{B}, \]

which shows that the trace of a product is invariant under cyclic permutations of 
the matrices in the product. Other easily derived properties of the trace are, for 
example, $\mathrm{Tr}\,\mathsf{A}^{\mathrm{T}} = \mathrm{Tr}\,\mathsf{A}$ and $\mathrm{Tr}\,\mathsf{A}^\dagger = (\mathrm{Tr}\,\mathsf{A})^*$. 
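
The cyclic property is easily confirmed numerically. The sketch below reuses the two non-commuting 3 × 3 matrices of section 8.4.2 (rows (3, 2, −1), (0, 3, 2), (1, −3, 4) and (2, −2, 3), (1, 1, 0), (3, 2, 1)) together with a third matrix C chosen arbitrarily for illustration; the helper names are hypothetical.

```python
def matmul(A, B):
    """Matrix product P_ij = sum_k A_ik B_kj."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def trace(A):
    """Sum of the diagonal elements of a square matrix."""
    if any(len(row) != len(A) for row in A):
        raise ValueError("the trace is defined only for square matrices")
    return sum(A[i][i] for i in range(len(A)))

A = [[3, 2, -1], [0, 3, 2], [1, -3, 4]]
B = [[2, -2, 3], [1, 1, 0], [3, 2, 1]]
C = [[1, 0, 2], [0, -1, 1], [4, 0, 0]]   # illustrative third matrix

# AB and BA are different matrices, yet their traces agree.
print(trace(matmul(A, B)), trace(matmul(B, A)))   # 19 19

# Cyclic permutations of a triple product leave the trace unchanged.
t_abc = trace(matmul(matmul(A, B), C))
t_bca = trace(matmul(matmul(B, C), A))
t_cab = trace(matmul(matmul(C, A), B))
print(t_abc == t_bca == t_cab)                    # True
```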


8.9 The determinant of a matrix 


For a given matrix $\mathsf{A}$, the determinant $\det\mathsf{A}$ (like the trace) is a single number (or 
algebraic expression) that depends upon the elements of $\mathsf{A}$. Also like the trace, 
the determinant is defined only for square matrices. If, for example, $\mathsf{A}$ is a $3 \times 3$ 
matrix then its determinant, of order 3, is denoted by 

\[ \det\mathsf{A} = |\mathsf{A}| = \begin{vmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{vmatrix}. \tag{8.45} \]

In order to calculate the value of a determinant, we first need to introduce 
the notions of the minor and the cofactor of an element of a matrix. (We 
shall see that we can use the cofactors to write an order-3 determinant as the 
weighted sum of three order-2 determinants, thereby simplifying its evaluation.) 
The minor $M_{ij}$ of the element $A_{ij}$ of an $N \times N$ matrix $\mathsf{A}$ is the determinant of 
the $(N - 1) \times (N - 1)$ matrix obtained by removing all the elements of the $i$th 
row and $j$th column of $\mathsf{A}$; the associated cofactor, $C_{ij}$, is found by multiplying 
the minor by $(-1)^{i+j}$. 


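► Find the cofactor of the element $A_{23}$ of the matrix 

\[ \mathsf{A} = \begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{pmatrix}. \]

Removing all the elements of the second row and third column of $\mathsf{A}$ and forming the 
determinant of the remaining terms gives the minor 

\[ M_{23} = \begin{vmatrix} A_{11} & A_{12} \\ A_{31} & A_{32} \end{vmatrix}. \]

Multiplying the minor by $(-1)^{2+3} = (-1)^5 = -1$ gives 

\[ C_{23} = -\begin{vmatrix} A_{11} & A_{12} \\ A_{31} & A_{32} \end{vmatrix}. \] ◄ 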


We now define a determinant as the sum of the products of the elements of any 
row or column and their corresponding cofactors, e.g. $A_{21}C_{21} + A_{22}C_{22} + A_{23}C_{23}$ or 
$A_{13}C_{13} + A_{23}C_{23} + A_{33}C_{33}$. Such a sum is called a Laplace expansion. For example, 
in the first of these expansions, using the elements of the second row of the 
determinant defined by (8.45) and their corresponding cofactors, we write $|\mathsf{A}|$ as 
the Laplace expansion 

\[ |\mathsf{A}| = A_{21}(-1)^{(2+1)}M_{21} + A_{22}(-1)^{(2+2)}M_{22} + A_{23}(-1)^{(2+3)}M_{23} \]

\[ \phantom{|\mathsf{A}|} = -A_{21}\begin{vmatrix} A_{12} & A_{13} \\ A_{32} & A_{33} \end{vmatrix} + A_{22}\begin{vmatrix} A_{11} & A_{13} \\ A_{31} & A_{33} \end{vmatrix} - A_{23}\begin{vmatrix} A_{11} & A_{12} \\ A_{31} & A_{32} \end{vmatrix}. \]

We will see later that the value of the determinant is independent of the row 
or column chosen. Of course, we have not yet determined the value of |A| but, 
rather, written it as the weighted sum of three determinants of order 2. However, 
applying again the definition of a determinant, we can evaluate each of the 
order-2 determinants. 


► Evaluate the determinant 

\[ \begin{vmatrix} A_{12} & A_{13} \\ A_{32} & A_{33} \end{vmatrix}. \]

By considering the products of the elements of the first row in the determinant, and their 
corresponding cofactors, we find 

\[ \begin{vmatrix} A_{12} & A_{13} \\ A_{32} & A_{33} \end{vmatrix} = A_{12}(-1)^{(1+1)}|A_{33}| + A_{13}(-1)^{(1+2)}|A_{32}| = A_{12}A_{33} - A_{13}A_{32}, \]

where the values of the order-1 determinants $|A_{33}|$ and $|A_{32}|$ are defined to be $A_{33}$ and $A_{32}$ 
respectively. It must be remembered that the determinant is not the same as the modulus, 
e.g. $\det(-2) = |-2| = -2$, not 2. ◄ 

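We can now combine all the above results to show that the value of the 
determinant (8.45) is given by 

\[ |\mathsf{A}| = -A_{21}(A_{12}A_{33} - A_{13}A_{32}) + A_{22}(A_{11}A_{33} - A_{13}A_{31}) - A_{23}(A_{11}A_{32} - A_{12}A_{31}) \tag{8.46} \]

\[ \phantom{|\mathsf{A}|} = A_{11}(A_{22}A_{33} - A_{23}A_{32}) + A_{12}(A_{23}A_{31} - A_{21}A_{33}) + A_{13}(A_{21}A_{32} - A_{22}A_{31}), \tag{8.47} \]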

where the final expression gives the form in which the determinant is usually 
remembered and is the form that is obtained immediately by considering the 
Laplace expansion using the first row of the determinant. The last equality, which 
essentially rearranges a Laplace expansion using the second row into one using 
the first row, supports our assertion that the value of the determinant is unaffected 
by which row or column is chosen for the expansion. 

► Suppose the rows of a real $3 \times 3$ matrix $\mathsf{A}$ are interpreted as the components in a given 
basis of three (three-component) vectors $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$. Show that one can write the determinant 
of $\mathsf{A}$ as 

\[ |\mathsf{A}| = \mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}). \]

If one writes the rows of $\mathsf{A}$ as the components in a given basis of three vectors $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$, 
we have from (8.47) that 

\[ |\mathsf{A}| = \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix} = a_1(b_2c_3 - b_3c_2) + a_2(b_3c_1 - b_1c_3) + a_3(b_1c_2 - b_2c_1). \]

From expression (7.34) for the scalar triple product given in subsection 7.6.3, it follows 
that we may write the determinant as 

\[ |\mathsf{A}| = \mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}). \tag{8.48} \]

In other words, |A| is the volume of the parallelepiped defined by the vectors a, b and 
c. (One could equally well interpret the columns of the matrix A as the components of 
three vectors, and result (8.48) would still hold.) This result provides a more memorable 
(and more meaningful) expression than (8.47) for the value of a 3 x 3 determinant. Indeed, 
using this geometrical interpretation, we see immediately that, if the vectors $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ are 
not linearly independent then the value of the determinant vanishes: $|\mathsf{A}| = 0$. ◄ 


The evaluation of determinants of order greater than 3 follows the same general 
method as that presented above, in that it relies on successively reducing the order 
of the determinant by writing it as a Laplace expansion. Thus, a determinant 
of order 4 is first written as a sum of four determinants of order 3, which 
are then evaluated using the above method. For higher-order determinants, one 
cannot write down directly a simple geometrical expression for |A| analogous to 
that given in (8.48). Nevertheless, it is still true that if the rows or columns of 
the $N \times N$ matrix $\mathsf{A}$ are interpreted as the components in a given basis of $N$ 
($N$-component) vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_N$, then the determinant $|\mathsf{A}|$ vanishes if these 
vectors are not all linearly independent. 
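
The recursive character of the Laplace expansion is naturally expressed as a recursive function. The following sketch expands along the first row at every level; the names `minor`, `det` and `triple_product` and the 3 × 3 test matrix are illustrative assumptions. Because the number of terms grows factorially with the order, this approach is practical only for small determinants.

```python
def minor(A, i, j):
    """Matrix left after deleting row i and column j (both 0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by Laplace expansion along the first row."""
    if len(A) == 1:
        return A[0][0]      # order-1 determinant: the element itself
    # With 0-based column index j the cofactor sign (-1)^(i+j) for the
    # first row reduces to (-1)^j.
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j))
               for j in range(len(A)))

def triple_product(a, b, c):
    """Scalar triple product a . (b x c) of three-component vectors."""
    cross = [b[1] * c[2] - b[2] * c[1],
             b[2] * c[0] - b[0] * c[2],
             b[0] * c[1] - b[1] * c[0]]
    return sum(x * y for x, y in zip(a, cross))

A = [[2, 4, 3], [1, -2, -2], [-3, 3, 2]]   # illustrative 3 x 3 matrix

print(det(A))                     # 11
print(triple_product(*A))         # 11, in agreement with (8.48)
print(det([[-2]]))                # -2: a determinant, not a modulus
print(det([[1, 2, 3], [2, 4, 6], [0, 1, 5]]))   # 0: rows not independent
```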


8.9.1 Properties of determinants 

A number of properties of determinants follow straightforwardly from the definition 
of $\det\mathsf{A}$; their use will often reduce the labour of evaluating a determinant. 
We present them here without specific proofs, though they all follow readily from 
the alternative form for a determinant, given in equation (21.28) on page 791, 
and expressed in terms of the Levi-Civita symbol (see exercise 21.9). 

(i) Determinant of the transpose. The transpose matrix $\mathsf{A}^{\mathrm{T}}$ (which, we recall, 
is obtained by interchanging the rows and columns of $\mathsf{A}$) has the same 
determinant as $\mathsf{A}$ itself, i.e. 

\[ |\mathsf{A}^{\mathrm{T}}| = |\mathsf{A}|. \tag{8.49} \]

It follows that any theorem established for the rows of $\mathsf{A}$ will apply to the 
columns as well, and vice versa. 

(ii) Determinant of the complex and Hermitian conjugate. It is clear that the 
matrix $\mathsf{A}^*$ obtained by taking the complex conjugate of each element of $\mathsf{A}$ 
has the determinant $|\mathsf{A}^*| = |\mathsf{A}|^*$. Combining this result with (8.49), we find 
that 

\[ |\mathsf{A}^\dagger| = |(\mathsf{A}^*)^{\mathrm{T}}| = |\mathsf{A}^*| = |\mathsf{A}|^*. \tag{8.50} \]

(iii) Interchanging two rows or two columns. If two rows (columns) of A are 
interchanged, its determinant changes sign but is unaltered in magnitude. 

(iv) Removing factors. If all the elements of a single row (column) of $\mathsf{A}$ have 
a common factor, $\lambda$, then this factor may be removed; the value of the 
determinant is given by the product of the remaining determinant and $\lambda$. 
Clearly this implies that if all the elements of any row (column) are zero 
then $|\mathsf{A}| = 0$. It also follows that if every element of the $N \times N$ matrix $\mathsf{A}$ 
is multiplied by a constant factor $\lambda$ then 

\[ |\lambda\mathsf{A}| = \lambda^N |\mathsf{A}|. \tag{8.51} \]

(v) Identical rows or columns. If any two rows (columns) of A are identical or 
are multiples of one another, then it can be shown that |A| =0. 

(vi) Adding a constant multiple of one row ( column ) to another. The determinant 
of a matrix is unchanged in value by adding to the elements of one row 
(column) any fixed multiple of the elements of another row (column). 

(vii) Determinant of a product. If A and B are square matrices of the same order 
then 


|AB| = |A||B| = |BA|. (8.52) 

A simple extension of this property gives, for example, 

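\[ |\mathsf{A}\mathsf{B} \cdots \mathsf{G}| = |\mathsf{A}||\mathsf{B}| \cdots |\mathsf{G}| = |\mathsf{A}||\mathsf{G}| \cdots |\mathsf{B}| = |\mathsf{A} \cdots \mathsf{G}\mathsf{B}|, \]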

which shows that the determinant is invariant to cyclic permutations of 
the matrices in the product. 

There is no explicit procedure for using the above results in the evaluation of 
any given determinant, and judging the quickest route to an answer is a matter 
of experience. A general guide is to try to reduce all terms but one in a row or 
column to zero and hence in effect to obtain a determinant of smaller size. The 
steps taken in evaluating the determinant in the example below are certainly not 
the fastest, but they have been chosen in order to illustrate the use of most of the 
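properties listed above. 

► Evaluate the determinant 

\[ |\mathsf{A}| = \begin{vmatrix} 1 & 0 & 2 & 3 \\ 0 & 1 & -2 & 1 \\ 3 & -3 & 4 & -2 \\ -2 & 1 & -2 & -1 \end{vmatrix}. \]

Taking a factor 2 out of the third column and then adding the second column to the third 
gives 

\[ |\mathsf{A}| = 2\begin{vmatrix} 1 & 0 & 1 & 3 \\ 0 & 1 & -1 & 1 \\ 3 & -3 & 2 & -2 \\ -2 & 1 & -1 & -1 \end{vmatrix} = 2\begin{vmatrix} 1 & 0 & 1 & 3 \\ 0 & 1 & 0 & 1 \\ 3 & -3 & -1 & -2 \\ -2 & 1 & 0 & -1 \end{vmatrix}. \]

Subtracting the second column from the fourth gives 

\[ |\mathsf{A}| = 2\begin{vmatrix} 1 & 0 & 1 & 3 \\ 0 & 1 & 0 & 0 \\ 3 & -3 & -1 & 1 \\ -2 & 1 & 0 & -2 \end{vmatrix}. \]

We now note that the second row has only one non-zero element and so the determinant 
may conveniently be written as a Laplace expansion, i.e. 

\[ |\mathsf{A}| = 2 \times 1 \times (-1)^{2+2}\begin{vmatrix} 1 & 1 & 3 \\ 3 & -1 & 1 \\ -2 & 0 & -2 \end{vmatrix} = 2\begin{vmatrix} 4 & 0 & 4 \\ 3 & -1 & 1 \\ -2 & 0 & -2 \end{vmatrix}, \]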

where the last equality follows by adding the second row to the first. It can now be seen 
that the first row is minus twice the third, and so the value of the determinant is zero, by 
property (v) above. ◄ 
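
The general guide given above, reducing all but one entry of a row or column to zero, can be made systematic. The sketch below is one possible implementation, not the procedure of the worked example: it works on rows rather than columns (permissible by property (i)), uses only row interchanges (property (iii), each changing the sign) and the addition of multiples of one row to another (property (vi), leaving the value unchanged) to reach triangular form, and then multiplies the diagonal elements. Exact rational arithmetic from the standard `fractions` module avoids rounding error; the function name is an illustrative choice.

```python
from fractions import Fraction

def det_by_elimination(A):
    """Determinant via reduction to upper-triangular form."""
    M = [[Fraction(v) for v in row] for row in A]
    n = len(M)
    sign = 1
    for c in range(n):
        # Find a row at or below the diagonal with a non-zero entry in column c.
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)      # a zero will sit on the diagonal
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            sign = -sign            # property (iii): interchange flips the sign
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            # Property (vi): subtracting a multiple of another row
            # leaves the determinant unchanged.
            M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    result = Fraction(sign)
    for i in range(n):
        result *= M[i][i]           # triangular determinant: product of diagonal
    return result

A = [[1, 0, 2, 3],
     [0, 1, -2, 1],
     [3, -3, 4, -2],
     [-2, 1, -2, -1]]
print(det_by_elimination(A))   # 0, as found in the worked example
```

For large matrices this route requires of order $N^3$ operations, in contrast to the factorially growing cost of a full Laplace expansion.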


8.10 The inverse of a matrix 

Our first use of determinants will be in defining the inverse of a matrix. If we 
were dealing with ordinary numbers we would consider the relation $P = AB$ as 
equivalent to $B = P/A$, provided that $A \neq 0$. However, if $\mathsf{A}$, $\mathsf{B}$ and $\mathsf{P}$ are matrices 
then this notation does not have an obvious meaning. What we really want to 
know is whether an explicit formula for $\mathsf{B}$ can be obtained in terms of $\mathsf{A}$ and 
$\mathsf{P}$. It will be shown that this is possible for those cases in which $|\mathsf{A}| \neq 0$. A 
square matrix whose determinant is zero is called a singular matrix; otherwise it 
is non-singular. We will show that if $\mathsf{A}$ is non-singular we can define a matrix, 
denoted by $\mathsf{A}^{-1}$ and called the inverse of $\mathsf{A}$, which has the property that if $\mathsf{A}\mathsf{B} = \mathsf{P}$ 
then $\mathsf{B} = \mathsf{A}^{-1}\mathsf{P}$. In words, $\mathsf{B}$ can be obtained by multiplying $\mathsf{P}$ from the left by 
$\mathsf{A}^{-1}$. Analogously, if $\mathsf{B}$ is non-singular then, by multiplication from the right, 
$\mathsf{A} = \mathsf{P}\mathsf{B}^{-1}$. 

It is clear that 

\[ \mathsf{A}\mathsf{I} = \mathsf{A} \quad\Rightarrow\quad \mathsf{I} = \mathsf{A}^{-1}\mathsf{A}, \tag{8.53} \]

where $\mathsf{I}$ is the unit matrix, and so $\mathsf{A}^{-1}\mathsf{A} = \mathsf{I} = \mathsf{A}\mathsf{A}^{-1}$. These statements are 
equivalent to saying that if we first multiply a matrix, $\mathsf{B}$ say, by $\mathsf{A}$ and then 
multiply by the inverse $\mathsf{A}^{-1}$, we end up with the matrix we started with, i.e. 

\[ \mathsf{A}^{-1}\mathsf{A}\mathsf{B} = \mathsf{B}. \tag{8.54} \]

This justifies our use of the term inverse. It is also clear that the inverse is only 
defined for square matrices. 

So far we have only defined what we mean by the inverse of a matrix. Actually 
finding the inverse of a matrix A may be carried out in a number of ways. We will 
show that one method is to construct first the matrix C containing the cofactors 
of the elements of A, as discussed in the last subsection. Then the required inverse 
$\mathsf{A}^{-1}$ can be found by forming the transpose of $\mathsf{C}$ and dividing by the determinant 
of $\mathsf{A}$. Thus the elements of the inverse $\mathsf{A}^{-1}$ are given by 

\[ (\mathsf{A}^{-1})_{ik} = \frac{(\mathsf{C})^{\mathrm{T}}_{ik}}{|\mathsf{A}|} = \frac{C_{ki}}{|\mathsf{A}|}. \tag{8.55} \]

That this procedure does indeed result in the inverse may be seen by considering 
the components of $\mathsf{A}^{-1}\mathsf{A}$, i.e. 

\[ (\mathsf{A}^{-1}\mathsf{A})_{ij} = \sum_k (\mathsf{A}^{-1})_{ik}(\mathsf{A})_{kj} = \sum_k \frac{C_{ki}}{|\mathsf{A}|}A_{kj} = \frac{|\mathsf{A}|}{|\mathsf{A}|}\delta_{ij}. \tag{8.56} \]

The last equality in (8.56) relies on the property 

\[ \sum_k C_{ki}A_{kj} = |\mathsf{A}|\delta_{ij}; \tag{8.57} \]

this can be proved by considering the matrix $\mathsf{A}'$ obtained from the original matrix 
$\mathsf{A}$ when the $i$th column of $\mathsf{A}$ is replaced by one of the other columns, say the $j$th. 
Thus $\mathsf{A}'$ is a matrix with two identical columns and so has zero determinant. 
However, replacing the $i$th column by another does not change the cofactors $C_{ki}$ 
of the elements in the $i$th column, which are therefore the same in $\mathsf{A}$ and $\mathsf{A}'$. 
Recalling the Laplace expansion of a determinant, i.e. 

\[ |\mathsf{A}| = \sum_k A_{ki}C_{ki}, \]

we obtain 

\[ 0 = |\mathsf{A}'| = \sum_k A'_{ki}C'_{ki} = \sum_k A_{kj}C_{ki}, \quad i \neq j, \]

which together with the Laplace expansion itself may be summarised by (8.57). 

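It is immediately obvious from (8.55) that the inverse of a matrix is not defined 
if the matrix is singular (i.e. if $|\mathsf{A}| = 0$). 

► Find the inverse of the matrix 

\[ \mathsf{A} = \begin{pmatrix} 2 & 4 & 3 \\ 1 & -2 & -2 \\ -3 & 3 & 2 \end{pmatrix}. \]

We first determine $|\mathsf{A}|$: 

\[ |\mathsf{A}| = 2[-2(2) - (-2)3] + 4[(-2)(-3) - (1)(2)] + 3[(1)(3) - (-2)(-3)] = 11. \tag{8.58} \]

This is non-zero and so an inverse matrix can be constructed. To do this we need the 
matrix of the cofactors, $\mathsf{C}$, and hence $\mathsf{C}^{\mathrm{T}}$. We find 

\[ \mathsf{C} = \begin{pmatrix} 2 & 4 & -3 \\ 1 & 13 & -18 \\ -2 & 7 & -8 \end{pmatrix} \quad\text{and}\quad \mathsf{C}^{\mathrm{T}} = \begin{pmatrix} 2 & 1 & -2 \\ 4 & 13 & 7 \\ -3 & -18 & -8 \end{pmatrix}, \]

and hence 

\[ \mathsf{A}^{-1} = \frac{\mathsf{C}^{\mathrm{T}}}{|\mathsf{A}|} = \frac{1}{11}\begin{pmatrix} 2 & 1 & -2 \\ 4 & 13 & 7 \\ -3 & -18 & -8 \end{pmatrix}. \tag{8.59} \] ◄ 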
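
The cofactor construction (8.55) can be sketched directly in code. The example below assumes integer matrix elements so that the standard `fractions` module gives exact results; the function names are illustrative, and the determinant is evaluated by a simple Laplace expansion, so the sketch is suitable only for small matrices. It is applied to the 3 × 3 matrix with rows (2, 4, 3), (1, −2, −2) and (−3, 3, 2), whose determinant is 11.

```python
from fractions import Fraction

def minor(A, i, j):
    """Matrix left after deleting row i and column j (both 0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by Laplace expansion along the first row.
    The empty (order-0) determinant is taken as 1 so that the cofactor
    of the single element of a 1 x 1 matrix is 1."""
    if len(A) == 0:
        return 1
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j))
               for j in range(len(A)))

def inverse(A):
    """Inverse with elements (A^-1)_ik = C_ki / |A|, as in (8.55)."""
    d = det(A)
    if d == 0:
        raise ValueError("matrix is singular; no inverse exists")
    n = len(A)
    # Matrix of cofactors C_ij = (-1)^(i+j) M_ij.
    C = [[(-1) ** (i + j) * det(minor(A, i, j)) for j in range(n)]
         for i in range(n)]
    # Transpose the cofactor matrix and divide by the determinant.
    return [[Fraction(C[k][i], d) for k in range(n)] for i in range(n)]

def matmul(A, B):
    """Matrix product P_ij = sum_k A_ik B_kj."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[2, 4, 3], [1, -2, -2], [-3, 3, 2]]
A_inv = inverse(A)
print(A_inv[0])            # first row: 2/11, 1/11, -2/11
print(matmul(A, A_inv))    # the 3 x 3 identity matrix
```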


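For a $2 \times 2$ matrix, the inverse has a particularly simple form. If the matrix is 

\[ \mathsf{A} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \]

then its determinant $|\mathsf{A}|$ is given by $|\mathsf{A}| = A_{11}A_{22} - A_{12}A_{21}$, and the matrix of 
cofactors is 

\[ \mathsf{C} = \begin{pmatrix} A_{22} & -A_{21} \\ -A_{12} & A_{11} \end{pmatrix}. \]

Thus the inverse of $\mathsf{A}$ is given by 

\[ \mathsf{A}^{-1} = \frac{\mathsf{C}^{\mathrm{T}}}{|\mathsf{A}|} = \frac{1}{A_{11}A_{22} - A_{12}A_{21}}\begin{pmatrix} A_{22} & -A_{12} \\ -A_{21} & A_{11} \end{pmatrix}. \tag{8.60} \]

It can be seen that the transposed matrix of cofactors for a $2 \times 2$ matrix is the 
same as the matrix formed by swapping the elements on the leading diagonal 
($A_{11}$ and $A_{22}$) and changing the signs of the other two elements ($A_{12}$ and $A_{21}$). 
This is completely general for a $2 \times 2$ matrix and is easy to remember. 

The following are some further useful properties related to the inverse matrix 
and may be straightforwardly derived. 

(i) $(\mathsf{A}^{-1})^{-1} = \mathsf{A}$. 

(ii) $(\mathsf{A}^{\mathrm{T}})^{-1} = (\mathsf{A}^{-1})^{\mathrm{T}}$. 

(iii) $(\mathsf{A}^\dagger)^{-1} = (\mathsf{A}^{-1})^\dagger$. 

(iv) $(\mathsf{A}\mathsf{B})^{-1} = \mathsf{B}^{-1}\mathsf{A}^{-1}$. 

(v) $(\mathsf{A}\mathsf{B} \cdots \mathsf{G})^{-1} = \mathsf{G}^{-1} \cdots \mathsf{B}^{-1}\mathsf{A}^{-1}$. 


► Prove the properties (i)-(v) stated above. 

We begin by writing down the fundamental expression defining the inverse of a 
non-singular square matrix $\mathsf{A}$: 

\[ \mathsf{A}\mathsf{A}^{-1} = \mathsf{I} = \mathsf{A}^{-1}\mathsf{A}. \tag{8.61} \]

Property (i): This follows immediately from the expression (8.61). 

Property (ii): Taking the transpose of each expression in (8.61) gives 

\[ (\mathsf{A}\mathsf{A}^{-1})^{\mathrm{T}} = \mathsf{I}^{\mathrm{T}} = (\mathsf{A}^{-1}\mathsf{A})^{\mathrm{T}}. \]

Using the result (8.39) for the transpose of a product of matrices and noting that $\mathsf{I}^{\mathrm{T}} = \mathsf{I}$, 
we find 

\[ (\mathsf{A}^{-1})^{\mathrm{T}}\mathsf{A}^{\mathrm{T}} = \mathsf{I} = \mathsf{A}^{\mathrm{T}}(\mathsf{A}^{-1})^{\mathrm{T}}. \]

However, from (8.61), this implies $(\mathsf{A}^{-1})^{\mathrm{T}} = (\mathsf{A}^{\mathrm{T}})^{-1}$ and hence proves result (ii) above. 

Property (iii): This may be proved in an analogous way to property (ii), by replacing 
the transposes in (ii) by Hermitian conjugates and using the result (8.40) for the Hermitian 
conjugate of a product of matrices. 

Property (iv): Using (8.61), we may write 

\[ (\mathsf{A}\mathsf{B})(\mathsf{A}\mathsf{B})^{-1} = \mathsf{I} = (\mathsf{A}\mathsf{B})^{-1}(\mathsf{A}\mathsf{B}). \]

From the left-hand equality it follows, by multiplying on the left by $\mathsf{A}^{-1}$, that 

\[ \mathsf{A}^{-1}\mathsf{A}\mathsf{B}(\mathsf{A}\mathsf{B})^{-1} = \mathsf{A}^{-1}\mathsf{I} \quad\text{and hence}\quad \mathsf{B}(\mathsf{A}\mathsf{B})^{-1} = \mathsf{A}^{-1}. \]

Now multiplying on the left by $\mathsf{B}^{-1}$ gives 

\[ \mathsf{B}^{-1}\mathsf{B}(\mathsf{A}\mathsf{B})^{-1} = \mathsf{B}^{-1}\mathsf{A}^{-1}, \]

and hence the stated result. 

Property (v): Finally, result (iv) may be extended to case (v) in a straightforward manner. 
For example, using result (iv) twice we find 

\[ (\mathsf{A}\mathsf{B}\mathsf{C})^{-1} = (\mathsf{B}\mathsf{C})^{-1}\mathsf{A}^{-1} = \mathsf{C}^{-1}\mathsf{B}^{-1}\mathsf{A}^{-1}. \] ◄ 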

We conclude this section by noting that the determinant $|\mathsf{A}^{-1}|$ of the inverse
matrix can be expressed very simply in terms of the determinant $|\mathsf{A}|$ of the matrix
itself. Again we start with the fundamental expression (8.61). Then, using the
property (8.52) for the determinant of a product, we find

\[ |\mathsf{A}\mathsf{A}^{-1}| = |\mathsf{A}||\mathsf{A}^{-1}| = |\mathsf{I}|. \]

It is straightforward to show by Laplace expansion that $|\mathsf{I}| = 1$, and so we arrive
at the useful result

\[ |\mathsf{A}^{-1}| = \frac{1}{|\mathsf{A}|}. \tag{8.62} \]
8.11 The rank of a matrix

The rank of a general M x N matrix is an important concept, particularly in 
the solution of sets of simultaneous linear equations, to be discussed in the next 
section, and we now discuss it in some detail. Like the trace and determinant, 
the rank of matrix A is a single number (or algebraic expression) that depends 
on the elements of A. Unlike the trace and determinant, however, the rank of a 
matrix can be defined even when A is not square. As we shall see, there are two 
equivalent definitions of the rank of a general matrix. 

Firstly, the rank of a matrix may be defined in terms of the linear independence
of vectors. Suppose that the columns of an M × N matrix are interpreted as
the components in a given basis of N (M-component) vectors
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N$, as follows:

\[ \mathsf{A} = \begin{pmatrix} \uparrow & \uparrow & & \uparrow \\
\mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_N \\
\downarrow & \downarrow & & \downarrow \end{pmatrix}. \]

Then the rank of A, denoted by rank A or by R(A), is defined as the number
of linearly independent vectors in the set $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N$, and equals the dimension
of the vector space spanned by those vectors. Alternatively, we may consider the
rows of A to contain the components in a given basis of the M (N-component)
vectors $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_M$ as follows:

\[ \mathsf{A} = \begin{pmatrix} \leftarrow & \mathbf{w}_1 & \rightarrow \\
\leftarrow & \mathbf{w}_2 & \rightarrow \\
 & \vdots & \\
\leftarrow & \mathbf{w}_M & \rightarrow \end{pmatrix}. \]

It may then be shown† that the rank of A is also equal to the number of
linearly independent vectors in the set $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_M$. From this definition it
should be clear that the rank of A is unaffected by the exchange of two rows
(or two columns) or by the multiplication of a row (or column) by a non-zero constant.
Furthermore, suppose that a constant multiple of one row (column) is added to
another row (column): for example, we might replace the row $\mathbf{w}_i$ by $\mathbf{w}_i + c\,\mathbf{w}_j$.
This also has no effect on the number of linearly independent rows and so leaves
the rank of A unchanged. We may use these properties to evaluate the rank of a
given matrix.

A second (equivalent) definition may be given of the rank of a matrix and uses
the concept of submatrices. A submatrix of A is any matrix that can be formed
from the elements of A by ignoring one, or more than one, row or column. It
may be shown that the rank of a general M × N matrix is equal to the size of
the largest square submatrix of A whose determinant is non-zero. Therefore, if a
matrix A has an r × r submatrix S with $|\mathsf{S}| \neq 0$, but no (r + 1) × (r + 1) submatrix
with non-zero determinant, then the rank of the matrix is r. From either definition
it is clear that the rank of A is less than or equal to the smaller of M and N.

† For a fuller discussion, see, for example, C. D. Cantrell, Modern Mathematical Methods
for Physicists and Engineers (Cambridge University Press), chapter 6.


►Determine the rank of the matrix

\[ \mathsf{A} = \begin{pmatrix} 1 & 1 & 0 & -2 \\ 2 & 0 & 2 & 2 \\ 4 & 1 & 3 & 1 \end{pmatrix}. \]


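The largest possible square submatrices of A must be of dimension 3 × 3. Clearly, A
possesses four such submatrices, the determinants of which are given by

\[ \begin{vmatrix} 1 & 1 & 0 \\ 2 & 0 & 2 \\ 4 & 1 & 3 \end{vmatrix} = 0, \qquad
\begin{vmatrix} 1 & 1 & -2 \\ 2 & 0 & 2 \\ 4 & 1 & 1 \end{vmatrix} = 0, \]

\[ \begin{vmatrix} 1 & 0 & -2 \\ 2 & 2 & 2 \\ 4 & 3 & 1 \end{vmatrix} = 0, \qquad
\begin{vmatrix} 1 & 0 & -2 \\ 0 & 2 & 2 \\ 1 & 3 & 1 \end{vmatrix} = 0. \]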

(In each case the determinant may be evaluated as described in subsection 8.9.1.) 

The next largest square submatrices of A are of dimension 2 × 2. Consider, for example,
the 2 × 2 submatrix formed by ignoring the third row and the third and fourth columns
of A; this has determinant

\[ \begin{vmatrix} 1 & 1 \\ 2 & 0 \end{vmatrix} = 1 \times 0 - 2 \times 1 = -2. \]

Since its determinant is non-zero, A is of rank 2 and we need not consider any other 2 × 2
submatrices. ◄
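The row operations described earlier in this section (exchanging rows and adding a multiple of one row to another) leave the rank unchanged, and this suggests an alternative way of computing it. The Python sketch below is our own illustration rather than the submatrix method used in the example; the function name `rank` is hypothetical. It reduces the matrix to echelon form with exact fractions and counts the non-zero rows.

```python
from fractions import Fraction

def rank(matrix):
    """Rank by Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0                                   # number of pivot rows found so far
    for c in range(n_cols):
        # look for a row at or below position r with a non-zero entry in column c
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]      # exchange two rows
        for i in range(r + 1, n_rows):
            factor = rows[i][c] / rows[r][c]
            # subtract a multiple of the pivot row from a lower row
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r

A = [[1, 1, 0, -2], [2, 0, 2, 2], [4, 1, 3, 1]]
print(rank(A))    # agrees with the rank found from the submatrix determinants
```

For a matrix with many rows and columns this is usually far less work than evaluating all the square submatrix determinants.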

In the special case in which the matrix A is a square N × N matrix, by comparing
either of the above definitions of rank with our discussion of determinants in
section 8.9, we see that $|\mathsf{A}| = 0$ unless the rank of A is N. In other words, A is
singular unless R(A) = N.


8.12 Special types of square matrix 

Matrices that are square, i.e. N × N, are very common in physical applications.
We now consider some special forms of square matrix that are of particular 
importance. 


8.12.1 Diagonal matrices 

The unit matrix, which we have already encountered, is an example of a diagonal
matrix. Such matrices are characterised by having non-zero elements only on the
leading diagonal, i.e. only elements $A_{ij}$ with i = j may be non-zero. For example,

\[ \mathsf{A} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -3 \end{pmatrix} \]

is a 3 × 3 diagonal matrix. Such a matrix is often denoted by A = diag(1, 2, −3).
By performing a Laplace expansion, it is easily shown that the determinant of an
N × N diagonal matrix is equal to the product of the diagonal elements. Thus, if
the matrix has the form $\mathsf{A} = \mathrm{diag}(A_{11}, A_{22}, \ldots, A_{NN})$ then

\[ |\mathsf{A}| = A_{11}A_{22} \cdots A_{NN}. \tag{8.63} \]

Moreover, it is straightforward to show that, provided none of the diagonal
elements is zero, the inverse of A is also a diagonal matrix, given by

\[ \mathsf{A}^{-1} = \mathrm{diag}\left( \frac{1}{A_{11}}, \frac{1}{A_{22}}, \ldots, \frac{1}{A_{NN}} \right). \]

Finally, we note that, if two matrices A and B are both diagonal then they have 
the useful property that their product is commutative: 


AB = BA. 


This is not true for matrices in general. 


8.12.2 Lower and upper triangular matrices 

A square matrix A is called lower triangular if all the elements above the principal
diagonal are zero. For example, the general form for a 3 × 3 lower triangular
matrix is

\[ \mathsf{A} = \begin{pmatrix} A_{11} & 0 & 0 \\ A_{21} & A_{22} & 0 \\ A_{31} & A_{32} & A_{33} \end{pmatrix}, \]

where the elements $A_{ij}$ may be zero or non-zero. Similarly an upper triangular
square matrix is one for which all the elements below the principal diagonal are
zero. The general 3 × 3 form is thus

\[ \mathsf{A} = \begin{pmatrix} A_{11} & A_{12} & A_{13} \\ 0 & A_{22} & A_{23} \\ 0 & 0 & A_{33} \end{pmatrix}. \]

By performing a Laplace expansion, it is straightforward to show that, in the
general N × N case, the determinant of an upper or lower triangular matrix is
equal to the product of its diagonal elements,

\[ |\mathsf{A}| = A_{11}A_{22} \cdots A_{NN}. \tag{8.64} \]

Clearly result (8.63) for diagonal matrices is a special case of this result. Moreover,
it may be shown that the inverse of a non-singular lower (upper) triangular matrix
is also lower (upper) triangular.


8.12.3 Symmetric and antisymmetric matrices 

A square matrix A of order N with the property $\mathsf{A} = \mathsf{A}^T$ is said to be symmetric.
Similarly a matrix for which $\mathsf{A} = -\mathsf{A}^T$ is said to be anti- or skew-symmetric
and its diagonal elements $A_{11}, A_{22}, \ldots, A_{NN}$ are necessarily zero. Moreover, if A is
(anti-)symmetric then so too is its inverse $\mathsf{A}^{-1}$. This is easily proved by noting
that if $\mathsf{A} = \pm\mathsf{A}^T$ then

\[ (\mathsf{A}^{-1})^T = (\mathsf{A}^T)^{-1} = \pm\mathsf{A}^{-1}. \]

Any N × N matrix A can be written as the sum of a symmetric and an
antisymmetric matrix, since we may write

\[ \mathsf{A} = \tfrac{1}{2}(\mathsf{A} + \mathsf{A}^T) + \tfrac{1}{2}(\mathsf{A} - \mathsf{A}^T) = \mathsf{B} + \mathsf{C}, \]

where clearly $\mathsf{B} = \mathsf{B}^T$ and $\mathsf{C} = -\mathsf{C}^T$. The matrix B is therefore called the
symmetric part of A, and C is the antisymmetric part.

►If A is an N × N antisymmetric matrix, show that $|\mathsf{A}| = 0$ if N is odd.

If A is antisymmetric then $\mathsf{A}^T = -\mathsf{A}$. Using the properties of determinants (8.49) and
(8.51), we have

\[ |\mathsf{A}| = |\mathsf{A}^T| = |-\mathsf{A}| = (-1)^N |\mathsf{A}|. \]

Thus, if N is odd then $|\mathsf{A}| = -|\mathsf{A}|$, which implies that $|\mathsf{A}| = 0$. ◄
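The decomposition into symmetric and antisymmetric parts, and the vanishing determinant for odd N, can both be checked on a small case. The following Python sketch is an illustration under our own assumptions: the matrix `A` is an arbitrary choice, and the helper names `transpose`, `symmetric_parts` and `det3` are hypothetical.

```python
from fractions import Fraction

def transpose(m):
    return [list(col) for col in zip(*m)]

def symmetric_parts(m):
    """Return (B, C) with B = (A + A^T)/2 symmetric and C = (A - A^T)/2 antisymmetric."""
    t = transpose(m)
    n = len(m)
    b = [[Fraction(m[i][j] + t[i][j], 2) for j in range(n)] for i in range(n)]
    c = [[Fraction(m[i][j] - t[i][j], 2) for j in range(n)] for i in range(n)]
    return b, c

def det3(m):
    """Determinant of a 3 x 3 matrix by Laplace expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]      # arbitrary illustrative matrix
B, C = symmetric_parts(A)

print(B == transpose(B))                                   # B is symmetric
print(C == [[-x for x in row] for row in transpose(C)])    # C is antisymmetric
print(det3(C))                                             # vanishes, since N = 3 is odd
```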


8.12.4 Orthogonal matrices 

A non-singular matrix with the property that its transpose is also its inverse,

\[ \mathsf{A}^T = \mathsf{A}^{-1}, \tag{8.65} \]

is called an orthogonal matrix. It follows immediately that the inverse of an
orthogonal matrix is also orthogonal, since

\[ (\mathsf{A}^{-1})^T = (\mathsf{A}^T)^{-1} = (\mathsf{A}^{-1})^{-1}. \]

Moreover, since for an orthogonal matrix $\mathsf{A}^T\mathsf{A} = \mathsf{I}$, we have

\[ |\mathsf{A}^T\mathsf{A}| = |\mathsf{A}^T||\mathsf{A}| = |\mathsf{A}|^2 = |\mathsf{I}| = 1. \]

Thus the determinant of an orthogonal matrix must be $|\mathsf{A}| = \pm 1$.

An orthogonal matrix represents, in a particular basis, a linear operator that
leaves the norms (lengths) of real vectors unchanged, as we will now show.
Suppose that $\mathbf{y} = \mathcal{A}\mathbf{x}$ is represented in some coordinate system by the matrix
equation $\mathsf{y} = \mathsf{A}\mathsf{x}$; then $\langle \mathbf{y}|\mathbf{y} \rangle$ is given in this coordinate system by

\[ \mathsf{y}^T\mathsf{y} = \mathsf{x}^T\mathsf{A}^T\mathsf{A}\mathsf{x} = \mathsf{x}^T\mathsf{x}. \]

Hence $\langle \mathbf{y}|\mathbf{y} \rangle = \langle \mathbf{x}|\mathbf{x} \rangle$, showing that the action of a linear operator represented by
an orthogonal matrix does not change the norm of a real vector.
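As a concrete check of these statements, the Python sketch below takes a rotation-like matrix with rational entries (built from the 3-4-5 triangle so that the arithmetic is exact) and confirms that its transpose is its inverse, that its determinant is +1, and that it leaves the squared norm of a vector unchanged. The matrix, the vector and the helper names are our own illustrative choices.

```python
from fractions import Fraction as F

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def matvec(m, v):
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

A = [[F(3, 5), F(-4, 5)],
     [F(4, 5), F(3, 5)]]          # assumed example of an orthogonal matrix
x = [F(2), F(-7)]                 # arbitrary real vector
y = matvec(A, x)

det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
print(matmul(transpose(A), A))    # the unit matrix, so A^T = A^{-1}
print(det_A)                      # the determinant
print(dot(x, x), dot(y, y))       # squared norms before and after
```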


8.12.5 Hermitian and anti-Hermitian matrices 

An Hermitian matrix is one that satisfies $\mathsf{A} = \mathsf{A}^\dagger$, where $\mathsf{A}^\dagger$ is the Hermitian
conjugate discussed in section 8.7. Similarly if $\mathsf{A}^\dagger = -\mathsf{A}$, then A is called
anti-Hermitian. A real (anti-)symmetric matrix is a special case of an (anti-)Hermitian
matrix, in which all the elements of the matrix are real. Also, if A is an
(anti-)Hermitian matrix then so too is its inverse $\mathsf{A}^{-1}$, since

\[ (\mathsf{A}^{-1})^\dagger = (\mathsf{A}^\dagger)^{-1} = \pm\mathsf{A}^{-1}. \]

Any N × N matrix A can be written as the sum of an Hermitian matrix and
an anti-Hermitian matrix, since

\[ \mathsf{A} = \tfrac{1}{2}(\mathsf{A} + \mathsf{A}^\dagger) + \tfrac{1}{2}(\mathsf{A} - \mathsf{A}^\dagger) = \mathsf{B} + \mathsf{C}, \]

where clearly $\mathsf{B} = \mathsf{B}^\dagger$ and $\mathsf{C} = -\mathsf{C}^\dagger$. The matrix B is called the Hermitian part of
A, and C is called the anti-Hermitian part.


8.12.6 Unitary matrices 

A unitary matrix A is defined as one for which

\[ \mathsf{A}^\dagger = \mathsf{A}^{-1}. \tag{8.66} \]

Clearly, if A is real then $\mathsf{A}^\dagger = \mathsf{A}^T$, showing that a real orthogonal matrix is a
special case of a unitary matrix, one in which all the elements are real. We note
that the inverse $\mathsf{A}^{-1}$ of a unitary matrix is also unitary, since

\[ (\mathsf{A}^{-1})^\dagger = (\mathsf{A}^\dagger)^{-1} = (\mathsf{A}^{-1})^{-1}. \]

Moreover, since for a unitary matrix $\mathsf{A}^\dagger\mathsf{A} = \mathsf{I}$, we have

\[ |\mathsf{A}^\dagger\mathsf{A}| = |\mathsf{A}^\dagger||\mathsf{A}| = |\mathsf{A}|^*|\mathsf{A}| = |\mathsf{I}| = 1. \]

Thus the determinant of a unitary matrix has unit modulus.

A unitary matrix represents, in a particular basis, a linear operator that leaves
the norms (lengths) of complex vectors unchanged. If $\mathbf{y} = \mathcal{A}\mathbf{x}$ is represented in
some coordinate system by the matrix equation $\mathsf{y} = \mathsf{A}\mathsf{x}$ then $\langle \mathbf{y}|\mathbf{y} \rangle$ is given in this
coordinate system by

\[ \mathsf{y}^\dagger\mathsf{y} = \mathsf{x}^\dagger\mathsf{A}^\dagger\mathsf{A}\mathsf{x} = \mathsf{x}^\dagger\mathsf{x}. \]

Hence $\langle \mathbf{y}|\mathbf{y} \rangle = \langle \mathbf{x}|\mathbf{x} \rangle$, showing that the action of the linear operator represented by
a unitary matrix does not change the norm of a complex vector. The action of a
unitary matrix on a complex column matrix thus parallels that of an orthogonal
matrix acting on a real column matrix.
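A small numerical example may help to make the complex case concrete. In the Python sketch below the 2 × 2 matrix `U` and the vector `x` are our own illustrative choices, and the helper names are hypothetical; floating-point arithmetic is used, so equalities hold only to rounding accuracy. The determinant of this particular `U` is purely imaginary, which shows that unit modulus does not restrict the determinant to ±1.

```python
def dagger(m):
    """Hermitian conjugate: transpose followed by complex conjugation."""
    return [[complex(m[i][j]).conjugate() for i in range(len(m))]
            for j in range(len(m[0]))]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def matvec(m, v):
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

def norm_sq(v):
    """Squared norm v†v of a complex column vector."""
    return sum(abs(c) ** 2 for c in v)

U = [[0.6j, 0.8j],
     [-0.8, 0.6]]                 # assumed example of a unitary matrix
x = [1 + 2j, 3 - 1j]              # arbitrary complex vector
y = matvec(U, x)

det_U = U[0][0] * U[1][1] - U[0][1] * U[1][0]
print(matmul(dagger(U), U))       # approximately the unit matrix
print(abs(det_U))                 # modulus of the determinant
print(norm_sq(x), norm_sq(y))     # squared norms before and after
```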


8.12.7 Normal matrices 

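A final important set of special matrices consists of the normal matrices, for which

\[ \mathsf{A}\mathsf{A}^\dagger = \mathsf{A}^\dagger\mathsf{A}, \]

i.e. a normal matrix is one that commutes with its Hermitian conjugate.

We can easily show that Hermitian matrices and unitary matrices (or symmetric
matrices and orthogonal matrices in the real case) are examples of normal
matrices. For an Hermitian matrix, $\mathsf{A} = \mathsf{A}^\dagger$ and so

\[ \mathsf{A}\mathsf{A}^\dagger = \mathsf{A}\mathsf{A} = \mathsf{A}^\dagger\mathsf{A}. \]

Similarly, for a unitary matrix, $\mathsf{A}^{-1} = \mathsf{A}^\dagger$ and so

\[ \mathsf{A}\mathsf{A}^\dagger = \mathsf{A}\mathsf{A}^{-1} = \mathsf{A}^{-1}\mathsf{A} = \mathsf{A}^\dagger\mathsf{A}. \]

Finally, we note that, if A is normal then so too is its inverse $\mathsf{A}^{-1}$, since

\[ \mathsf{A}^{-1}(\mathsf{A}^{-1})^\dagger = \mathsf{A}^{-1}(\mathsf{A}^\dagger)^{-1} = (\mathsf{A}^\dagger\mathsf{A})^{-1}
= (\mathsf{A}\mathsf{A}^\dagger)^{-1} = (\mathsf{A}^\dagger)^{-1}\mathsf{A}^{-1} = (\mathsf{A}^{-1})^\dagger\mathsf{A}^{-1}. \]

This broad class of matrices is important in the discussion of eigenvectors and
eigenvalues in the next section.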


8.13 Eigenvectors and eigenvalues 

Suppose that a linear operator $\mathcal{A}$ transforms vectors $\mathbf{x}$ in an N-dimensional
vector space into other vectors $\mathcal{A}\mathbf{x}$ in the same space. The possibility then arises
that there exist vectors $\mathbf{x}$ each of which is transformed by $\mathcal{A}$ into a multiple of
itself. Such vectors would have to satisfy

\[ \mathcal{A}\mathbf{x} = \lambda\mathbf{x}. \tag{8.67} \]

Any non-zero vector $\mathbf{x}$ that satisfies (8.67) for some value of $\lambda$ is called an
eigenvector of the linear operator $\mathcal{A}$, and $\lambda$ is called the corresponding eigenvalue.
As will be discussed below, in general the operator $\mathcal{A}$ has N independent
eigenvectors $\mathbf{x}^i$, with eigenvalues $\lambda_i$. The $\lambda_i$ are not necessarily all distinct.

If we choose a particular basis in the vector space, we can write (8.67) in terms
of the components of $\mathcal{A}$ and $\mathbf{x}$ with respect to this basis as the matrix equation

\[ \mathsf{A}\mathsf{x} = \lambda\mathsf{x}, \tag{8.68} \]

where A is an N × N matrix. The column matrices $\mathsf{x}$ that satisfy (8.68) obviously
represent the eigenvectors $\mathbf{x}$ of $\mathcal{A}$ in our chosen coordinate system. Conventionally,
these column matrices are also referred to as the eigenvectors of the matrix
A.† Clearly, if $\mathsf{x}$ is an eigenvector of A (with some eigenvalue $\lambda$) then any scalar
multiple $\mu\mathsf{x}$ is also an eigenvector with the same eigenvalue. We therefore often
use normalised eigenvectors, for which

\[ \mathsf{x}^\dagger\mathsf{x} = 1 \]

(note that $\mathsf{x}^\dagger\mathsf{x}$ corresponds to the inner product $\langle \mathbf{x}|\mathbf{x} \rangle$ in our basis). Any
eigenvector $\mathsf{x}$ can be normalised by dividing all its components by the scalar $(\mathsf{x}^\dagger\mathsf{x})^{1/2}$.

As will be seen, the problem of finding the eigenvalues and corresponding
eigenvectors of a square matrix A plays an important role in many physical
investigations. Throughout this chapter we denote the ith eigenvector of a square
matrix A by $\mathsf{x}^i$ and the corresponding eigenvalue by $\lambda_i$. This superscript notation
for eigenvectors is used to avoid any confusion with components.

►A non-singular matrix A has eigenvalues $\lambda_i$ and eigenvectors $\mathsf{x}^i$. Find the eigenvalues and
eigenvectors of the inverse matrix $\mathsf{A}^{-1}$.

The eigenvalues and eigenvectors of A satisfy

\[ \mathsf{A}\mathsf{x}^i = \lambda_i\mathsf{x}^i. \]

Left-multiplying both sides of this equation by $\mathsf{A}^{-1}$, we find

\[ \mathsf{A}^{-1}\mathsf{A}\mathsf{x}^i = \lambda_i\mathsf{A}^{-1}\mathsf{x}^i. \]

Since $\mathsf{A}^{-1}\mathsf{A} = \mathsf{I}$, on rearranging we obtain

\[ \mathsf{A}^{-1}\mathsf{x}^i = \frac{1}{\lambda_i}\mathsf{x}^i. \]

Thus, we see that $\mathsf{A}^{-1}$ has the same eigenvectors $\mathsf{x}^i$ as does A, but the corresponding
eigenvalues are $1/\lambda_i$. ◄

In the remainder of this section we will discuss some useful results concerning 
the eigenvectors and eigenvalues of certain special (though commonly occurring) 
square matrices. The results will be established for matrices whose elements may 
be complex; the corresponding properties for real matrices may be obtained as 
special cases. 


† In this context, when referring to linear combinations of eigenvectors $\mathsf{x}$ we will normally use the
term 'vector'.


8.13.1 Eigenvectors and eigenvalues of a normal matrix

In subsection 8.12.7 we defined a normal matrix A as one that commutes with its
Hermitian conjugate, so that

\[ \mathsf{A}^\dagger\mathsf{A} = \mathsf{A}\mathsf{A}^\dagger. \]

We also showed that both Hermitian and unitary matrices (or symmetric and
orthogonal matrices in the real case) are examples of normal matrices. We now
discuss the properties of the eigenvectors and eigenvalues of a normal matrix.

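If $\mathsf{x}$ is an eigenvector of a normal matrix A with corresponding eigenvalue $\lambda$
then $\mathsf{A}\mathsf{x} = \lambda\mathsf{x}$, or equivalently,

\[ (\mathsf{A} - \lambda\mathsf{I})\mathsf{x} = \mathsf{0}. \tag{8.69} \]

Denoting $\mathsf{B} = \mathsf{A} - \lambda\mathsf{I}$, (8.69) becomes $\mathsf{B}\mathsf{x} = \mathsf{0}$ and, taking the Hermitian conjugate,
we also have

\[ (\mathsf{B}\mathsf{x})^\dagger = \mathsf{x}^\dagger\mathsf{B}^\dagger = \mathsf{0}. \tag{8.70} \]

From (8.69) and (8.70) we then have

\[ \mathsf{x}^\dagger\mathsf{B}^\dagger\mathsf{B}\mathsf{x} = 0. \tag{8.71} \]

However, the product $\mathsf{B}^\dagger\mathsf{B}$ is given by

\[ \mathsf{B}^\dagger\mathsf{B} = (\mathsf{A} - \lambda\mathsf{I})^\dagger(\mathsf{A} - \lambda\mathsf{I}) = (\mathsf{A}^\dagger - \lambda^*\mathsf{I})(\mathsf{A} - \lambda\mathsf{I})
= \mathsf{A}^\dagger\mathsf{A} - \lambda^*\mathsf{A} - \lambda\mathsf{A}^\dagger + \lambda\lambda^*\mathsf{I}. \]

Now since A is normal, $\mathsf{A}\mathsf{A}^\dagger = \mathsf{A}^\dagger\mathsf{A}$ (see subsection 8.12.7) and so

\[ \mathsf{B}^\dagger\mathsf{B} = \mathsf{A}\mathsf{A}^\dagger - \lambda^*\mathsf{A} - \lambda\mathsf{A}^\dagger + \lambda\lambda^*\mathsf{I}
= (\mathsf{A} - \lambda\mathsf{I})(\mathsf{A} - \lambda\mathsf{I})^\dagger = \mathsf{B}\mathsf{B}^\dagger, \]

and hence B is also normal. From (8.71) we then find

\[ \mathsf{x}^\dagger\mathsf{B}^\dagger\mathsf{B}\mathsf{x} = \mathsf{x}^\dagger\mathsf{B}\mathsf{B}^\dagger\mathsf{x} = (\mathsf{B}^\dagger\mathsf{x})^\dagger\mathsf{B}^\dagger\mathsf{x} = 0, \]

from which we obtain

\[ \mathsf{B}^\dagger\mathsf{x} = (\mathsf{A}^\dagger - \lambda^*\mathsf{I})\mathsf{x} = \mathsf{0}. \]

Therefore, for a normal matrix A, the eigenvalues of $\mathsf{A}^\dagger$ are the complex conjugates
of the eigenvalues of A.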

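Let us now consider two eigenvectors $\mathsf{x}^i$ and $\mathsf{x}^j$ of a normal matrix A
corresponding to two different eigenvalues $\lambda_i$ and $\lambda_j$. We then have

\[ \mathsf{A}\mathsf{x}^i = \lambda_i\mathsf{x}^i, \tag{8.72} \]

\[ \mathsf{A}\mathsf{x}^j = \lambda_j\mathsf{x}^j. \tag{8.73} \]

Multiplying (8.73) on the left by $(\mathsf{x}^i)^\dagger$ we obtain

\[ (\mathsf{x}^i)^\dagger\mathsf{A}\mathsf{x}^j = \lambda_j(\mathsf{x}^i)^\dagger\mathsf{x}^j. \tag{8.74} \]

However, on the LHS of (8.74) we have

\[ (\mathsf{x}^i)^\dagger\mathsf{A} = (\mathsf{A}^\dagger\mathsf{x}^i)^\dagger = (\lambda_i^*\mathsf{x}^i)^\dagger = \lambda_i(\mathsf{x}^i)^\dagger, \tag{8.75} \]

where we have used (8.40) and the property just proved for a normal matrix to
write $\mathsf{A}^\dagger\mathsf{x}^i = \lambda_i^*\mathsf{x}^i$. From (8.74) and (8.75) we have

\[ (\lambda_i - \lambda_j)(\mathsf{x}^i)^\dagger\mathsf{x}^j = 0. \tag{8.76} \]

Thus, if $\lambda_i \neq \lambda_j$ the eigenvectors $\mathsf{x}^i$ and $\mathsf{x}^j$ must be orthogonal, i.e. $(\mathsf{x}^i)^\dagger\mathsf{x}^j = 0$.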

It follows immediately from (8.76) that if all N eigenvalues of a normal matrix
A are distinct then all N eigenvectors of A are mutually orthogonal. If, however,
two or more eigenvalues are the same then further consideration is required. An
eigenvalue corresponding to two or more different eigenvectors (i.e. they are not
simply multiples of one another) is said to be degenerate. Suppose that $\lambda_1$ is k-fold
degenerate, i.e.

\[ \mathsf{A}\mathsf{x}^i = \lambda_1\mathsf{x}^i \quad\text{for } i = 1, 2, \ldots, k, \tag{8.77} \]

but that it is different from any of $\lambda_{k+1}$, $\lambda_{k+2}$, etc. Then any linear combination
of these $\mathsf{x}^i$ is also an eigenvector with eigenvalue $\lambda_1$, since, for $\mathsf{z} = \sum_{i=1}^{k} c_i\mathsf{x}^i$,

\[ \mathsf{A}\mathsf{z} = \mathsf{A}\sum_{i=1}^{k} c_i\mathsf{x}^i = \sum_{i=1}^{k} c_i\mathsf{A}\mathsf{x}^i
= \sum_{i=1}^{k} c_i\lambda_1\mathsf{x}^i = \lambda_1\mathsf{z}. \tag{8.78} \]

If the $\mathsf{x}^i$ defined in (8.77) are not already mutually orthogonal then we can
construct new eigenvectors $\mathsf{z}^i$ that are orthogonal by the following procedure:

\[ \begin{aligned}
\mathsf{z}^1 &= \mathsf{x}^1, \\
\mathsf{z}^2 &= \mathsf{x}^2 - \left[(\hat{\mathsf{z}}^1)^\dagger\mathsf{x}^2\right]\hat{\mathsf{z}}^1, \\
\mathsf{z}^3 &= \mathsf{x}^3 - \left[(\hat{\mathsf{z}}^2)^\dagger\mathsf{x}^3\right]\hat{\mathsf{z}}^2 - \left[(\hat{\mathsf{z}}^1)^\dagger\mathsf{x}^3\right]\hat{\mathsf{z}}^1, \\
&\;\;\vdots \\
\mathsf{z}^k &= \mathsf{x}^k - \left[(\hat{\mathsf{z}}^{k-1})^\dagger\mathsf{x}^k\right]\hat{\mathsf{z}}^{k-1} - \cdots - \left[(\hat{\mathsf{z}}^1)^\dagger\mathsf{x}^k\right]\hat{\mathsf{z}}^1.
\end{aligned} \]

In this procedure, known as Gram-Schmidt orthogonalisation, each new eigenvector
$\mathsf{z}^i$ is normalised to give the unit vector $\hat{\mathsf{z}}^i$ before proceeding to the construction
of the next one (the normalisation is carried out by dividing each element of
the vector $\mathsf{z}^i$ by $[(\mathsf{z}^i)^\dagger\mathsf{z}^i]^{1/2}$). Note that each factor in brackets $(\hat{\mathsf{z}}^m)^\dagger\mathsf{x}^n$ is a scalar
product and thus only a number. It follows that, as shown in (8.78), each vector
$\mathsf{z}^i$ so constructed is an eigenvector of A with eigenvalue $\lambda_1$ and will remain so
on normalisation. It is straightforward to check that, provided the previous new
eigenvectors have been normalised as prescribed, each $\mathsf{z}^i$ is orthogonal to all its
predecessors. (In practice, however, the method is laborious and the example in
subsection 8.14.1 gives a less rigorous but considerably quicker way.)
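Although laborious by hand, the Gram-Schmidt procedure is easily mechanised. The Python sketch below follows the steps given above: from each new vector it subtracts its projections onto the previously normalised vectors and then normalises the result. The function names `inner` and `gram_schmidt` and the three input vectors are our own choices for illustration, and the input vectors are assumed to be linearly independent (otherwise a zero norm would be encountered).

```python
import math

def inner(u, v):
    """Inner product u†v for vectors stored as lists of (possibly complex) numbers."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Return orthonormal vectors spanning the same space as the input vectors."""
    basis = []                                  # the unit vectors found so far
    for x in vectors:
        z = list(x)
        for zhat in basis:
            coeff = inner(zhat, x)              # the scalar in square brackets
            z = [zc - coeff * hc for zc, hc in zip(z, zhat)]
        norm = math.sqrt(inner(z, z).real)      # [(z)†z]^{1/2}
        basis.append([zc / norm for zc in z])
    return basis

basis = gram_schmidt([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
for i, u in enumerate(basis):
    # each row is close to a row of the unit matrix if the set is orthonormal
    print([round(abs(inner(u, v)), 12) for v in basis])
```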

Therefore, even if A has some degenerate eigenvalues we can by construction
obtain a set of N mutually orthogonal eigenvectors. Moreover, it may be shown
(although the proof is beyond the scope of this book) that these eigenvectors
are complete in that they form a basis for the N-dimensional vector space. As
a result any arbitrary vector $\mathsf{y}$ can be expressed as a linear combination of the
eigenvectors $\mathsf{x}^i$:

\[ \mathsf{y} = \sum_{i=1}^{N} a_i\mathsf{x}^i. \tag{8.79} \]

Thus, the eigenvectors form an orthogonal basis for the vector space. By
normalising the eigenvectors so that $(\mathsf{x}^i)^\dagger\mathsf{x}^i = 1$ this basis is made
orthonormal, and the coefficients are then given by $a_i = (\mathsf{x}^i)^\dagger\mathsf{y}$.


►Show that a normal matrix A can be written in terms of its eigenvalues $\lambda_i$ and orthonormal
eigenvectors $\mathsf{x}^i$ as

\[ \mathsf{A} = \sum_{i=1}^{N} \lambda_i\mathsf{x}^i(\mathsf{x}^i)^\dagger. \tag{8.80} \]

The key to proving the validity of (8.80) is to show that both sides of the expression give
the same result when acting on an arbitrary vector $\mathsf{y}$. Since A is normal, we may expand $\mathsf{y}$
in terms of the eigenvectors $\mathsf{x}^i$, as shown in (8.79). Thus, we have

\[ \mathsf{A}\mathsf{y} = \mathsf{A}\sum_{i=1}^{N} a_i\mathsf{x}^i = \sum_{i=1}^{N} a_i\lambda_i\mathsf{x}^i. \]

Alternatively, the action of the RHS of (8.80) on $\mathsf{y}$ is given by

\[ \sum_{i=1}^{N} \lambda_i\mathsf{x}^i(\mathsf{x}^i)^\dagger\mathsf{y} = \sum_{i=1}^{N} a_i\lambda_i\mathsf{x}^i, \]

since $a_i = (\mathsf{x}^i)^\dagger\mathsf{y}$. We see that the two expressions for the action of each side of (8.80) on $\mathsf{y}$
are identical, which implies that this relationship is indeed correct. ◄
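The expansion (8.80) can be tried out on a simple case. The Python sketch below assumes, purely for illustration, the real symmetric (and hence normal) matrix with rows (2, 1) and (1, 2), whose eigenvalues 3 and 1 have normalised eigenvectors proportional to (1, 1) and (1, −1); it rebuilds the matrix from these eigenvalues and eigenvectors. The helper name `outer` is our own, and floating-point arithmetic is used, so agreement is to rounding accuracy only.

```python
import math

def outer(u, v):
    """The matrix u v† formed from two column vectors given as lists."""
    return [[a * complex(b).conjugate() for b in v] for a in u]

s = 1 / math.sqrt(2)
# assumed eigenvalues and orthonormal eigenvectors of [[2, 1], [1, 2]]
eigenpairs = [(3, [s, s]), (1, [s, -s])]

n = 2
A = [[0j] * n for _ in range(n)]
for lam, x in eigenpairs:
    P = outer(x, x)                       # x^i (x^i)†
    for i in range(n):
        for j in range(n):
            A[i][j] += lam * P[i][j]      # accumulate lambda_i x^i (x^i)†

print([[round(entry.real, 12) for entry in row] for row in A])
```

Each term $\mathsf{x}^i(\mathsf{x}^i)^\dagger$ acts as a projector onto the direction of the ith eigenvector, which is why the eigenvectors must be normalised for the sum to reproduce A.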


8.13.2 Eigenvectors and eigenvalues of Hermitian and anti-Hermitian matrices 

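For a normal matrix we showed that if $\mathsf{A}\mathsf{x} = \lambda\mathsf{x}$ then $\mathsf{A}^\dagger\mathsf{x} = \lambda^*\mathsf{x}$. However, if A is
also Hermitian, $\mathsf{A} = \mathsf{A}^\dagger$, it follows necessarily that $\lambda = \lambda^*$. Thus, the eigenvalues
of an Hermitian matrix are real, a result which may be proved directly.

►Prove that the eigenvalues of an Hermitian matrix are real.

For any particular eigenvector $\mathsf{x}^i$, we take the Hermitian conjugate of $\mathsf{A}\mathsf{x}^i = \lambda_i\mathsf{x}^i$ to give

\[ (\mathsf{x}^i)^\dagger\mathsf{A}^\dagger = \lambda_i^*(\mathsf{x}^i)^\dagger. \tag{8.81} \]

Using $\mathsf{A}^\dagger = \mathsf{A}$, since A is Hermitian, and multiplying on the right by $\mathsf{x}^i$, we obtain

\[ (\mathsf{x}^i)^\dagger\mathsf{A}\mathsf{x}^i = \lambda_i^*(\mathsf{x}^i)^\dagger\mathsf{x}^i. \tag{8.82} \]

But multiplying $\mathsf{A}\mathsf{x}^i = \lambda_i\mathsf{x}^i$ through on the left by $(\mathsf{x}^i)^\dagger$ gives

\[ (\mathsf{x}^i)^\dagger\mathsf{A}\mathsf{x}^i = \lambda_i(\mathsf{x}^i)^\dagger\mathsf{x}^i. \]

Subtracting this from (8.82) yields

\[ 0 = (\lambda_i^* - \lambda_i)(\mathsf{x}^i)^\dagger\mathsf{x}^i. \]

But $(\mathsf{x}^i)^\dagger\mathsf{x}^i$ is the modulus squared of the non-zero vector $\mathsf{x}^i$ and is thus non-zero. Hence
$\lambda_i^*$ must equal $\lambda_i$ and thus be real. The same argument can be used to show that the
eigenvalues of a real symmetric matrix are themselves real. ◄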

The importance of the above result will be apparent to any student of quantum 
mechanics. In quantum mechanics the eigenvalues of operators correspond to 
measured values of observable quantities, e.g. energy, angular momentum, parity 
and so on, and these clearly must be real. If we use Hermitian operators to 
formulate the theories of quantum mechanics, the above property guarantees 
physically meaningful results. 

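Since an Hermitian matrix is also a normal matrix, its eigenvectors are orthogonal
(or can be made so using the Gram-Schmidt orthogonalisation procedure).
Alternatively we can prove the orthogonality of the eigenvectors directly.

►Prove that the eigenvectors corresponding to different eigenvalues of an Hermitian matrix
are orthogonal.

Consider two unequal eigenvalues $\lambda_i$ and $\lambda_j$ and their corresponding eigenvectors satisfying

\[ \mathsf{A}\mathsf{x}^i = \lambda_i\mathsf{x}^i, \tag{8.83} \]

\[ \mathsf{A}\mathsf{x}^j = \lambda_j\mathsf{x}^j. \tag{8.84} \]

Taking the Hermitian conjugate of (8.83) we find $(\mathsf{x}^i)^\dagger\mathsf{A}^\dagger = \lambda_i^*(\mathsf{x}^i)^\dagger$. Multiplying this on the
right by $\mathsf{x}^j$ we obtain

\[ (\mathsf{x}^i)^\dagger\mathsf{A}^\dagger\mathsf{x}^j = \lambda_i^*(\mathsf{x}^i)^\dagger\mathsf{x}^j, \]

and similarly multiplying (8.84) through on the left by $(\mathsf{x}^i)^\dagger$ we find

\[ (\mathsf{x}^i)^\dagger\mathsf{A}\mathsf{x}^j = \lambda_j(\mathsf{x}^i)^\dagger\mathsf{x}^j. \]

Then, since $\mathsf{A}^\dagger = \mathsf{A}$, the two left-hand sides are equal and, because the $\lambda_i$ are real, on
subtraction we obtain

\[ 0 = (\lambda_i - \lambda_j)(\mathsf{x}^i)^\dagger\mathsf{x}^j. \]

Finally we note that $\lambda_i \neq \lambda_j$ and so $(\mathsf{x}^i)^\dagger\mathsf{x}^j = 0$, i.e. the eigenvectors $\mathsf{x}^i$ and $\mathsf{x}^j$ are
orthogonal. ◄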

In the case where some of the eigenvalues are equal, further justification of the
orthogonality of the eigenvectors is needed. The Gram-Schmidt orthogonalisation
procedure discussed above provides a proof of, and a means of achieving,
orthogonality. The general method has already been described and we will not
repeat it here.

We may also consider the properties of the eigenvalues and eigenvectors of an
anti-Hermitian matrix, for which $\mathsf{A}^\dagger = -\mathsf{A}$ and thus

\[ \mathsf{A}\mathsf{A}^\dagger = \mathsf{A}(-\mathsf{A}) = (-\mathsf{A})\mathsf{A} = \mathsf{A}^\dagger\mathsf{A}. \]

Therefore matrices that are anti-Hermitian are also normal and so have mutually
orthogonal eigenvectors. The properties of the eigenvalues are also simply
deduced, since if $\mathsf{A}\mathsf{x} = \lambda\mathsf{x}$ then

\[ \lambda^*\mathsf{x} = \mathsf{A}^\dagger\mathsf{x} = -\mathsf{A}\mathsf{x} = -\lambda\mathsf{x}. \]

Hence $\lambda^* = -\lambda$ and so $\lambda$ must be pure imaginary (or zero). In a similar manner
to that used for Hermitian matrices, these properties may be proved directly.


8.13.3 Eigenvectors and eigenvalues of a unitary matrix 

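A unitary matrix satisfies $\mathsf{A}^\dagger = \mathsf{A}^{-1}$ and is also a normal matrix, with mutually
orthogonal eigenvectors. To investigate the eigenvalues of a unitary matrix, we
note that if $\mathsf{A}\mathsf{x} = \lambda\mathsf{x}$ then

\[ \mathsf{x}^\dagger\mathsf{x} = \mathsf{x}^\dagger\mathsf{A}^\dagger\mathsf{A}\mathsf{x} = \lambda^*\lambda\,\mathsf{x}^\dagger\mathsf{x}, \]

and we deduce that $\lambda\lambda^* = |\lambda|^2 = 1$. Thus, the eigenvalues of a unitary matrix
have unit modulus.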


8.13.4 Eigenvectors and eigenvalues of a general square matrix 

When an N × N matrix is not normal there are no general properties of its
eigenvalues and eigenvectors; in general it is not possible to find any orthogonal
set of N eigenvectors or even to find pairs of orthogonal eigenvectors (except
by chance in some cases). While the N non-orthogonal eigenvectors are usually
linearly independent and hence form a basis for the N-dimensional vector space,
this is not necessarily so. It may be shown (although we will not prove it) that any
N × N matrix with distinct eigenvalues has N linearly independent eigenvectors,
which therefore form a basis for the N-dimensional vector space. If a general
square matrix has degenerate eigenvalues, however, then it may or may not have
N linearly independent eigenvectors. A matrix that does not possess N linearly
independent eigenvectors is said to be defective.


8.13.5 Simultaneous eigenvectors 

We may now ask under what conditions two different normal matrices can have 
a common set of eigenvectors. The result - that they do so if, and only if, they 
commute - has profound significance for the foundations of quantum mechanics. 

To prove this important result let A and B be two N × N normal matrices and
$\mathsf{x}^i$ be the ith eigenvector of A corresponding to eigenvalue $\lambda_i$, i.e.

\[ \mathsf{A}\mathsf{x}^i = \lambda_i\mathsf{x}^i \quad\text{for } i = 1, 2, \ldots, N. \]

For the present we assume that the eigenvalues are all different.

(i) First suppose that A and B commute. Now consider

\[ \mathsf{A}\mathsf{B}\mathsf{x}^i = \mathsf{B}\mathsf{A}\mathsf{x}^i = \mathsf{B}\lambda_i\mathsf{x}^i = \lambda_i\mathsf{B}\mathsf{x}^i, \]

where we have used the commutativity for the first equality and the eigenvector
property for the second. It follows that $\mathsf{A}(\mathsf{B}\mathsf{x}^i) = \lambda_i(\mathsf{B}\mathsf{x}^i)$ and thus that $\mathsf{B}\mathsf{x}^i$ is an
eigenvector of A corresponding to eigenvalue $\lambda_i$. But the eigenvector solutions of
$(\mathsf{A} - \lambda_i\mathsf{I})\mathsf{x}^i = \mathsf{0}$ are unique to within a scale factor, and we therefore conclude that

\[ \mathsf{B}\mathsf{x}^i = \mu_i\mathsf{x}^i \]

for some scale factor $\mu_i$. However, this is just an eigenvector equation for B and
shows that $\mathsf{x}^i$ is an eigenvector of B, in addition to being an eigenvector of A. By
reversing the roles of A and B, it also follows that every eigenvector of B is an
eigenvector of A. Thus the two sets of eigenvectors are identical.

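(ii) Now suppose that A and B have all their eigenvectors in common, a typical
one $\mathsf{x}^i$ satisfying both

\[ \mathsf{A}\mathsf{x}^i = \lambda_i\mathsf{x}^i \quad\text{and}\quad \mathsf{B}\mathsf{x}^i = \mu_i\mathsf{x}^i. \]

As the eigenvectors span the N-dimensional vector space, any arbitrary vector $\mathsf{x}$
in the space can be written as a linear combination of the eigenvectors,

\[ \mathsf{x} = \sum_{i=1}^{N} c_i\mathsf{x}^i. \]

Now consider both

\[ \mathsf{A}\mathsf{B}\mathsf{x} = \mathsf{A}\mathsf{B}\sum_{i=1}^{N} c_i\mathsf{x}^i = \mathsf{A}\sum_{i=1}^{N} c_i\mu_i\mathsf{x}^i = \sum_{i=1}^{N} c_i\lambda_i\mu_i\mathsf{x}^i, \]

\[ \mathsf{B}\mathsf{A}\mathsf{x} = \mathsf{B}\mathsf{A}\sum_{i=1}^{N} c_i\mathsf{x}^i = \mathsf{B}\sum_{i=1}^{N} c_i\lambda_i\mathsf{x}^i = \sum_{i=1}^{N} c_i\mu_i\lambda_i\mathsf{x}^i. \]

It follows that $\mathsf{A}\mathsf{B}\mathsf{x}$ and $\mathsf{B}\mathsf{A}\mathsf{x}$ are the same for any arbitrary $\mathsf{x}$ and hence that

\[ (\mathsf{A}\mathsf{B} - \mathsf{B}\mathsf{A})\mathsf{x} = \mathsf{0} \]

for all $\mathsf{x}$. That is, A and B commute.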

This completes the proof that a necessary and sufficient condition for two 
normal matrices to have a set of eigenvectors in common is that they commute. 
It should be noted that if an eigenvalue of A, say, is degenerate then not all of 
its possible sets of eigenvectors will also constitute a set of eigenvectors of B. 
However, provided that by taking linear combinations one set of joint eigenvectors 
can be found, the proof is still valid and the result still holds. 
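Both directions of this result can be seen in a small example. The Python sketch below uses two real symmetric (and therefore normal) 2 × 2 matrices of our own choosing; it confirms that they commute and that the vectors (1, 1) and (1, −1) are eigenvectors of both, with different eigenvalues for the two matrices. The helper names are hypothetical.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def matvec(m, v):
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

A = [[2, 1], [1, 2]]      # assumed example matrices, both real and symmetric
B = [[0, 1], [1, 0]]

print(matmul(A, B) == matmul(B, A))           # True if A and B commute

# (common eigenvector, eigenvalue lambda of A, eigenvalue mu of B)
common = [([1, 1], 3, 1), ([1, -1], 1, -1)]
for x, lam, mu in common:
    is_eig_A = matvec(A, x) == [lam * c for c in x]
    is_eig_B = matvec(B, x) == [mu * c for c in x]
    print(x, is_eig_A, is_eig_B)
```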

When extended to the case of Hermitian operators and continuous eigenfunctions
(sections 17.2 and 17.3), the connection between commuting matrices and
a set of common eigenvectors plays a fundamental role in the postulatory basis
of quantum mechanics. It draws the distinction between commuting and
non-commuting observables and sets limits on how much information about a system
can be known, even in principle, at any one time.


8.14 Determination of eigenvalues and eigenvectors 

The next step is to show how the eigenvalues and eigenvectors of a given N × N
matrix A are found. To do this we refer to (8.68) and as in (8.69) rewrite it as

\[ \mathsf{A}\mathsf{x} - \lambda\mathsf{I}\mathsf{x} = (\mathsf{A} - \lambda\mathsf{I})\mathsf{x} = \mathsf{0}. \tag{8.85} \]

The slight rearrangement used here is to write $\mathsf{x}$ as $\mathsf{I}\mathsf{x}$, where I is the unit matrix
of order N. The point of doing this is immediate since (8.85) now has the form
of a homogeneous set of simultaneous equations, the theory of which will be
developed in section 8.18. What will be proved there is that the equation $\mathsf{B}\mathsf{x} = \mathsf{0}$
only has a non-trivial solution $\mathsf{x}$ if $|\mathsf{B}| = 0$. Correspondingly, therefore, we must
have in the present case that

\[ |\mathsf{A} - \lambda\mathsf{I}| = 0, \tag{8.86} \]

if there are to be non-zero solutions $\mathsf{x}$ to (8.85).

Equation (8.86) is known as the characteristic equation for A and its LHS as
the characteristic or secular determinant of A. The equation is a polynomial of
degree N in the quantity $\lambda$. The N roots of this equation $\lambda_i$, i = 1, 2, ..., N, give
the eigenvalues of A. Corresponding to each $\lambda_i$ there will be a column vector $\mathsf{x}^i$,
which is the ith eigenvector of A and can be found by using (8.68).

It will be observed that when (8.86) is written out as a polynomial equation in
$\lambda$, the coefficient of $-\lambda^{N-1}$ in the equation will be simply $A_{11} + A_{22} + \cdots + A_{NN}$
relative to the coefficient of $\lambda^N$. As discussed in section 8.8, the quantity $\sum_{i=1}^{N} A_{ii}$
is the trace of A and, from the ordinary theory of polynomial equations, will be
equal to the sum of the roots of (8.86):

\[ \sum_{i=1}^{N} \lambda_i = \mathrm{Tr}\,\mathsf{A}. \tag{8.87} \]

This can be used as one check that a computation of the eigenvalues $\lambda_i$ has been
done correctly. Unless equation (8.87) is satisfied by a computed set of eigenvalues,
they have not been calculated correctly. However, that equation (8.87) is satisfied is
a necessary, but not sufficient, condition for a correct computation. An alternative
proof of (8.87) is given in section 8.16.

►Find the eigenvalues and normalised eigenvectors of the real symmetric matrix

\[ \mathsf{A} = \begin{pmatrix} 1 & 1 & 3 \\ 1 & 1 & -3 \\ 3 & -3 & -3 \end{pmatrix}. \]

Using (8.86),

\[ \begin{vmatrix} 1-\lambda & 1 & 3 \\ 1 & 1-\lambda & -3 \\ 3 & -3 & -3-\lambda \end{vmatrix} = 0. \]

Expanding out this determinant gives

\[ (1-\lambda)\left[(1-\lambda)(-3-\lambda) - (-3)(-3)\right] + 1\left[(-3)(3) - 1(-3-\lambda)\right]
+ 3\left[1(-3) - (1-\lambda)(3)\right] = 0, \]

which simplifies to give

\[ (1-\lambda)(\lambda^2 + 2\lambda - 12) + (\lambda - 6) + 3(3\lambda - 6) = 0 \]

\[ \Rightarrow \quad (\lambda - 2)(\lambda - 3)(\lambda + 6) = 0. \]

Hence the roots of the characteristic equation, which are the eigenvalues of A, are $\lambda_1 = 2$,
$\lambda_2 = 3$, $\lambda_3 = -6$. We note that, as expected,

\[ \lambda_1 + \lambda_2 + \lambda_3 = -1 = 1 + 1 - 3 = A_{11} + A_{22} + A_{33} = \mathrm{Tr}\,\mathsf{A}. \]

For the first root, $\lambda_1 = 2$, a suitable eigenvector $\mathsf{x}^1$, with elements $x_1$, $x_2$, $x_3$, must satisfy
$\mathsf{A}\mathsf{x}^1 = 2\mathsf{x}^1$ or, equivalently,

\[ \begin{aligned}
x_1 + x_2 + 3x_3 &= 2x_1, \\
x_1 + x_2 - 3x_3 &= 2x_2, \\
3x_1 - 3x_2 - 3x_3 &= 2x_3.
\end{aligned} \tag{8.88} \]

These three equations are consistent (to ensure this was the purpose in finding the particular
values of $\lambda$) and yield $x_3 = 0$, $x_1 = x_2 = k$, where k is any non-zero number. A suitable
eigenvector would thus be

\[ \mathsf{x}^1 = (k \quad k \quad 0)^T. \]

If we apply the normalisation condition, we require $k^2 + k^2 + 0^2 = 1$ or $k = 1/\sqrt{2}$. Hence

\[ \mathsf{x}^1 = \left( \frac{1}{\sqrt{2}} \quad \frac{1}{\sqrt{2}} \quad 0 \right)^T = \frac{1}{\sqrt{2}} (1 \quad 1 \quad 0)^T. \]

Repeating the last paragraph, but with the factor 2 on the RHS of (8.88) replaced
successively by $\lambda_2 = 3$ and $\lambda_3 = -6$, gives two further normalised eigenvectors

\[ \mathsf{x}^2 = \frac{1}{\sqrt{3}} (1 \quad -1 \quad 1)^T, \qquad \mathsf{x}^3 = \frac{1}{\sqrt{6}} (1 \quad -1 \quad -2)^T. \]
◄

In the above example, the three values of $\lambda$ are all different and A is a
real symmetric matrix. Thus we expect, and it is easily checked, that the three
eigenvectors are mutually orthogonal, i.e.

\[ (\mathsf{x}^1)^T\mathsf{x}^2 = (\mathsf{x}^1)^T\mathsf{x}^3 = (\mathsf{x}^2)^T\mathsf{x}^3 = 0. \]

It will be apparent also that, as expected, the normalisation of the eigenvectors
has no effect on their orthogonality.
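The results of the worked example can be confirmed by direct substitution. The Python sketch below uses the matrix read off from the characteristic determinant and the unnormalised (integer) forms of the three eigenvectors, so that all the arithmetic is exact; it checks the eigenvalue equation, the trace condition (8.87) and the mutual orthogonality. The helper names `matvec` and `dot` are our own.

```python
def matvec(m, v):
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

A = [[1, 1, 3], [1, 1, -3], [3, -3, -3]]
# (eigenvalue, unnormalised eigenvector) pairs found in the example
eigenpairs = [(2, [1, 1, 0]), (3, [1, -1, 1]), (-6, [1, -1, -2])]

for lam, x in eigenpairs:
    print(lam, matvec(A, x) == [lam * c for c in x])     # A x = lambda x

trace = sum(A[i][i] for i in range(3))
print(trace == sum(lam for lam, _ in eigenpairs))        # check (8.87)

vectors = [x for _, x in eigenpairs]
print([dot(vectors[i], vectors[j]) for i in range(3) for j in range(i + 1, 3)])
```

Because only scalar multiples are removed by normalisation, the same orthogonality holds for the normalised eigenvectors.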
8.14.1 Degenerate eigenvalues

We return now to the case of degenerate eigenvalues, i.e. those that have two or
more associated eigenvectors. We have shown already that it is always possible
to construct an orthogonal set of eigenvectors for a normal matrix, see
subsection 8.13.1, and the following example illustrates one method for constructing
such a set.

►Construct an orthonormal set of eigenvectors for the matrix

\[ \mathsf{A} = \begin{pmatrix} 1 & 0 & 3 \\ 0 & -2 & 0 \\ 3 & 0 & 1 \end{pmatrix}. \]

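We first determine the eigenvalues using $|A - \lambda I| = 0$:

$$0 = \begin{vmatrix} 1-\lambda & 0 & 3 \\ 0 & -2-\lambda & 0 \\ 3 & 0 & 1-\lambda \end{vmatrix} = -(1-\lambda)^2(2+\lambda) + 3(3)(2+\lambda) = (4-\lambda)(\lambda+2)^2.$$

Thus $\lambda_1 = 4$, $\lambda_2 = -2 = \lambda_3$. The eigenvector $x^1 = (x_1 \quad x_2 \quad x_3)^T$ is found from

$$\begin{pmatrix} 1 & 0 & 3 \\ 0 & -2 & 0 \\ 3 & 0 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = 4\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \quad\Rightarrow\quad x^1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}.$$

A general column vector that is orthogonal to $x^1$ is

$$x = (a \quad b \quad -a)^T, \qquad (8.89)$$

and it is easily shown that

$$Ax = \begin{pmatrix} 1 & 0 & 3 \\ 0 & -2 & 0 \\ 3 & 0 & 1 \end{pmatrix}\begin{pmatrix} a \\ b \\ -a \end{pmatrix} = -2\begin{pmatrix} a \\ b \\ -a \end{pmatrix} = -2x.$$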


Thus $x$ is an eigenvector of $A$ with associated eigenvalue $-2$. It is clear, however, that there is an infinite set of eigenvectors $x$ all possessing the required property; the geometrical analogue is that there are an infinite number of corresponding vectors $x$ lying in the plane that has $x^1$ as its normal. We do require that the two remaining eigenvectors are orthogonal to one another, but this still leaves an infinite number of possibilities. For $x^2$, therefore, let us choose a simple form of (8.89), suitably normalised, say,

$$x^2 = (0 \quad 1 \quad 0)^T.$$

The third eigenvector is then specified (to within an arbitrary multiplicative constant) by the requirement that it must be orthogonal to $x^1$ and $x^2$; thus $x^3$ may be found by evaluating the vector product of $x^1$ and $x^2$ and normalising the result. This gives

$$x^3 = \frac{1}{\sqrt{2}}\,(-1 \quad 0 \quad 1)^T,$$

to complete the construction of an orthonormal set of eigenvectors. ◄
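The construction used in this example can be mirrored numerically. The sketch below is illustrative only: it assumes the matrix of the example, stored as plain Python lists, and the helper names `matvec` and `cross` are introduced here rather than taken from the text.

```python
import math

A = [[1, 0, 3],
     [0, -2, 0],
     [3, 0, 1]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def cross(u, v):
    """Vector product of two three-component vectors."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

s = 1 / math.sqrt(2)
x1 = [s, 0.0, s]        # normalised eigenvector for eigenvalue 4
x2 = [0.0, 1.0, 0.0]    # chosen member of the degenerate plane
x3 = cross(x1, x2)      # orthogonal to both; already of unit length

# Any vector of the form (a, b, -a) is an eigenvector with eigenvalue -2.
general = matvec(A, [5, 7, -5])      # equals -2 * (5, 7, -5)

# x3 lies in the same degenerate subspace: A x3 = -2 x3.
Ax3 = matvec(A, x3)
x3_is_eigen = all(abs(p + 2 * q) < 1e-12 for p, q in zip(Ax3, x3))
```

Because `x1` and `x2` are orthonormal, their vector product needs no further normalisation; for a non-unit starting pair one would divide by the length of the result.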
8.15 Change of basis and similarity transformations

Throughout this chapter we have considered the vector $\mathbf{x}$ as a geometrical quantity that is independent of any basis (or coordinate system). If we introduce a basis $\mathbf{e}_i$, $i = 1, 2, \ldots, N$, into our $N$-dimensional vector space then we may write

$$\mathbf{x} = x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_N\mathbf{e}_N,$$

and represent $\mathbf{x}$ in this basis by the column matrix

$$x = (x_1 \quad x_2 \quad \cdots \quad x_N)^T,$$

having components $x_i$. We now consider how these components change as a result of a prescribed change of basis. Let us introduce a new basis $\mathbf{e}'_i$, $i = 1, 2, \ldots, N$, which is related to the old basis by

$$\mathbf{e}'_j = \sum_{i=1}^{N} S_{ij}\,\mathbf{e}_i, \qquad (8.90)$$

the coefficient $S_{ij}$ being the $i$th component of $\mathbf{e}'_j$ with respect to the old (unprimed) basis. For an arbitrary vector $\mathbf{x}$ it follows that

$$\mathbf{x} = \sum_{i=1}^{N} x_i\,\mathbf{e}_i = \sum_{j=1}^{N} x'_j\,\mathbf{e}'_j = \sum_{j=1}^{N} x'_j \sum_{i=1}^{N} S_{ij}\,\mathbf{e}_i.$$

From this we derive the relationship between the components of $\mathbf{x}$ in the two coordinate systems as

$$x_i = \sum_{j=1}^{N} S_{ij}\,x'_j,$$

which we can write in matrix form as

$$x = Sx', \qquad (8.91)$$

where $S$ is the transformation matrix associated with the change of basis.

Furthermore, since the vectors $\mathbf{e}'_j$ are linearly independent, the matrix $S$ is non-singular and so possesses an inverse $S^{-1}$. Multiplying (8.91) on the left by $S^{-1}$ we find

$$x' = S^{-1}x, \qquad (8.92)$$

which relates the components of $\mathbf{x}$ in the new basis to those in the old basis. Comparing (8.92) and (8.90) we note that the components of $\mathbf{x}$ transform inversely to the way in which the basis vectors $\mathbf{e}_i$ themselves transform. This has to be so, as the vector $\mathbf{x}$ itself must remain unchanged.

We may also find the transformation law for the components of a linear operator under the same change of basis. Now, the operator equation $\mathbf{y} = \mathcal{A}\mathbf{x}$ (which is basis independent) can be written as a matrix equation in each of the two bases as

$$y = Ax, \qquad y' = A'x'. \qquad (8.93)$$

But, using (8.91), we may rewrite the first equation as

$$Sy' = ASx' \quad\Rightarrow\quad y' = S^{-1}ASx'.$$

Comparing this with the second equation in (8.93) we find that the components of the linear operator $\mathcal{A}$ transform as

$$A' = S^{-1}AS. \qquad (8.94)$$

Equation (8.94) is an example of a similarity transformation - a transformation that can be particularly useful in converting matrices into convenient forms for computation.

Given a square matrix $A$, we may interpret it as representing a linear operator $\mathcal{A}$ in a given basis $\mathbf{e}_i$. From (8.94), however, we may also consider the matrix $A' = S^{-1}AS$, for any non-singular matrix $S$, as representing the same linear operator $\mathcal{A}$ but in a new basis $\mathbf{e}'_j$, related to the old basis by

$$\mathbf{e}'_j = \sum_i S_{ij}\,\mathbf{e}_i.$$

Therefore we would expect that any property of the matrix $A$ that represents some (basis-independent) property of the linear operator $\mathcal{A}$ will also be shared by the matrix $A'$. We list these properties below.

(i) If $A = I$ then $A' = I$, since, from (8.94),

$$A' = S^{-1}IS = S^{-1}S = I. \qquad (8.95)$$

(ii) The value of the determinant is unchanged:

$$|A'| = |S^{-1}AS| = |S^{-1}||A||S| = |A||S^{-1}||S| = |A||S^{-1}S| = |A|. \qquad (8.96)$$

(iii) The characteristic determinant and hence the eigenvalues of $A'$ are the same as those of $A$: from (8.86),

$$|A' - \lambda I| = |S^{-1}AS - \lambda I| = |S^{-1}(A - \lambda I)S| = |S^{-1}||S||A - \lambda I| = |A - \lambda I|. \qquad (8.97)$$

(iv) The value of the trace is unchanged: from (8.87),

$$\operatorname{Tr} A' = \sum_i A'_{ii} = \sum_i\sum_j\sum_k (S^{-1})_{ij}A_{jk}S_{ki} = \sum_i\sum_j\sum_k S_{ki}(S^{-1})_{ij}A_{jk} = \sum_j\sum_k \delta_{kj}A_{jk} = \sum_j A_{jj} = \operatorname{Tr} A. \qquad (8.98)$$
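Properties (ii) and (iv) are easy to confirm on a small case. The following is a minimal sketch using exact rational arithmetic; the particular 2 × 2 matrices `A` and `S` and the helper names are arbitrary choices made for illustration, not taken from the text.

```python
from fractions import Fraction as F

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def inv2(M):
    """Inverse of a non-singular 2 x 2 matrix."""
    d = det2(M)
    return [[M[1][1] / d, -M[0][1] / d],
            [-M[1][0] / d, M[0][0] / d]]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

A = [[F(2), F(1)], [F(1), F(3)]]
S = [[F(1), F(2)], [F(3), F(5)]]   # any non-singular S would do

# Similarity transformation A' = S^{-1} A S, as in (8.94).
A_prime = matmul(inv2(S), matmul(A, S))
```

For a 2 × 2 matrix the characteristic polynomial is fixed by the trace and the determinant, so agreement of these two numbers also confirms property (iii) in this case.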
An important class of similarity transformations is that for which $S$ is a unitary matrix; in this case $A' = S^{-1}AS = S^\dagger AS$. Unitary transformation matrices are particularly important, for the following reason. If the original basis $\mathbf{e}_i$ is orthonormal and the transformation matrix $S$ is unitary then

$$\begin{aligned} \langle \mathbf{e}'_i | \mathbf{e}'_j \rangle &= \Big\langle \sum_k S_{ki}\,\mathbf{e}_k \,\Big|\, \sum_r S_{rj}\,\mathbf{e}_r \Big\rangle \\ &= \sum_k S^*_{ki} \sum_r S_{rj}\,\langle \mathbf{e}_k | \mathbf{e}_r \rangle \\ &= \sum_k S^*_{ki} \sum_r S_{rj}\,\delta_{kr} = \sum_k S^*_{ki}S_{kj} = (S^\dagger S)_{ij} = \delta_{ij}, \end{aligned}$$

showing that the new basis is also orthonormal.

Furthermore, in addition to the properties of general similarity transformations, for unitary transformations the following hold.

(i) If $A$ is Hermitian (anti-Hermitian) then $A'$ is Hermitian (anti-Hermitian), i.e. if $A^\dagger = \pm A$ then

$$(A')^\dagger = (S^\dagger AS)^\dagger = S^\dagger A^\dagger S = \pm S^\dagger AS = \pm A'. \qquad (8.99)$$

(ii) If $A$ is unitary (so that $A^\dagger = A^{-1}$) then $A'$ is unitary, since

$$(A')^\dagger A' = (S^\dagger AS)^\dagger(S^\dagger AS) = S^\dagger A^\dagger SS^\dagger AS = S^\dagger A^\dagger AS = S^\dagger IS = I. \qquad (8.100)$$


8.16 Diagonalisation of matrices 

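Suppose that a linear operator $\mathcal{A}$ is represented in some basis $\mathbf{e}_i$, $i = 1, 2, \ldots, N$, by the matrix $A$. Consider a new basis $\mathbf{x}^j$ given by

$$\mathbf{x}^j = \sum_{i=1}^{N} S_{ij}\,\mathbf{e}_i,$$

where the $\mathbf{x}^j$ are chosen to be the eigenvectors of the linear operator $\mathcal{A}$, i.e.

$$\mathcal{A}\mathbf{x}^j = \lambda_j\mathbf{x}^j. \qquad (8.101)$$

In the new basis, $\mathcal{A}$ is represented by the matrix $A' = S^{-1}AS$, which has a particularly simple form, as we shall see shortly. The element $S_{ij}$ of $S$ is the $i$th component, in the old (unprimed) basis, of the $j$th eigenvector $x^j$ of $A$, i.e. the columns of $S$ are the eigenvectors of the matrix $A$:

$$S = \begin{pmatrix} \uparrow & \uparrow & & \uparrow \\ x^1 & x^2 & \cdots & x^N \\ \downarrow & \downarrow & & \downarrow \end{pmatrix},$$

that is, $S_{ij} = (x^j)_i$. Therefore $A'$ is given by

$$\begin{aligned} (S^{-1}AS)_{ij} &= \sum_k\sum_l (S^{-1})_{ik}A_{kl}S_{lj} = \sum_k\sum_l (S^{-1})_{ik}A_{kl}(x^j)_l \\ &= \sum_k (S^{-1})_{ik}\,\lambda_j (x^j)_k = \sum_k \lambda_j (S^{-1})_{ik}S_{kj} = \lambda_j\delta_{ij}. \end{aligned}$$

So the matrix $A'$ is diagonal with the eigenvalues of $\mathcal{A}$ as the diagonal elements, i.e.

$$A' = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \lambda_N \end{pmatrix}.$$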

Therefore, given a matrix $A$, if we construct the matrix $S$ that has the eigenvectors of $A$ as its columns then the matrix $A' = S^{-1}AS$ is diagonal and has the eigenvalues of $A$ as its diagonal elements. Since we require $S$ to be non-singular ($|S| \neq 0$), the $N$ eigenvectors of $A$ must be linearly independent and form a basis for the $N$-dimensional vector space. It may be shown that any matrix with distinct eigenvalues can be diagonalised by this procedure. If, however, a general square matrix has degenerate eigenvalues then it may, or may not, have $N$ linearly independent eigenvectors. If it does not then it cannot be diagonalised.

For normal matrices (which include Hermitian, anti-Hermitian and unitary matrices) the $N$ eigenvectors are indeed linearly independent. Moreover, when normalised, these eigenvectors form an orthonormal set (or can be made to do so). Therefore the matrix $S$ with these normalised eigenvectors as columns, i.e. whose elements are $S_{ij} = (x^j)_i$, has the property

$$(S^\dagger S)_{ij} = \sum_k (S^\dagger)_{ik}(S)_{kj} = \sum_k S^*_{ki}S_{kj} = \sum_k (x^i)^*_k (x^j)_k = (x^i)^\dagger x^j = \delta_{ij}.$$

Hence $S$ is unitary ($S^{-1} = S^\dagger$) and the original matrix $A$ can be diagonalised by

$$A' = S^{-1}AS = S^\dagger AS.$$

Therefore, any normal matrix $A$ can be diagonalised by a similarity transformation using a unitary transformation matrix $S$.


► Diagonalise the matrix

$$A = \begin{pmatrix} 1 & 0 & 3 \\ 0 & -2 & 0 \\ 3 & 0 & 1 \end{pmatrix}.$$

The matrix $A$ is symmetric and so may be diagonalised by a transformation of the form $A' = S^\dagger AS$, where $S$ has the normalised eigenvectors of $A$ as its columns. We have already found these eigenvectors in subsection 8.14.1, and so we can write straightaway

$$S = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 0 & -1 \\ 0 & \sqrt{2} & 0 \\ 1 & 0 & 1 \end{pmatrix}.$$

We note that although the eigenvalues of $A$ are degenerate, its three eigenvectors are linearly independent and so $A$ can still be diagonalised. Thus, calculating $S^\dagger AS$ we obtain

$$S^\dagger AS = \frac{1}{2}\begin{pmatrix} 1 & 0 & 1 \\ 0 & \sqrt{2} & 0 \\ -1 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 3 \\ 0 & -2 & 0 \\ 3 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & -1 \\ 0 & \sqrt{2} & 0 \\ 1 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 4 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -2 \end{pmatrix},$$

which is diagonal, as required, and has as its diagonal elements the eigenvalues of $A$. ◄
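The matrix product in this example can be checked directly. The sketch below is an illustration in plain Python; the helper names `matmul` and `transpose` are assumptions made here. Since $S$ is real, $S^\dagger$ reduces to the transpose.

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(M):
    return [list(row) for row in zip(*M)]

A = [[1, 0, 3],
     [0, -2, 0],
     [3, 0, 1]]

r = 1 / math.sqrt(2)
# Columns of S are the normalised eigenvectors found in subsection 8.14.1.
S = [[r, 0.0, -r],
     [0.0, 1.0, 0.0],
     [r, 0.0, r]]

# For a real orthogonal S the Hermitian conjugate is just the transpose.
D = matmul(transpose(S), matmul(A, S))
expected = [[4, 0, 0], [0, -2, 0], [0, 0, -2]]
max_error = max(abs(D[i][j] - expected[i][j]) for i in range(3) for j in range(3))
```

The order of the diagonal entries follows the order in which the eigenvectors are placed in the columns of `S`.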


If a matrix $A$ is diagonalised by the similarity transformation $A' = S^{-1}AS$, so that $A' = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N)$, then we have immediately

$$\operatorname{Tr} A' = \operatorname{Tr} A = \sum_{i=1}^{N} \lambda_i, \qquad (8.102)$$

$$|A'| = |A| = \prod_{i=1}^{N} \lambda_i, \qquad (8.103)$$

since the eigenvalues of the matrix are unchanged by the transformation. Moreover, these results may be used to prove the rather useful trace formula

$$|\exp A| = \exp(\operatorname{Tr} A), \qquad (8.104)$$

where the exponential of a matrix is as defined in (8.38).


► Prove the trace formula (8.104).

At the outset, we note that for the similarity transformation $A' = S^{-1}AS$, we have

$$(A')^n = (S^{-1}AS)(S^{-1}AS)\cdots(S^{-1}AS) = S^{-1}A^nS.$$

Thus, from (8.38), we obtain $\exp A' = S^{-1}(\exp A)S$, from which it follows that $|\exp A'| = |\exp A|$. Moreover, by choosing the similarity transformation so that it diagonalises $A$, we have $A' = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N)$, and so

$$|\exp A| = |\exp A'| = |\exp[\operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N)]| = |\operatorname{diag}(\exp\lambda_1, \exp\lambda_2, \ldots, \exp\lambda_N)| = \prod_{i=1}^{N} \exp\lambda_i.$$

Rewriting the final product of exponentials of the eigenvalues as the exponential of the sum of the eigenvalues, we find

$$|\exp A| = \prod_{i=1}^{N} \exp\lambda_i = \exp\left(\sum_{i=1}^{N} \lambda_i\right) = \exp(\operatorname{Tr} A),$$

which gives the trace formula (8.104). ◄
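A numerical spot check of the trace formula is straightforward. The sketch below assumes that the matrix exponential is the usual power series $\sum_n A^n/n!$, summed here to a fixed number of terms; the function name `expm_series`, the truncation length and the test matrix are all illustrative choices.

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def expm_series(A, terms=40):
    """Truncated power series I + A + A^2/2! + ... for a small matrix."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]          # holds A^k / k!
    for k in range(1, terms):
        term = [[t / k for t in row] for row in matmul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

A = [[1.0, 2.0],
     [0.5, -1.0]]                              # Tr A = 0, eigenvalues +/- sqrt(2)

E = expm_series(A)
det_exp = E[0][0] * E[1][1] - E[0][1] * E[1][0]
exp_trace = math.exp(A[0][0] + A[1][1])
```

A fixed-length series is adequate for a small, well-scaled matrix such as this one; it is not a robust general-purpose way of computing a matrix exponential.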


8.17 Quadratic and Hermitian forms 

Let us now introduce the concept of quadratic forms (and their complex analogues, Hermitian forms). A quadratic form $Q$ is a scalar function of a real vector $\mathbf{x}$ given by

$$Q(\mathbf{x}) = \langle \mathbf{x} | \mathcal{A}\mathbf{x} \rangle, \qquad (8.105)$$

for some real linear operator $\mathcal{A}$. In any given basis (coordinate system) we can write (8.105) in matrix form as

$$Q(x) = x^T A x, \qquad (8.106)$$

where $A$ is a real matrix. In fact, as will be explained below, we need only consider the case where $A$ is symmetric, i.e. $A = A^T$. As an example in a three-dimensional space,

$$Q = x^T A x = (x_1 \quad x_2 \quad x_3)\begin{pmatrix} 1 & 1 & 3 \\ 1 & 1 & -3 \\ 3 & -3 & -3 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = x_1^2 + x_2^2 - 3x_3^2 + 2x_1x_2 + 6x_1x_3 - 6x_2x_3. \qquad (8.107)$$

It is reasonable to ask whether a quadratic form $Q = x^T M x$, where $M$ is any (possibly non-symmetric) real square matrix, is a more general definition. That this is not the case may be seen by expressing $M$ in terms of a symmetric matrix $A = \tfrac{1}{2}(M + M^T)$ and an antisymmetric matrix $B = \tfrac{1}{2}(M - M^T)$ such that $M = A + B$. We then have

$$Q = x^T M x = x^T A x + x^T B x. \qquad (8.108)$$

However, $Q$ is a scalar quantity and so

$$Q = Q^T = (x^T A x)^T + (x^T B x)^T = x^T A^T x + x^T B^T x = x^T A x - x^T B x. \qquad (8.109)$$

Comparing (8.108) and (8.109) shows that $x^T B x = 0$, and hence $x^T M x = x^T A x$, i.e. $Q$ is unchanged by considering only the symmetric part of $M$. Hence, with no loss of generality, we may assume $A = A^T$ in (8.106).
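The argument that only the symmetric part of a matrix contributes to a quadratic form can be illustrated with a short calculation. In the sketch below the non-symmetric matrix `M` is an arbitrary choice, made so that its symmetric part is the matrix of (8.107); the helper `quad` is introduced only for this illustration.

```python
def quad(M, x):
    """Evaluate the quadratic form x^T M x for a list-of-rows matrix M."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))

# A non-symmetric matrix whose symmetric part is the matrix of (8.107).
M = [[1, 4, 0],
     [-2, 1, 5],
     [6, -11, -3]]
n = len(M)

A = [[(M[i][j] + M[j][i]) / 2 for j in range(n)] for i in range(n)]  # symmetric part
B = [[(M[i][j] - M[j][i]) / 2 for j in range(n)] for i in range(n)]  # antisymmetric part

x = [2, -1, 3]
q_full = quad(M, x)       # uses the whole of M
q_sym = quad(A, x)        # uses only the symmetric part
q_anti = quad(B, x)       # contribution of the antisymmetric part
```

The antisymmetric contribution vanishes because the terms $x_iB_{ij}x_j$ and $x_jB_{ji}x_i$ cancel in pairs.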

From its definition (8.105), $Q$ is clearly a basis- (i.e. coordinate-) independent quantity. Let us therefore consider a new basis related to the old one by an orthogonal transformation matrix $S$, the components in the two bases of any vector $\mathbf{x}$ being related (as in (8.91)) by $x = Sx'$ or, equivalently, by $x' = S^{-1}x = S^Tx$. We then have

$$Q = x^T A x = (x')^T S^T A S x' = (x')^T A' x',$$

where (as expected) the matrix describing the linear operator $\mathcal{A}$ in the new basis is given by $A' = S^TAS$ (since $S^T = S^{-1}$). But, from the last section, if we choose as $S$ the matrix whose columns are the normalised eigenvectors of $A$ then $A' = S^TAS$ is diagonal with the eigenvalues of $A$ as the diagonal elements. (Since $A$ is symmetric, its normalised eigenvectors are orthogonal, or can be made so, and hence $S$ is orthogonal with $S^{-1} = S^T$.)

In the new basis

$$Q = x^T A x = (x')^T \Lambda x' = \lambda_1 x_1'^2 + \lambda_2 x_2'^2 + \cdots + \lambda_N x_N'^2, \qquad (8.110)$$

where $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N)$ and the $\lambda_i$ are the eigenvalues of $A$. It should be noted that $Q$ contains no cross-terms of the form $x'_1x'_2$.


► Find an orthogonal transformation that takes the quadratic form (8.107) into the form

$$\lambda_1 x_1'^2 + \lambda_2 x_2'^2 + \lambda_3 x_3'^2.$$

The required transformation matrix $S$ has the normalised eigenvectors of $A$ as its columns. We have already found these in section 8.14, and so we can write immediately

$$S = \frac{1}{\sqrt{6}}\begin{pmatrix} \sqrt{3} & \sqrt{2} & 1 \\ \sqrt{3} & -\sqrt{2} & -1 \\ 0 & \sqrt{2} & -2 \end{pmatrix},$$

which is easily verified as being orthogonal. Since the eigenvalues of $A$ are $\lambda = 2$, $3$, and $-6$, the general result already proved shows that the transformation $x = Sx'$ will carry (8.107) into the form $2x_1'^2 + 3x_2'^2 - 6x_3'^2$. This may be verified most easily by writing out the inverse transformation $x' = S^{-1}x = S^Tx$ and substituting. The inverse equations are

$$\begin{aligned} x'_1 &= (x_1 + x_2)/\sqrt{2}, \\ x'_2 &= (x_1 - x_2 + x_3)/\sqrt{3}, \\ x'_3 &= (x_1 - x_2 - 2x_3)/\sqrt{6}. \end{aligned} \qquad (8.111)$$

If these are substituted into the form $Q = 2x_1'^2 + 3x_2'^2 - 6x_3'^2$ then the original expression (8.107) is recovered. ◄
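The substitution suggested in this example can be spot-checked numerically rather than algebraically. The sketch below evaluates both forms of $Q$ at a few sample points; the function names and the sample points are illustrative assumptions.

```python
import math

def q_original(x1, x2, x3):
    """Quadratic form (8.107) in the original coordinates."""
    return x1**2 + x2**2 - 3 * x3**2 + 2 * x1 * x2 + 6 * x1 * x3 - 6 * x2 * x3

def to_primed(x1, x2, x3):
    """Inverse transformation (8.111), x' = S^T x."""
    return ((x1 + x2) / math.sqrt(2),
            (x1 - x2 + x3) / math.sqrt(3),
            (x1 - x2 - 2 * x3) / math.sqrt(6))

def q_diagonal(p1, p2, p3):
    """Diagonal form 2 x1'^2 + 3 x2'^2 - 6 x3'^2 in the eigenvector basis."""
    return 2 * p1**2 + 3 * p2**2 - 6 * p3**2

samples = [(2, -1, 3), (1, 0, 0), (0.5, 2.0, -1.5), (-3, 4, 1)]
differences = [abs(q_original(*x) - q_diagonal(*to_primed(*x))) for x in samples]
```

Agreement at a handful of points is not a proof, but since both expressions are quadratic forms in three variables, agreement on six suitably chosen points would already fix all six coefficients.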


In the definition of $Q$ it was assumed that the components $x_1$, $x_2$, $x_3$ and the matrix $A$ were real. It is clear that in this case the quadratic form $Q = x^TAx$ is real also. Another, rather more general, expression that is also real is the Hermitian form

$$H(x) = x^\dagger A x, \qquad (8.112)$$

where $A$ is Hermitian (i.e. $A^\dagger = A$) and the components of $x$ may now be complex. It is straightforward to show that $H$ is real, since

$$H^* = (H^T)^* = x^\dagger A^\dagger x = x^\dagger A x = H.$$

With suitable generalisation, the properties of quadratic forms apply also to Hermitian forms, but to keep the presentation simple we will restrict our discussion to quadratic forms.

A special case of a quadratic (Hermitian) form is one for which $Q = x^TAx$ is greater than zero for all non-zero column matrices $x$. By choosing as the basis the eigenvectors of $A$ we have $Q$ in the form

$$Q = \lambda_1 x_1^2 + \lambda_2 x_2^2 + \lambda_3 x_3^2.$$

The requirement that $Q > 0$ for all non-zero $x$ means that all the eigenvalues $\lambda_i$ of $A$ must be positive. A symmetric (Hermitian) matrix $A$ with this property is called positive definite. If, instead, $Q \geq 0$ for all $x$ then it is possible that some of the eigenvalues are zero, and $A$ is called positive semi-definite.


8.17.1 The stationary properties of the eigenvectors 

Consider a quadratic form, such as $Q(\mathbf{x}) = \langle \mathbf{x} | \mathcal{A}\mathbf{x} \rangle$ given in (8.105), in a fixed basis. As the vector $\mathbf{x}$ is varied, through changes in its three components $x_1$, $x_2$ and $x_3$, the value of the quantity $Q$ also varies. Because of the homogeneous form of $Q$ we may restrict any investigation of these variations to vectors of unit length (since multiplying any vector $\mathbf{x}$ by any scalar $k$ simply multiplies the value of $Q$ by a factor $k^2$).

Of particular interest are any vectors $\mathbf{x}$ that make the value of the quadratic form a maximum or minimum. A necessary, but not sufficient, condition for this is that $Q$ is stationary with respect to small variations $\Delta\mathbf{x}$ in $\mathbf{x}$, whilst $\langle \mathbf{x} | \mathbf{x} \rangle$ is maintained at a constant value (unity).

In the chosen basis the quadratic form is given by $Q = x^TAx$ and, using Lagrange undetermined multipliers to incorporate the variational constraints, we are led to seek solutions of

$$\Delta[x^TAx - \lambda(x^Tx - 1)] = 0. \qquad (8.113)$$

This may be used directly, together with the fact that $(\Delta x^T)Ax = x^TA\,\Delta x$, since $A$ is symmetric, to obtain

$$Ax = \lambda x, \qquad (8.114)$$

as the necessary condition that $x$ must satisfy. If (8.114) is satisfied for some eigenvector $x$ then the value of $Q(x)$ is given by

$$Q = x^TAx = x^T\lambda x = \lambda. \qquad (8.115)$$

However, if $x$ and $y$ are eigenvectors corresponding to different eigenvalues then they are (or can be chosen to be) orthogonal. Consequently the expression $y^TAx$ is necessarily zero, since

$$y^TAx = y^T\lambda x = \lambda y^Tx = 0. \qquad (8.116)$$

Summarising, those column matrices $x$ of unit magnitude that make the quadratic form $Q$ stationary are eigenvectors of the matrix $A$, and the stationary value of $Q$ is then equal to the corresponding eigenvalue. It is straightforward to see from the proof of (8.114) that, conversely, any eigenvector of $A$ makes $Q$ stationary.

Instead of maximising or minimising $Q = x^TAx$ subject to the constraint $x^Tx = 1$, an equivalent procedure is to extremise the function

$$\lambda(x) = \frac{x^TAx}{x^Tx}.$$

► Show that if $\lambda(x)$ is stationary then $x$ is an eigenvector of $A$ and $\lambda(x)$ is equal to the corresponding eigenvalue.

We require $\Delta\lambda(x) = 0$ with respect to small variations in $x$. Now

$$\begin{aligned} \Delta\lambda &= \frac{1}{(x^Tx)^2}\left[(x^Tx)\left(\Delta x^TAx + x^TA\,\Delta x\right) - x^TAx\left(\Delta x^Tx + x^T\Delta x\right)\right] \\ &= \frac{2\,\Delta x^TAx}{x^Tx} - 2\left(\frac{x^TAx}{x^Tx}\right)\frac{\Delta x^Tx}{x^Tx}, \end{aligned}$$

since $x^TA\,\Delta x = (\Delta x^T)Ax$ and $x^T\Delta x = (\Delta x^T)x$. Thus

$$\Delta\lambda = \frac{2}{x^Tx}\,\Delta x^T[Ax - \lambda(x)x].$$

Hence, if $\Delta\lambda = 0$ then $Ax = \lambda(x)x$, i.e. $x$ is an eigenvector of $A$ with eigenvalue $\lambda(x)$. ◄

Thus the eigenvalues of a symmetric matrix $A$ are the values of the function

$$\lambda(x) = \frac{x^TAx}{x^Tx}$$

at its stationary points. The eigenvectors of $A$ lie along those directions in space for which the quadratic form $Q = x^TAx$ has stationary values, given a fixed magnitude for the vector $x$. Similar results hold for Hermitian matrices.
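The behaviour of the ratio $x^TAx/x^Tx$ can be explored numerically. The sketch below uses the symmetric matrix of (8.107), whose eigenvalues are 2, 3 and $-6$; the function name `rayleigh` and the grid of sample vectors are illustrative assumptions, not part of the text.

```python
def rayleigh(A, x):
    """The ratio x^T A x / x^T x for a non-zero vector x."""
    Ax = [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]
    return sum(a * b for a, b in zip(x, Ax)) / sum(c * c for c in x)

A = [[1, 1, 3],
     [1, 1, -3],
     [3, -3, -3]]

# At an eigenvector the ratio equals the corresponding eigenvalue,
# whatever the normalisation of the vector.
at_eigenvectors = [rayleigh(A, v) for v in ([1, 1, 0], [1, -1, 1], [1, -1, -2])]
scaled = rayleigh(A, [5, 5, 0])

# On a grid of other vectors the ratio stays between the extreme eigenvalues.
grid = [(i, j, k) for i in range(-2, 3) for j in range(-2, 3) for k in range(-2, 3)
        if (i, j, k) != (0, 0, 0)]
ratios = [rayleigh(A, list(x)) for x in grid]
lowest, highest = min(ratios), max(ratios)
```

The largest and smallest eigenvalues are the global maximum and minimum of the ratio, while the intermediate eigenvalue 2 is a stationary value that is neither.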
8.17.2 Quadratic surfaces

The results of the previous subsection may be turned round to state that the surface given by

$$x^TAx = \text{constant} = 1 \text{ (say)} \qquad (8.117)$$

and called a quadratic surface, has stationary values of its radius (i.e. origin-surface distance) in those directions that are along the eigenvectors of $A$. More specifically, in three dimensions the quadratic surface $x^TAx = 1$ has its principal axes along the three mutually perpendicular eigenvectors of $A$, and the squares of the corresponding principal radii are given by $\lambda_i^{-1}$, $i = 1, 2, 3$. As well as having this stationary property of the radius, a principal axis is characterised by the fact that any section of the surface perpendicular to it has some degree of symmetry about it. If the eigenvalues corresponding to any two principal axes are degenerate then the quadratic surface has rotational symmetry about the third principal axis and the choice of a pair of axes perpendicular to that axis is not uniquely defined.


► Find the shape of the quadratic surface

$$x_1^2 + x_2^2 - 3x_3^2 + 2x_1x_2 + 6x_1x_3 - 6x_2x_3 = 1.$$

If, instead of expressing the quadratic surface in terms of $x_1$, $x_2$, $x_3$, as in (8.107), we were to use the new variables $x'_1$, $x'_2$, $x'_3$ defined in (8.111), for which the coordinate axes are along the three mutually perpendicular eigenvector directions $(1, 1, 0)$, $(1, -1, 1)$ and $(1, -1, -2)$, then the equation of the surface would take the form (see (8.110))

$$\frac{x_1'^2}{(1/\sqrt{2})^2} + \frac{x_2'^2}{(1/\sqrt{3})^2} - \frac{x_3'^2}{(1/\sqrt{6})^2} = 1.$$

Thus, for example, a section of the quadratic surface in the plane $x'_3 = 0$, i.e. $x_1 - x_2 - 2x_3 = 0$, is an ellipse, with semi-axes $1/\sqrt{2}$ and $1/\sqrt{3}$. Similarly a section in the plane $x'_1 = x_1 + x_2 = 0$ is a hyperbola. ◄

Clearly the simplest three-dimensional situation to visualise is that in which all the eigenvalues are positive, since then the quadratic surface is an ellipsoid.


8.18 Simultaneous linear equations 

In physical applications we often encounter sets of simultaneous linear equations. 
In general we may have $M$ equations in $N$ unknowns $x_1, x_2, \ldots, x_N$ of the form

$$\begin{aligned} A_{11}x_1 + A_{12}x_2 + \cdots + A_{1N}x_N &= b_1, \\ A_{21}x_1 + A_{22}x_2 + \cdots + A_{2N}x_N &= b_2, \\ &\;\;\vdots \\ A_{M1}x_1 + A_{M2}x_2 + \cdots + A_{MN}x_N &= b_M, \end{aligned} \qquad (8.118)$$

where the $A_{ij}$ and $b_i$ have known values. If all the $b_i$ are zero then the system of equations is called homogeneous, otherwise it is inhomogeneous. Depending on the given values, this set of equations for the $N$ unknowns $x_1, x_2, \ldots, x_N$ may have either a unique solution, no solution or infinitely many solutions. Matrix analysis may be used to distinguish between the possibilities. The set of equations may be expressed as a single matrix equation $Ax = b$, or, written out in full, as

$$\begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1N} \\ A_{21} & A_{22} & \cdots & A_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ A_{M1} & A_{M2} & \cdots & A_{MN} \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_M \end{pmatrix}.$$

8.18.1 The range and null space of a matrix 

As we discussed in section 8.2, we may interpret the matrix equation $Ax = b$ as representing, in some basis, the linear transformation $\mathcal{A}\mathbf{x} = \mathbf{b}$ of a vector $\mathbf{x}$ in an $N$-dimensional vector space $V$ into a vector $\mathbf{b}$ in some other (in general different) $M$-dimensional vector space $W$.

In general the operator $\mathcal{A}$ will map any vector in $V$ into some particular subspace of $W$, which may be the entire space. This subspace is called the range of $\mathcal{A}$ (or $A$) and its dimension is equal to the rank of $A$. Moreover, if $\mathcal{A}$ (and hence $A$) is singular then there exists some subspace of $V$ that is mapped onto the zero vector $\mathbf{0}$ in $W$; that is, any vector $\mathbf{y}$ that lies in the subspace satisfies $\mathcal{A}\mathbf{y} = \mathbf{0}$. This subspace is called the null space of $\mathcal{A}$ and the dimension of this null space is called the nullity of $\mathcal{A}$. We note that the matrix $A$ must be singular if $M \neq N$ and may be singular even if $M = N$.

The dimensions of the range and the null space of a matrix are related through the fundamental relationship

$$\operatorname{rank} A + \operatorname{nullity} A = N, \qquad (8.119)$$

where $N$ is the number of original unknowns $x_1, x_2, \ldots, x_N$.


► Prove the relationship (8.119).

As discussed in section 8.11, if the columns of an $M \times N$ matrix $A$ are interpreted as the components, in a given basis, of $N$ ($M$-component) vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N$ then rank $A$ is equal to the number of linearly independent vectors in this set (this number is also equal to the dimension of the vector space spanned by these vectors). Writing (8.118) in terms of the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N$, we have

$$x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + \cdots + x_N\mathbf{v}_N = \mathbf{b}. \qquad (8.120)$$

From this expression, we immediately deduce that the range of $A$ is merely the span of the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N$ and hence has dimension $r = \operatorname{rank} A$.

If a vector $\mathbf{y}$ lies in the null space of $\mathcal{A}$ then $\mathcal{A}\mathbf{y} = \mathbf{0}$, which we may write as

$$y_1\mathbf{v}_1 + y_2\mathbf{v}_2 + \cdots + y_N\mathbf{v}_N = \mathbf{0}. \qquad (8.121)$$

As just shown above, however, only $r$ ($\leq N$) of these vectors are linearly independent. By renumbering, if necessary, we may assume that $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ form a linearly independent set; the remaining vectors, $\mathbf{v}_{r+1}, \mathbf{v}_{r+2}, \ldots, \mathbf{v}_N$, can then be written as a linear superposition of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$. We are therefore free to choose the $N - r$ coefficients $y_{r+1}, y_{r+2}, \ldots, y_N$ arbitrarily and (8.121) will still be satisfied for some set of $r$ coefficients $y_1, y_2, \ldots, y_r$ (which are not all zero). The dimension of the null space is therefore $N - r$, and this completes the proof of (8.119). ◄
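The relationship (8.119) can be illustrated on a small non-square matrix. The sketch below computes the rank by forward elimination in exact rational arithmetic and exhibits a basis for the null space; the matrix, the function `rank` and the two null vectors were chosen by hand for this illustration and are not taken from the text.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by forward elimination with exact fractions."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

# M = 3 equations, N = 4 unknowns; the second row is twice the first.
A = [[1, 2, 3, 4],
     [2, 4, 6, 8],
     [1, 0, 1, 0]]
N = len(A[0])

r = rank(A)
nullity = N - r

# Two independent vectors found by hand that A maps to zero.
null_basis = [[-1, -1, 1, 0], [0, -2, 0, 1]]
images = [[sum(row[j] * y[j] for j in range(N)) for row in A] for y in null_basis]
```

Here the rank is 2, so the null space is two-dimensional, and the two vectors listed span it.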

Equation (8.119) has far-reaching consequences for the existence of solutions to sets of simultaneous linear equations such as (8.118). As mentioned previously, these equations may have no solution, a unique solution or infinitely many solutions. We now discuss these three cases in turn.

No solution

The system of equations possesses no solution unless $\mathbf{b}$ lies in the range of $\mathcal{A}$; in this case (8.120) will be satisfied for some $x_1, x_2, \ldots, x_N$. This in turn requires the set of vectors $\mathbf{b}, \mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N$ to have the same span (see (8.8)) as $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N$. In terms of matrices, this is equivalent to the requirement that the matrix $A$ and the augmented matrix

$$M = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1N} & b_1 \\ A_{21} & A_{22} & \cdots & A_{2N} & b_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ A_{M1} & A_{M2} & \cdots & A_{MN} & b_M \end{pmatrix}$$

have the same rank $r$. If this condition is satisfied then $\mathbf{b}$ does lie in the range of $\mathcal{A}$, and the set of equations (8.118) will have either a unique solution or infinitely many solutions. If, however, $A$ and $M$ have different ranks then there will be no solution.


A unique solution 

If $\mathbf{b}$ lies in the range of $\mathcal{A}$ and if $r = N$ then all the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N$ in (8.120) are linearly independent and the equation has a unique solution $x_1, x_2, \ldots, x_N$.

Infinitely many solutions

If $\mathbf{b}$ lies in the range of $\mathcal{A}$ and if $r < N$ then only $r$ of the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N$ in (8.120) are linearly independent. We may therefore choose the coefficients of $N - r$ vectors in an arbitrary way, while still satisfying (8.120) for some set of coefficients $x_1, x_2, \ldots, x_N$. There are therefore infinitely many solutions, which span an $(N - r)$-dimensional vector space. We may also consider this space of solutions in terms of the null space of $\mathcal{A}$: if $\mathbf{x}$ is some vector satisfying $\mathcal{A}\mathbf{x} = \mathbf{b}$ and $\mathbf{y}$ is any vector in the null space of $\mathcal{A}$ (i.e. $\mathcal{A}\mathbf{y} = \mathbf{0}$) then

$$\mathcal{A}(\mathbf{x} + \mathbf{y}) = \mathcal{A}\mathbf{x} + \mathcal{A}\mathbf{y} = \mathcal{A}\mathbf{x} + \mathbf{0} = \mathbf{b},$$

and so $\mathbf{x} + \mathbf{y}$ is also a solution. Since the null space is $(N - r)$-dimensional, so too is the space of solutions.

We may use the above results to investigate the special case of the solution of a homogeneous set of linear equations, for which $\mathbf{b} = \mathbf{0}$. Clearly the set always has the trivial solution $x_1 = x_2 = \cdots = x_N = 0$, and if $r = N$ this will be the only solution. If $r < N$, however, there are infinitely many solutions; they form the null space of $\mathcal{A}$, which has dimension $N - r$. In particular, we note that if $M < N$ (i.e. there are fewer equations than unknowns) then $r < N$ automatically. Hence a set of homogeneous linear equations with fewer equations than unknowns always has infinitely many solutions.
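The three cases discussed above can be told apart by comparing the rank of the matrix with that of the augmented matrix. The following is a sketch under the assumption that ranks are found by exact forward elimination; the function names `rank` and `classify` and the small test systems are illustrative only.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by forward elimination with exact fractions."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

def classify(A, b):
    """Classify the system A x = b using rank A and the rank of the augmented matrix."""
    N = len(A[0])
    r = rank(A)
    r_augmented = rank([row + [bi] for row, bi in zip(A, b)])
    if r != r_augmented:
        return "no solution"              # b is not in the range of A
    if r == N:
        return "unique solution"
    return "infinitely many solutions"    # solution space has dimension N - r

inconsistent = classify([[1, 1], [2, 2]], [1, 3])
consistent = classify([[1, 1], [2, 2]], [1, 2])
non_singular = classify([[2, 4, 3], [1, -2, -2], [-3, 3, 2]], [4, 0, -7])
underdetermined = classify([[1, 2, 3]], [0])    # homogeneous, M < N
```

In floating-point work the rank would normally be estimated with a tolerance, for instance from singular values, rather than by exact elimination.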


8.18.2 N simultaneous linear equations in N unknowns 

A special case of (8.118) occurs when $M = N$. In this case the matrix $A$ is square and we have the same number of equations as unknowns. Since $A$ is square, the condition $r = N$ corresponds to $|A| \neq 0$ and the matrix $A$ is non-singular. The case $r < N$ corresponds to $|A| = 0$, in which case $A$ is singular.

As mentioned above, the equations will have a solution provided $\mathbf{b}$ lies in the range of $\mathcal{A}$. If this is true then the equations will possess a unique solution when $|A| \neq 0$ or infinitely many solutions when $|A| = 0$. There exist several methods for obtaining the solution(s). Perhaps the most elementary method is Gaussian elimination; this method is discussed in subsection 28.3.1, where we also address numerical subtleties such as equation interchange (pivoting). In this subsection, we will outline three further methods for solving a square set of simultaneous linear equations.


Direct inversion

Since $A$ is square it will possess an inverse, provided $|A| \neq 0$. Thus, if $A$ is non-singular, we immediately obtain

$$x = A^{-1}b \qquad (8.122)$$

as the unique solution to the set of equations. However, if $b = 0$, then we see immediately that the set of equations possesses only the trivial solution $x = 0$. The direct inversion method has the advantage that, once $A^{-1}$ has been calculated, one may obtain the solutions $x$ corresponding to different vectors $b_1$, $b_2$, ... on the RHS, with little further work.

► Show that the set of simultaneous equations

$$\begin{aligned} 2x_1 + 4x_2 + 3x_3 &= 4, \\ x_1 - 2x_2 - 2x_3 &= 0, \\ -3x_1 + 3x_2 + 2x_3 &= -7, \end{aligned} \qquad (8.123)$$

has a unique solution, and find that solution.

The simultaneous equations can be represented by the matrix equation $Ax = b$, i.e.

$$\begin{pmatrix} 2 & 4 & 3 \\ 1 & -2 & -2 \\ -3 & 3 & 2 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 4 \\ 0 \\ -7 \end{pmatrix}.$$

As we have already shown that $A^{-1}$ exists and have calculated it, see (8.59), it follows that $x = A^{-1}b$ or, more explicitly, that

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \frac{1}{11}\begin{pmatrix} 2 & 1 & -2 \\ 4 & 13 & 7 \\ -3 & -18 & -8 \end{pmatrix}\begin{pmatrix} 4 \\ 0 \\ -7 \end{pmatrix} = \begin{pmatrix} 2 \\ -3 \\ 4 \end{pmatrix}. \qquad (8.124)$$

Thus the unique solution is $x_1 = 2$, $x_2 = -3$, $x_3 = 4$. ◄
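The direct-inversion calculation for (8.123) can be reproduced exactly with rational arithmetic. The sketch below builds the inverse from cofactors, which is practical only for small matrices; the function names `minor`, `det`, `inverse` and `solve` are illustrative assumptions rather than anything defined in the text.

```python
from fractions import Fraction

def minor(M, i, j):
    """Matrix M with row i and column j removed."""
    return [[M[r][c] for c in range(len(M)) if c != j]
            for r in range(len(M)) if r != i]

def det(M):
    """Determinant by Laplace expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))

def inverse(M):
    """Inverse from cofactors: (M^-1)_ij = C_ji / |M| for an integer matrix M."""
    d = det(M)
    if d == 0:
        raise ValueError("matrix is singular")
    n = len(M)
    return [[Fraction((-1) ** (i + j) * det(minor(M, j, i)), d) for j in range(n)]
            for i in range(n)]

A = [[2, 4, 3],
     [1, -2, -2],
     [-3, 3, 2]]
A_inv = inverse(A)

def solve(b):
    """x = A^-1 b, reusing the inverse already computed."""
    return [sum(A_inv[i][j] * b[j] for j in range(len(b))) for i in range(len(b))]

x = solve([4, 0, -7])
x_other = solve([1, 0, 0])   # a second right-hand side costs only a matrix-vector product
```

Reusing `A_inv` for a second right-hand side illustrates the advantage noted above for the direct inversion method.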


LU decomposition 

Although conceptually simple, finding the solution by calculating $A^{-1}$ can be computationally demanding, especially when $N$ is large. In fact, as we shall now show, it is not necessary to perform the full inversion of $A$ in order to solve the simultaneous equations $Ax = b$. Rather, we can perform a decomposition of the matrix into the product of a square lower triangular matrix $L$ and a square upper triangular matrix $U$, which are such that

$$A = LU, \qquad (8.125)$$

and then use the fact that triangular systems of equations can be solved very simply.

We must begin, therefore, by finding the matrices $L$ and $U$ such that (8.125) is satisfied. This may be achieved straightforwardly by writing out (8.125) in component form. For illustration, let us consider the $3 \times 3$ case. It is, in fact, always possible, and convenient, to take the diagonal elements of $L$ as unity, so we have

$$\begin{aligned} A &= \begin{pmatrix} 1 & 0 & 0 \\ L_{21} & 1 & 0 \\ L_{31} & L_{32} & 1 \end{pmatrix}\begin{pmatrix} U_{11} & U_{12} & U_{13} \\ 0 & U_{22} & U_{23} \\ 0 & 0 & U_{33} \end{pmatrix} \\ &= \begin{pmatrix} U_{11} & U_{12} & U_{13} \\ L_{21}U_{11} & L_{21}U_{12} + U_{22} & L_{21}U_{13} + U_{23} \\ L_{31}U_{11} & L_{31}U_{12} + L_{32}U_{22} & L_{31}U_{13} + L_{32}U_{23} + U_{33} \end{pmatrix}. \end{aligned} \qquad (8.126)$$

The nine unknown elements of $L$ and $U$ can now be determined by equating the nine elements of (8.126) to those of the $3 \times 3$ matrix $A$. This is done in the particular order illustrated in the example below.

Once the matrices $L$ and $U$ have been determined, one can use the decomposition to solve the set of equations $Ax = b$ in the following way. From (8.125), we have

$$LUx = b,$$

but this can be written as two triangular sets of equations

$$Ly = b \quad\text{and}\quad Ux = y,$$

where $y$ is another column matrix to be determined. One may easily solve the first triangular set of equations for $y$, which is then substituted into the second set. The required solution $x$ is then obtained readily from the second triangular set of equations. We note that, as with direct inversion, once the LU decomposition has been determined, one can solve for various RHS column matrices $b_1$, $b_2$, ..., with little extra work.


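► Use LU decomposition to solve the set of simultaneous equations (8.123).

We begin the determination of the matrices $L$ and $U$ by equating the elements of the matrix in (8.126) with those of the matrix

$$A = \begin{pmatrix} 2 & 4 & 3 \\ 1 & -2 & -2 \\ -3 & 3 & 2 \end{pmatrix}.$$

This is performed in the following order:

1st row: $U_{11} = 2$, $U_{12} = 4$, $U_{13} = 3$
1st column: $L_{21}U_{11} = 1$, $L_{31}U_{11} = -3$ $\Rightarrow$ $L_{21} = \tfrac{1}{2}$, $L_{31} = -\tfrac{3}{2}$
2nd row: $L_{21}U_{12} + U_{22} = -2$, $L_{21}U_{13} + U_{23} = -2$ $\Rightarrow$ $U_{22} = -4$, $U_{23} = -\tfrac{7}{2}$
2nd column: $L_{31}U_{12} + L_{32}U_{22} = 3$ $\Rightarrow$ $L_{32} = -\tfrac{9}{4}$
3rd row: $L_{31}U_{13} + L_{32}U_{23} + U_{33} = 2$ $\Rightarrow$ $U_{33} = -\tfrac{11}{8}$

Thus we may write the matrix $A$ as

$$A = LU = \begin{pmatrix} 1 & 0 & 0 \\ \tfrac{1}{2} & 1 & 0 \\ -\tfrac{3}{2} & -\tfrac{9}{4} & 1 \end{pmatrix}\begin{pmatrix} 2 & 4 & 3 \\ 0 & -4 & -\tfrac{7}{2} \\ 0 & 0 & -\tfrac{11}{8} \end{pmatrix}.$$

We must now solve the set of equations $Ly = b$, which read

$$\begin{pmatrix} 1 & 0 & 0 \\ \tfrac{1}{2} & 1 & 0 \\ -\tfrac{3}{2} & -\tfrac{9}{4} & 1 \end{pmatrix}\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 4 \\ 0 \\ -7 \end{pmatrix}.$$

Since this set of equations is triangular, we quickly find

$$y_1 = 4, \qquad y_2 = 0 - (\tfrac{1}{2})(4) = -2, \qquad y_3 = -7 - (-\tfrac{3}{2})(4) - (-\tfrac{9}{4})(-2) = -\tfrac{11}{2}.$$

These values must then be substituted into the equations $Ux = y$, which read

$$\begin{pmatrix} 2 & 4 & 3 \\ 0 & -4 & -\tfrac{7}{2} \\ 0 & 0 & -\tfrac{11}{8} \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 4 \\ -2 \\ -\tfrac{11}{2} \end{pmatrix}.$$

This set of equations is also triangular, and we easily find the solution

$$x_1 = 2, \qquad x_2 = -3, \qquad x_3 = 4,$$

which agrees with the result found above by direct inversion. ◄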
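The row-then-column order used in this example generalises directly to an $N \times N$ matrix. The sketch below is a minimal illustration in exact rational arithmetic, with unit diagonal in $L$ and no row interchanges, so it simply stops if a zero pivot appears; the function names `lu_decompose` and `lu_solve` are assumptions made here.

```python
from fractions import Fraction

def lu_decompose(A):
    """Factorise A = L U with unit diagonal in L (no pivoting)."""
    n = len(A)
    A = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    U = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):            # i-th row of U
            U[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        if U[i][i] == 0:
            raise ZeroDivisionError("zero pivot: a row interchange would be needed")
        for r in range(i + 1, n):        # i-th column of L
            L[r][i] = (A[r][i] - sum(L[r][k] * U[k][i] for k in range(i))) / U[i][i]
    return L, U

def lu_solve(L, U, b):
    """Solve L y = b by forward substitution, then U x = y by back substitution."""
    n = len(b)
    y = []
    for i in range(n):
        y.append(Fraction(b[i]) - sum(L[i][k] * y[k] for k in range(i)))
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x

A = [[2, 4, 3],
     [1, -2, -2],
     [-3, 3, 2]]
L, U = lu_decompose(A)
x = lu_solve(L, U, [4, 0, -7])
```

Once `L` and `U` are known, each further right-hand side needs only the two triangular solves in `lu_solve`.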

We note, in passing, that one can calculate both the inverse and the determinant 
of A from its LU decomposition. To find the inverse $A^{-1}$, one solves the system 
of equations Ax = b repeatedly for the N different RHS column matrices $b = e_i$ 
($i = 1, 2, \ldots, N$), where $e_i$ is the column matrix with its ith element equal to unity 
and the others equal to zero. The solution x in each case gives the corresponding 
column of $A^{-1}$. Evaluation of the determinant |A| is much simpler. From (8.125), 
we have 

\[
|A| = |LU| = |L|\,|U|. \tag{8.127}
\]

Since L and U are triangular, however, we see from (8.64) that their determinants 
are equal to the products of their diagonal elements. Since $L_{ii} = 1$ for all i, we 
thus find 

\[
|A| = U_{11}U_{22}\cdots U_{NN} = \prod_{i=1}^{N} U_{ii}.
\]

As an illustration, in the above example we find $|A| = (2)(-4)(-11/8) = 11$, 
which, as it must, agrees with our earlier calculation (8.58). 
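
As a hedged illustration of these two remarks, the sketch below obtains the determinant as the product of the diagonal elements of U, and the inverse by solving Ax = e_i for each unit column in turn. The helper names (`lu_decompose`, `solve`, `determinant`, `inverse`) are hypothetical, exact fractions are used, and pivoting is again omitted.

```python
from fractions import Fraction


def lu_decompose(a):
    """Doolittle LU decomposition (unit diagonal in L), without pivoting."""
    n = len(a)
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    U = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            U[i][j] = Fraction(a[i][j]) - sum(L[i][k] * U[k][j] for k in range(i))
        for r in range(i + 1, n):
            L[r][i] = (Fraction(a[r][i]) - sum(L[r][k] * U[k][i] for k in range(i))) / U[i][i]
    return L, U


def solve(L, U, b):
    """Forward substitution for Ly = b followed by back substitution for Ux = y."""
    n = len(b)
    y = [Fraction(0)] * n
    for i in range(n):
        y[i] = Fraction(b[i]) - sum(L[i][k] * y[k] for k in range(i))
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x


def determinant(U):
    """|A| = product of the diagonal elements of U, since every L_ii = 1."""
    d = Fraction(1)
    for i in range(len(U)):
        d *= U[i][i]
    return d


def inverse(L, U):
    """Column j of the inverse is the solution of Ax = e_j."""
    n = len(U)
    cols = [solve(L, U, [int(i == j) for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


A = [[2, 4, 3], [1, -2, -2], [-3, 3, 2]]
L, U = lu_decompose(A)
det_a = determinant(U)
A_inv = inverse(L, U)
print(det_a)
```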

Finally, we note that if the matrix A is symmetric and positive semi-definite 
then we can decompose it as 

\[
A = LL^\dagger, \tag{8.128}
\]

where L is a lower triangular matrix whose diagonal elements are not, in general, 
equal to unity. This is known as a Cholesky decomposition (in the special case 
where A is real, the decomposition becomes $A = LL^{\mathrm{T}}$). The reason that we cannot 
set the diagonal elements of L equal to unity in this case is that we require the 
same number of independent elements in L as in A. The requirement that the 
matrix be positive semi-definite is easily derived by considering the Hermitian 
form (or quadratic form in the real case) 

\[
x^\dagger A x = x^\dagger L L^\dagger x = (L^\dagger x)^\dagger (L^\dagger x).
\]

Denoting the column matrix $L^\dagger x$ by y, we see that the last term on the RHS 
is $y^\dagger y$, which must be greater than or equal to zero. Thus, we require $x^\dagger A x \geq 0$ 
for any arbitrary column matrix x, and so A must be positive semi-definite (see 
section 8.17). 

We recall that the requirement that a matrix be positive semi-definite is 
equivalent to demanding that all the eigenvalues of A are positive or zero. If one of 
the eigenvalues of A is zero, however, then from (8.103) we have |A| = 0 and so A 
is singular. Thus, if A is a non-singular matrix, it must be positive definite (rather 
than just positive semi-definite) in order to perform the Cholesky decomposition 
(8.128). In fact, in this case, the inability to find a matrix L that satisfies (8.128) 
implies that A cannot be positive definite. 

The Cholesky decomposition can be applied in an analogous way to the LU 
decomposition discussed above, but we shall not explore it further. 
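
Although the text does not pursue the method, the idea can be sketched for a real symmetric matrix as follows. This is an illustrative sketch only: the function name `cholesky` and the two small test matrices are assumptions introduced here, floating-point arithmetic is used, and the routine returns `None` as soon as a non-positive quantity appears under the square root, which is the signal that the matrix is not positive definite.

```python
import math


def cholesky(a):
    """Return lower-triangular L with A = L L^T for a real symmetric matrix,
    or None if a non-positive pivot shows that A is not positive definite."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal element: L_jj^2 = A_jj - sum_k L_jk^2
        d = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            return None
        L[j][j] = math.sqrt(d)
        # Elements below the diagonal in column j
        for i in range(j + 1, n):
            L[i][j] = (a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


# Hypothetical test matrices, chosen only for illustration.
L_ok = cholesky([[4.0, 2.0], [2.0, 3.0]])   # positive definite
L_bad = cholesky([[1.0, 2.0], [2.0, 1.0]])  # has a negative eigenvalue
print(L_ok, L_bad)
```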


Cramer’s rule 

An alternative method of solution is to use Cramer’s rule, which also provides 
some insight into the nature of the solutions in the various cases. To illustrate 
this method let us consider a set of three equations in three unknowns, 

\[
\begin{aligned}
A_{11}x_1 + A_{12}x_2 + A_{13}x_3 &= b_1, \\
A_{21}x_1 + A_{22}x_2 + A_{23}x_3 &= b_2, \\
A_{31}x_1 + A_{32}x_2 + A_{33}x_3 &= b_3,
\end{aligned}
\tag{8.129}
\]


which may be represented by the matrix equation Ax = b. We wish either to find 
the solution(s) x to these equations or to establish that there are no solutions. 
From result (vi) of subsection 8.9.1, the determinant |A| is unchanged by adding 
to its first column the combination 

\[
\frac{x_2}{x_1} \times (\text{second column of } |A|) + \frac{x_3}{x_1} \times (\text{third column of } |A|).
\]

We thus obtain 

\[
|A| = \begin{vmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{vmatrix}
= \begin{vmatrix}
A_{11} + (x_2/x_1)A_{12} + (x_3/x_1)A_{13} & A_{12} & A_{13} \\
A_{21} + (x_2/x_1)A_{22} + (x_3/x_1)A_{23} & A_{22} & A_{23} \\
A_{31} + (x_2/x_1)A_{32} + (x_3/x_1)A_{33} & A_{32} & A_{33}
\end{vmatrix},
\]

which, on substituting $b_i/x_1$ for the ith entry in the first column, yields 

\[
|A| = \frac{1}{x_1} \begin{vmatrix} b_1 & A_{12} & A_{13} \\ b_2 & A_{22} & A_{23} \\ b_3 & A_{32} & A_{33} \end{vmatrix}
= \frac{1}{x_1}\,\Delta_1.
\]

The determinant $\Delta_1$ is known as a Cramer determinant. Similar manipulations of 
the second and third columns of |A| yield $x_2$ and $x_3$, and so the full set of results 
reads 

\[
x_1 = \frac{\Delta_1}{|A|}, \qquad x_2 = \frac{\Delta_2}{|A|}, \qquad x_3 = \frac{\Delta_3}{|A|}, \tag{8.130}
\]

where 

\[
\Delta_1 = \begin{vmatrix} b_1 & A_{12} & A_{13} \\ b_2 & A_{22} & A_{23} \\ b_3 & A_{32} & A_{33} \end{vmatrix}, \qquad
\Delta_2 = \begin{vmatrix} A_{11} & b_1 & A_{13} \\ A_{21} & b_2 & A_{23} \\ A_{31} & b_3 & A_{33} \end{vmatrix}, \qquad
\Delta_3 = \begin{vmatrix} A_{11} & A_{12} & b_1 \\ A_{21} & A_{22} & b_2 \\ A_{31} & A_{32} & b_3 \end{vmatrix}.
\]

It can be seen that each Cramer determinant $\Delta_i$ is simply |A| but with column i 
replaced by the RHS of the original set of equations. If $|A| \neq 0$ then (8.130) gives 
the unique solution. The proof given here appears to fail if any of the solutions 
$x_i$ is zero, but it can be shown that result (8.130) is valid even in such a case. 


► Use Cramer’s rule to solve the set of simultaneous equations (8.123). 


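Let us again represent these simultaneous equations by the matrix equation Ax = b, i.e. 

\[
\begin{pmatrix} 2 & 4 & 3 \\ 1 & -2 & -2 \\ -3 & 3 & 2 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 4 \\ 0 \\ -7 \end{pmatrix}.
\]

From (8.58), the determinant of A is given by |A| = 11. Following the discussion given 
above, the three Cramer determinants are 

\[
\Delta_1 = \begin{vmatrix} 4 & 4 & 3 \\ 0 & -2 & -2 \\ -7 & 3 & 2 \end{vmatrix}, \qquad
\Delta_2 = \begin{vmatrix} 2 & 4 & 3 \\ 1 & 0 & -2 \\ -3 & -7 & 2 \end{vmatrix}, \qquad
\Delta_3 = \begin{vmatrix} 2 & 4 & 4 \\ 1 & -2 & 0 \\ -3 & 3 & -7 \end{vmatrix}.
\]

These may be evaluated using the properties of determinants listed in subsection 8.9.1 
and we find $\Delta_1 = 22$, $\Delta_2 = -33$ and $\Delta_3 = 44$. From (8.130) the solution to the equations 
(8.123) is given by 

\[
x_1 = \frac{22}{11} = 2, \qquad x_2 = \frac{-33}{11} = -3, \qquad x_3 = \frac{44}{11} = 4,
\]

which agrees with the solution found in the previous example. ◄ 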
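
The arithmetic of this example is easily checked by machine. The following sketch (with hypothetical helper names `det3` and `cramer3`) forms each Cramer determinant by replacing one column of A by b and divides by |A|; it assumes a 3 x 3 system with integer entries and simply raises an error when |A| = 0, where (8.130) cannot be used directly.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 matrix by Laplace expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def cramer3(A, b):
    """Return (solution, Cramer determinants) for a 3x3 system with |A| != 0."""
    det_a = det3(A)
    if det_a == 0:
        raise ValueError("|A| = 0: Cramer's rule gives no unique solution")
    deltas = []
    for col in range(3):
        m = [row[:] for row in A]      # copy of A ...
        for r in range(3):
            m[r][col] = b[r]           # ... with column `col` replaced by b
        deltas.append(det3(m))
    return [Fraction(d, det_a) for d in deltas], deltas


A = [[2, 4, 3], [1, -2, -2], [-3, 3, 2]]
b = [4, 0, -7]
x, deltas = cramer3(A, b)
print(deltas, x)
```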


At this point it is useful to consider each of the three equations (8.129) as 
representing a plane in three-dimensional Cartesian coordinates. Using result (7.42) 
of chapter 7, the sets of components of the vectors normal to the planes are 
$(A_{11}, A_{12}, A_{13})$, $(A_{21}, A_{22}, A_{23})$ and $(A_{31}, A_{32}, A_{33})$, and using (7.46) the 
perpendicular distances of the planes from the origin are given by 

\[
d_i = \frac{b_i}{\left(A_{i1}^2 + A_{i2}^2 + A_{i3}^2\right)^{1/2}} \qquad \text{for } i = 1, 2, 3.
\]

Finding the solution(s) to the simultaneous equations above corresponds to finding 
the point(s) of intersection of the planes. 

If there is a unique solution the planes intersect at only a single point. This 
happens if their normals are linearly independent vectors. Since the rows of A 
represent the directions of these normals, this requirement is equivalent to $|A| \neq 0$. 
If $b = (0\ 0\ 0)^{\mathrm{T}} = 0$ then all the planes pass through the origin and, since there 
is only a single solution to the equations, the origin is that solution. 

Let us now turn to the cases where |A| =0. The simplest such case is that in 
which all three planes are parallel; this implies that the normals are all parallel 
and so A is of rank 1. Two possibilities exist: 


(i) the planes are coincident, i.e. $d_1 = d_2 = d_3$, in which case there is an 
infinity of solutions; 

(ii) the planes are not all coincident, i.e. $d_1 \neq d_2$ and/or $d_1 \neq d_3$ and/or 
$d_2 \neq d_3$, in which case there are no solutions. 


Figure 8.1 The two possible cases when A is of rank 2. In both cases all the 
normals lie in a horizontal plane but in (a) the planes all intersect on a single 
line (corresponding to an infinite number of solutions) whilst in (b) there are 
no common intersection points (no solutions). 


It is apparent from (8.130) that case (i) occurs when all the Cramer determinants 
are zero and case (ii) occurs when at least one Cramer determinant is non-zero. 

The most complicated cases with |A| = 0 are those in which the normals to the 
planes themselves lie in a plane but are not parallel. In this case A has rank 2. 
Again two possibilities exist and these are shown in figure 8.1. Just as in the 
rank-1 case, if all the Cramer determinants are zero then we get an infinity of 
solutions (this time on a line). Of course, in the special case in which b = 0 (and 
the system of equations is homogeneous), the planes all pass through the origin 
and so they must intersect on a line through it. If at least one of the Cramer 
determinants is non-zero, we get no solution. 

These rules may be summarised as follows. 

(i) $|A| \neq 0$, $b \neq 0$: The three planes intersect at a single point that is not the 
origin, and so there is only one solution, given by both (8.122) and (8.130). 

(ii) $|A| \neq 0$, $b = 0$: The three planes intersect at the origin only and there is 
only the trivial solution, x = 0. 

(iii) $|A| = 0$, $b \neq 0$, Cramer determinants all zero: There is an infinity of 
solutions either on a line if A is rank 2, i.e. the cofactors are not all zero, 
or on a plane if A is rank 1, i.e. the cofactors are all zero. 

(iv) $|A| = 0$, $b \neq 0$, Cramer determinants not all zero: No solutions. 

(v) $|A| = 0$, $b = 0$: The three planes intersect on a line through the origin 
giving an infinity of solutions. 
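
For numerical work, one way of sorting a given system into the cases above is to compare the rank of A with the rank of the augmented matrix (A | b). The sketch below does this by exact Gaussian elimination. It should be read as an assumption-laden illustration rather than the method of the text: it swaps in the standard rank-comparison criterion in place of the Cramer-determinant test, and the function names `rank` and `classify` are hypothetical. The rank test also copes with the rank-1 case, in which every Cramer determinant contains two proportional columns of A and therefore vanishes whether or not the planes coincide.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix, found by Gaussian elimination in exact arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def classify(A, b):
    """Return ('none', None), ('unique', 0) or ('infinite', k), where k is the
    dimension of the solution set (1 for a line, 2 for a plane)."""
    n = len(A[0])
    r_a = rank(A)
    r_ab = rank([row + [bi] for row, bi in zip(A, b)])
    if r_ab > r_a:
        return ("none", None)
    if r_a == n:
        return ("unique", 0)
    return ("infinite", n - r_a)


print(classify([[2, 4, 3], [1, -2, -2], [-3, 3, 2]], [4, 0, -7]))
print(classify([[1, 1, 1], [2, 2, 2], [3, 3, 3]], [1, 2, 3]))
print(classify([[1, 1, 1], [2, 2, 2], [3, 3, 3]], [1, 2, 4]))
```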


8.18.3 Singular value decomposition 

There exists a very powerful technique for dealing with a simultaneous set of 
linear equations Ax = b, such as (8.118), which may be applied whether or not 
the number of simultaneous equations M is equal to the number of unknowns N. 
This technique is known as singular value decomposition (SVD) and is the method 
of choice in analysing any set of simultaneous linear equations. 

We will consider the general case, in which A is an M × N (complex) matrix. 
Let us suppose we can write A as the product§ 

\[
A = USV^\dagger, \tag{8.131}
\]

where the matrices U, S and V have the following properties. 

(i) The square matrix U has dimensions M × M and is unitary. 

(ii) The matrix S has dimensions M × N (the same dimensions as those of A) 
and is diagonal in the sense that $S_{ij} = 0$ if $i \neq j$. We denote its diagonal 
elements by $s_i$ for $i = 1, 2, \ldots, p$, where $p = \min(M, N)$; these elements are 
termed the singular values of A. 

(iii) The square matrix V has dimensions N × N and is unitary. 

We must now determine the elements of these matrices in terms of the elements of 
A. From the matrix A, we can construct two square matrices: $A^\dagger A$ with dimensions 
N × N and $AA^\dagger$ with dimensions M × M. Both are clearly Hermitian. From (8.131), 
and using the fact that U and V are unitary, we find 

\[
A^\dagger A = VS^\dagger U^\dagger USV^\dagger = VS^\dagger SV^\dagger, \tag{8.132}
\]

\[
AA^\dagger = USV^\dagger VS^\dagger U^\dagger = USS^\dagger U^\dagger, \tag{8.133}
\]

where $S^\dagger S$ and $SS^\dagger$ are diagonal matrices with dimensions N × N and M × M 
respectively. The first p elements of each diagonal matrix are $s_i^2$, $i = 1, 2, \ldots, p$, 
where $p = \min(M, N)$, and the rest (where they exist) are zero. 

These two equations imply that both $V^{-1}A^\dagger AV$ $(= V^{-1}A^\dagger A(V^\dagger)^{-1})$ and, by 
a similar argument, $U^{-1}AA^\dagger U$, must be diagonal. From our discussion of the 
diagonalisation of Hermitian matrices in section 8.16, we see that the columns of 
V must therefore be the normalised eigenvectors $v^i$, $i = 1, 2, \ldots, N$, of the matrix 
$A^\dagger A$ and the columns of U must be the normalised eigenvectors $u^j$, $j = 1, 2, \ldots, M$, 
of the matrix $AA^\dagger$. Moreover, the singular values $s_i$ must satisfy $s_i^2 = \lambda_i$, where 
the $\lambda_i$ are the eigenvalues of the smaller of $A^\dagger A$ and $AA^\dagger$. Clearly, the $\lambda_i$ are 
also some of the eigenvalues of the larger of these two matrices, the remaining 
ones being equal to zero. Since each matrix is Hermitian, the $\lambda_i$ are real and the 
singular values $s_i$ may be taken as real and non-negative. Finally, to make the 
decomposition (8.131) unique, it is customary to arrange the singular values in 
decreasing order of their values, so that $s_1 \geq s_2 \geq \cdots \geq s_p$. 


§ The proof that such a decomposition always exists is beyond the scope of this book. For a full 
account of SVD one might consult, for example, Golub & Van Loan, Matrix Computations, second 
edition (Johns Hopkins University Press). 

► Show that, for $i = 1, 2, \ldots, p$, $Av^i = s_i u^i$ and $A^\dagger u^i = s_i v^i$, where $p = \min(M, N)$. 

Post-multiplying both sides of (8.131) by V, and using the fact that V is unitary, we obtain 

\[
AV = US.
\]

Since the columns of V and U consist of the vectors $v^i$ and $u^j$ respectively and S has only 
diagonal non-zero elements, we find immediately that, for $i = 1, 2, \ldots, p$, 

\[
Av^i = s_i u^i. \tag{8.134}
\]

Moreover, we note that $Av^i = 0$ for $i = p + 1, p + 2, \ldots, N$. 

Taking the Hermitian conjugate of both sides of (8.131) and post-multiplying by U, we 
obtain 

\[
A^\dagger U = VS^\dagger = VS^{\mathrm{T}},
\]

where we have used the fact that U is unitary and S is real. We then see immediately that, 
for $i = 1, 2, \ldots, p$, 

\[
A^\dagger u^i = s_i v^i. \tag{8.135}
\]

We also note that $A^\dagger u^i = 0$ for $i = p + 1, p + 2, \ldots, M$. Results (8.134) and (8.135) are useful 
for investigating the properties of the SVD. ◄ 

The decomposition (8.131) has some advantageous features for the analysis of 
sets of simultaneous linear equations. These are best illustrated by writing the 
decomposition (8.131) in terms of the vectors $u^i$ and $v^i$ as 

\[
A = \sum_{i=1}^{p} s_i u^i (v^i)^\dagger,
\]

where $p = \min(M, N)$. It may be, however, that some of the singular values $s_i$ 
are zero, as a result of degeneracies in the set of M linear equations Ax = b. 
Let us suppose that there are r non-zero singular values. Since our convention is 
to arrange the singular values in order of decreasing size, the non-zero singular 
values are $s_i$, $i = 1, 2, \ldots, r$, and the zero singular values are $s_{r+1}, s_{r+2}, \ldots, s_p$. 
Therefore we can write A as 

\[
A = \sum_{i=1}^{r} s_i u^i (v^i)^\dagger. \tag{8.136}
\]

Let us consider the action of (8.136) on an arbitrary vector x. This is given by 

\[
Ax = \sum_{i=1}^{r} s_i u^i (v^i)^\dagger x.
\]

Since $(v^i)^\dagger x$ is just a number, we see immediately that the vectors $u^i$, $i = 1, 2, \ldots, r$, 
must span the range of the matrix A; moreover, these vectors form an 
orthonormal basis for the range. Further, since this subspace is r-dimensional, we have 
rank A = r, i.e. the rank of A is equal to the number of non-zero singular values. 

The SVD is also useful in characterising the null space of A. From (8.119), 
we already know that the null space must have dimension N − r; so if A has r 
non-zero singular values $s_i$, $i = 1, 2, \ldots, r$, then from the worked example above 
we have 

\[
Av^i = 0 \qquad \text{for } i = r + 1, r + 2, \ldots, N.
\]

Thus, the N − r vectors $v^i$, $i = r + 1, r + 2, \ldots, N$, form an orthonormal basis for 
the null space of A. 


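► Find the singular value decomposition of the matrix 

\[
A = \begin{pmatrix}
2 & 2 & 2 & 2 \\
\tfrac{17}{10} & \tfrac{1}{10} & -\tfrac{17}{10} & -\tfrac{1}{10} \\
\tfrac{3}{5} & \tfrac{9}{5} & -\tfrac{3}{5} & -\tfrac{9}{5}
\end{pmatrix}. \tag{8.137}
\]

The matrix A has dimension 3 × 4 (i.e. M = 3, N = 4), and so we may construct from 
it the 3 × 3 matrix $AA^\dagger$ and the 4 × 4 matrix $A^\dagger A$ (in fact, since A is real, the Hermitian 
conjugates are just transposes). We begin by finding the eigenvalues $\lambda_i$ and eigenvectors $u^i$ 
of the smaller matrix $AA^\dagger$. This matrix is easily found to be given by 

\[
AA^\dagger = \begin{pmatrix} 16 & 0 & 0 \\ 0 & \tfrac{29}{5} & \tfrac{12}{5} \\ 0 & \tfrac{12}{5} & \tfrac{36}{5} \end{pmatrix},
\]

and its characteristic equation reads 

\[
\begin{vmatrix} 16 - \lambda & 0 & 0 \\ 0 & \tfrac{29}{5} - \lambda & \tfrac{12}{5} \\ 0 & \tfrac{12}{5} & \tfrac{36}{5} - \lambda \end{vmatrix}
= (16 - \lambda)(36 - 13\lambda + \lambda^2) = 0.
\]

Thus, the eigenvalues are $\lambda_1 = 16$, $\lambda_2 = 9$, $\lambda_3 = 4$. Since the singular values of A are given 
by $s_i = \sqrt{\lambda_i}$ and the matrix S in (8.131) has the same dimensions as A, we have 

\[
S = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \end{pmatrix}, \tag{8.138}
\]

where we have arranged the singular values in order of decreasing size. Now the matrix U 
has as its columns the normalised eigenvectors $u^i$ of the 3 × 3 matrix $AA^\dagger$. These normalised 
eigenvectors correspond to the eigenvalues of $AA^\dagger$ as follows: 

\[
\begin{array}{lcl}
\lambda_1 = 16 & \Rightarrow & u^1 = (1\ \ 0\ \ 0)^{\mathrm{T}} \\
\lambda_2 = 9 & \Rightarrow & u^2 = (0\ \ \tfrac{3}{5}\ \ \tfrac{4}{5})^{\mathrm{T}} \\
\lambda_3 = 4 & \Rightarrow & u^3 = (0\ \ -\tfrac{4}{5}\ \ \tfrac{3}{5})^{\mathrm{T}},
\end{array}
\]

and so we obtain the matrix 

\[
U = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \tfrac{3}{5} & -\tfrac{4}{5} \\ 0 & \tfrac{4}{5} & \tfrac{3}{5} \end{pmatrix}. \tag{8.139}
\]

The columns of the matrix V in (8.131) are the normalised eigenvectors of the 4 × 4 
matrix $A^\dagger A$, which is given by 

\[
A^\dagger A = \frac{1}{4}\begin{pmatrix} 29 & 21 & 3 & 11 \\ 21 & 29 & 11 & 3 \\ 3 & 11 & 29 & 21 \\ 11 & 3 & 21 & 29 \end{pmatrix}.
\]

We already know from the above discussion, however, that the non-zero eigenvalues of 
this matrix are equal to those of $AA^\dagger$ found above, and that the remaining eigenvalue is 
zero. The corresponding normalised eigenvectors are easily found: 

\[
\begin{array}{lcl}
\lambda_1 = 16 & \Rightarrow & v^1 = \tfrac{1}{2}(1\ \ 1\ \ 1\ \ 1)^{\mathrm{T}} \\
\lambda_2 = 9 & \Rightarrow & v^2 = \tfrac{1}{2}(1\ \ 1\ \ -1\ \ -1)^{\mathrm{T}} \\
\lambda_3 = 4 & \Rightarrow & v^3 = \tfrac{1}{2}(-1\ \ 1\ \ 1\ \ -1)^{\mathrm{T}} \\
\lambda_4 = 0 & \Rightarrow & v^4 = \tfrac{1}{2}(1\ \ -1\ \ 1\ \ -1)^{\mathrm{T}},
\end{array}
\]

and so the matrix V is given by 

\[
V = \frac{1}{2}\begin{pmatrix} 1 & 1 & -1 & 1 \\ 1 & 1 & 1 & -1 \\ 1 & -1 & 1 & 1 \\ 1 & -1 & -1 & -1 \end{pmatrix}. \tag{8.140}
\]

Alternatively, we could have found the first three columns of V by using the relation 
(8.135) to obtain 

\[
v^i = \frac{1}{s_i} A^\dagger u^i \qquad \text{for } i = 1, 2, 3.
\]

The fourth eigenvector could then be found using the Gram-Schmidt orthogonalisation 
procedure. We note that if there were more than one eigenvector corresponding to a zero 
eigenvalue then we would need to use this procedure to orthogonalise these eigenvectors 
before constructing the matrix V. 

Collecting our results together, we find the SVD of the matrix A: 

\[
A = USV^\dagger =
\begin{pmatrix} 1 & 0 & 0 \\ 0 & \tfrac{3}{5} & -\tfrac{4}{5} \\ 0 & \tfrac{4}{5} & \tfrac{3}{5} \end{pmatrix}
\begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \end{pmatrix}
\begin{pmatrix}
\tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} \\
\tfrac{1}{2} & \tfrac{1}{2} & -\tfrac{1}{2} & -\tfrac{1}{2} \\
-\tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} & -\tfrac{1}{2} \\
\tfrac{1}{2} & -\tfrac{1}{2} & \tfrac{1}{2} & -\tfrac{1}{2}
\end{pmatrix};
\]

this can be verified by direct multiplication. ◄ 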
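
The direct multiplication mentioned at the end of this example can be carried out exactly with rational arithmetic. The short sketch below (helper names `matmul` and `transpose` are hypothetical, and plain nested lists stand in for matrices) forms $USV^{\mathrm{T}}$ from (8.138)-(8.140) and also checks that V is orthogonal and that the fourth column of V lies in the null space of A.

```python
from fractions import Fraction as F


def matmul(X, Y):
    """Product of two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


A = [[F(2), F(2), F(2), F(2)],
     [F(17, 10), F(1, 10), F(-17, 10), F(-1, 10)],
     [F(3, 5), F(9, 5), F(-3, 5), F(-9, 5)]]
U = [[F(1), F(0), F(0)],
     [F(0), F(3, 5), F(-4, 5)],
     [F(0), F(4, 5), F(3, 5)]]
S = [[F(4), F(0), F(0), F(0)],
     [F(0), F(3), F(0), F(0)],
     [F(0), F(0), F(2), F(0)]]
V = [[F(e, 2) for e in row] for row in
     [[1, 1, -1, 1], [1, 1, 1, -1], [1, -1, 1, 1], [1, -1, -1, -1]]]

product = matmul(matmul(U, S), transpose(V))     # U S V^T (A is real)
rank_A = sum(1 for i in range(3) if S[i][i] != 0)  # number of non-zero singular values
v4 = [[V[i][3]] for i in range(4)]                # fourth column of V as a column matrix
print(product == A, rank_A, matmul(A, v4))
```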


Let us now consider the use of SVD in solving a set of M simultaneous linear 
equations in N unknowns, which we write again as Ax = b. Firstly, consider 
the solution of a homogeneous set of equations, for which b = 0. As mentioned 
previously, if A is square and non-singular (and so possesses no zero singular 
values) then the equations have the unique trivial solution x = 0. Otherwise, any 
of the vectors $v^i$, $i = r + 1, r + 2, \ldots, N$, or any linear combination of them, will 
be a solution. 

In the inhomogeneous case, where b is not a zero vector, the set of equations 
will possess solutions if b lies in the range of A. To investigate these solutions, it 
is convenient to introduce the N × M matrix $\bar{S}$, which is constructed by taking 
the transpose of S in (8.131) and replacing each non-zero singular value $s_i$ on the 
diagonal by $1/s_i$. It is clear that, with this construction, 

\[
S\bar{S} = I. \tag{8.141}
\]

We note, however, that the matrix $\bar{S}$ is not the inverse of S since $\bar{S}S \neq I$. 
Nevertheless, using property (8.141) and the unitarity of the matrices U and V, a 
solution to the equations Ax = b is given by 

\[
x = V\bar{S}U^\dagger b. \tag{8.142}
\]

We may, however, add to this solution any linear combination of the N − r vectors 
$v^i$, $i = r + 1, r + 2, \ldots, N$, that form an orthonormal basis for the null space of A; 
thus, in general, there exists an infinity of solutions (although it is straightforward 
to show that (8.142) is the solution vector of shortest length). The only way in 
which the solution (8.142) can be unique is if the rank r equals N, so that the 
matrix A does not possess a null space; this only occurs if A is square and 
non-singular. 

If b does not lie in the range of A then the set of equations Ax = b does 
not have a solution. Nevertheless, the vector (8.142) provides the closest possible 
‘solution’ in a least-squares sense. In other words, although the vector (8.142) 
does not exactly solve Ax = b, it is the vector that minimises the residual 

\[
\epsilon = |Ax - b|,
\]

where here the vertical lines denote the absolute value of the quantity they 
contain, not the determinant. This is proved as follows. 

Suppose we were to add some arbitrary vector x′ to the vector x in (8.142). 
This would result in the addition of the vector b′ = Ax′ to Ax − b; b′ is clearly in 
the range of A since any part of x′ belonging to the null space of A contributes 
nothing to Ax′. We would then have 

\[
\begin{aligned}
|Ax - b + b'| &= |(USV^\dagger)(V\bar{S}U^\dagger b) - b + b'| \\
&= |(US\bar{S}U^\dagger - I)b + b'| \\
&= |U[(S\bar{S} - I)U^\dagger b + U^\dagger b']| \\
&= |(S\bar{S} - I)U^\dagger b + U^\dagger b'|;
\end{aligned}
\tag{8.143}
\]

in the last line we have made use of the fact that the length of a vector is left 
unchanged under the action of the unitary matrix U. Now, the diagonal square 
matrix $S\bar{S}$, with dimensions M × M, will have non-zero entries (actually all equal 
to unity) only for those values of j for which $s_j \neq 0$. Thus, the jth component of 
the vector $(S\bar{S} - I)U^\dagger b$ will only be non-zero when $s_j = 0$. However, the jth element 
of the vector $U^\dagger b'$ is given by the scalar product $(u^j)^\dagger b'$, which is non-zero only if 
$s_j \neq 0$, since b′ lies in the range of A. Thus, as these two terms only contribute to 
(8.143) for two disjoint sets of j-values, its minimum value, as x′ is varied, occurs 
when b′ = 0; this requires x′ = 0. 

► Find the solution(s) to the set of simultaneous linear equations Ax = b, where A is given 
by (8.137) and $b = (1\ 0\ 0)^{\mathrm{T}}$. 


To solve the set of equations, we begin by calculating the vector given in (8.142), 

\[
x = V\bar{S}U^\dagger b,
\]

where U and V are given by (8.139) and (8.140) respectively and $\bar{S}$ is obtained by taking 
the transpose of S in (8.138) and replacing all the non-zero singular values $s_i$ by $1/s_i$. Thus, 
$\bar{S}$ reads 

\[
\bar{S} = \begin{pmatrix} \tfrac{1}{4} & 0 & 0 \\ 0 & \tfrac{1}{3} & 0 \\ 0 & 0 & \tfrac{1}{2} \\ 0 & 0 & 0 \end{pmatrix}.
\]

Substituting the appropriate matrices into the expression for x we find 

\[
x = \tfrac{1}{8}(1\ \ 1\ \ 1\ \ 1)^{\mathrm{T}}. \tag{8.144}
\]

It is straightforward to show that this solves the set of equations Ax = b exactly, and 
so the vector $b = (1\ 0\ 0)^{\mathrm{T}}$ must lie in the range of A. This is, in fact, immediately 
clear, since $b = u^1$. The solution (8.144) is not, however, unique. There are three non-zero 
singular values, but N = 4. Thus, the matrix A has a one-dimensional null space, which 
is ‘spanned’ by $v^4$, the fourth column of V, given in (8.140). The solutions to our set of 
equations, consisting of the sum of the exact solution and any vector in the null space of 
A, therefore lie along the line 

\[
x = \tfrac{1}{8}(1\ \ 1\ \ 1\ \ 1)^{\mathrm{T}} + \alpha(1\ \ -1\ \ 1\ \ -1)^{\mathrm{T}},
\]

where the parameter $\alpha$ can take any real value. We note that (8.144) is the point on this 
line that is closest to the origin. ◄ 
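
The substitution in this example can be reproduced step by step. In the sketch below, which is illustrative only, the matrices are those of (8.137)-(8.140), `S_bar` denotes the matrix $\bar{S}$ of the text, and the helper `matvec` is a hypothetical name; exact fractions are used so that the result can be compared directly with (8.144).

```python
from fractions import Fraction as F


def matvec(X, v):
    """Product of a matrix (nested lists) with a column vector (flat list)."""
    return [sum(xij * vj for xij, vj in zip(row, v)) for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


A = [[F(2), F(2), F(2), F(2)],
     [F(17, 10), F(1, 10), F(-17, 10), F(-1, 10)],
     [F(3, 5), F(9, 5), F(-3, 5), F(-9, 5)]]
U = [[F(1), F(0), F(0)],
     [F(0), F(3, 5), F(-4, 5)],
     [F(0), F(4, 5), F(3, 5)]]
V = [[F(e, 2) for e in row] for row in
     [[1, 1, -1, 1], [1, 1, 1, -1], [1, -1, 1, 1], [1, -1, -1, -1]]]
singular_values = [F(4), F(3), F(2)]

# S_bar: transpose of S (so 4 x 3) with each non-zero s_i replaced by 1/s_i
S_bar = [[(1 / singular_values[i]) if i == j else F(0) for j in range(3)]
         for i in range(4)]

b = [F(1), F(0), F(0)]
x = matvec(V, matvec(S_bar, matvec(transpose(U), b)))  # x = V S_bar U^T b
v4 = [row[3] for row in V]                             # basis vector of the null space
print(x, matvec(A, x))
```

Adding any multiple of `v4` to `x` leaves `matvec(A, x)` unchanged, and `x` itself has no component along `v4`, which is why it is the point of the solution line closest to the origin.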


8.19 Exercises 


8.1 Which of the following statements about linear vector spaces are true? Where a 
statement is false, give a counter-example to demonstrate this. 

(a) Non-singular N × N matrices form a vector space of dimension $N^2$. 

(b) Singular N × N matrices form a vector space of dimension $N^2$. 

(c) Complex numbers form a vector space of dimension 2. 

(d) Polynomial functions of x form an infinite-dimensional vector space. 

(e) Series $\{a_0, a_1, a_2, \ldots, a_N\}$ for which $|a_n|^2 = 1$ form an N-dimensional 
vector space. 

(f) Absolutely convergent series form an infinite-dimensional vector space. 

(g) Convergent series with terms of alternating sign form an infinite-dimensional 
vector space. 

8.2 Evaluate the determinants 



\[
\text{(a)}\ \begin{vmatrix} a & h & g \\ h & b & f \\ g & f & c \end{vmatrix}, \qquad
\text{(b)}\ \begin{vmatrix} \ldots & 0 & 2 & 3 \\ \ldots & 1 & -2 & 1 \\ \ldots & -3 & 4 & -2 \\ \ldots & 1 & -2 & 1 \end{vmatrix}
\]

and 

\[
\text{(c)}\ \begin{vmatrix} gc & ge & a + ge & gb + ge \\ 0 & b & b & b \\ c & e & e & b + e \\ a & b & b + f & b + d \end{vmatrix}.
\]

8.3 Using the properties of determinants, solve with a minimum of calculation the 
following equations for x: 

\[
\text{(a)}\ \begin{vmatrix} x & a & a & 1 \\ a & x & b & 1 \\ a & b & x & 1 \\ a & b & c & 1 \end{vmatrix} = 0, \qquad
\text{(b)}\ \begin{vmatrix} x + 2 & x + 4 & x - 3 \\ x + 3 & x & x + 5 \\ x - 2 & x - 1 & x + 1 \end{vmatrix} = 0.
\]

8.4 Consider the matrices 

\[
\text{(a)}\ B = \begin{pmatrix} 0 & -i & \ldots \\ 1 & 0 & \ldots \\ -i & i & \ldots \end{pmatrix}, \qquad
\text{(b)}\ C = \frac{1}{\ldots}\begin{pmatrix} \sqrt{3} & -\sqrt{2} & -\sqrt{3} \\ 1 & \sqrt{6} & -1 \\ 2 & 0 & 2 \end{pmatrix}.
\]

Are they (i) real, (ii) diagonal, (iii) symmetric, (iv) antisymmetric, (v) singular, 
(vi) orthogonal, (vii) Hermitian, (viii) anti-Hermitian, (ix) unitary, (x) normal? 

8.5 By considering the matrices 

\[
A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 0 \\ 3 & 4 \end{pmatrix},
\]

show that AB = 0 does not imply that either A or B is the zero matrix but that 
it does imply that at least one of them is singular. 

8.6 (a) The basis vectors of the unit cell of a crystal, with the origin O at one corner, 
are denoted by $e_1$, $e_2$, $e_3$. The matrix G has elements $G_{ij}$, where $G_{ij} = e_i \cdot e_j$ 
and $H_{ij}$ are the elements of the matrix $H = G^{-1}$. Show that the vectors 
$f_i = \sum_j H_{ij} e_j$ are the reciprocal vectors and that $H_{ij} = f_i \cdot f_j$. 

(b) If the vectors u and v are given by 

\[
u = \sum_i u_i e_i, \qquad v = \sum_i v_i f_i,
\]

obtain expressions for |u|, |v|, and u · v. 

(c) If the basis vectors are each of length a and the angle between each pair is 
$\pi/3$, write down G and hence obtain H. 

(d) Calculate (i) the length of the normal from O onto the plane containing the 
points $p^{-1}e_1$, $q^{-1}e_2$, $r^{-1}e_3$, and (ii) the angle between this normal and 

8.7 (a) Show that if A is Hermitian and U is unitary then $U^{-1}AU$ is Hermitian. 

(b) Show that if A is anti-Hermitian then iA is Hermitian. 

(c) Prove that the product of two Hermitian matrices A and B is Hermitian if 
and only if A and B commute. 

(d) Prove that if S is a real antisymmetric matrix then $A = (I - S)(I + S)^{-1}$ is 
orthogonal. If A is given by 

\[
A = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}
\]

then find the matrix S that is needed to express A in the above form. 

(e) If K is skew-hermitian, i.e. $K^\dagger = -K$, prove that $V = (I + K)(I - K)^{-1}$ is unitary. 


8.8 A and B are real non-zero 3 × 3 matrices and satisfy the equation 

\[
(AB)^{\mathrm{T}} + B^{-1}A = 0.
\]

(a) Prove that if B is orthogonal then A is antisymmetric. 

(b) Without assuming that B is orthogonal, prove that A is singular. 

8.9 The commutator [X, Y] of two matrices is defined by the equation 

\[
[X, Y] = XY - YX.
\]

Two anti-commuting matrices A and B satisfy 

\[
A^2 = I, \qquad B^2 = I, \qquad [A, B] = 2iC.
\]

(a) Prove that $C^2 = I$ and that $[B, C] = 2iA$. 

(b) Evaluate [[[A, B], [B, C]], [A, B]]. 

8.10 The four matrices $S_x$, $S_y$, $S_z$ and I are defined by 



where $i^2 = -1$. Show that $S_x^2 = I$ and $S_xS_y = iS_z$, and obtain similar results 
by permuting x, y and z. Given that v is a vector with Cartesian components 
$(v_x, v_y, v_z)$, the matrix S(v) is defined as 

\[
S(v) = v_xS_x + v_yS_y + v_zS_z.
\]

Prove that, for general non-zero vectors a and b, 

\[
S(a)S(b) = a \cdot b\, I + i\,S(a \times b).
\]

Without further calculation, deduce that S(a) and S(b) commute if and only if a 
and b are parallel vectors. 

8.11 A general triangle has angles $\alpha$, $\beta$ and $\gamma$ and corresponding opposite sides a, 
b and c. Express the length of each side in terms of the lengths of the other 
two sides and the relevant cosines, writing the relationships in matrix and vector 
form using the vectors having components a, b, c and $\cos\alpha$, $\cos\beta$, $\cos\gamma$. Invert the 
matrix and hence deduce the cosine-law expressions involving $\alpha$, $\beta$ and $\gamma$. 

8.12 Given a matrix 

\[
A = \begin{pmatrix} 1 & \alpha & 0 \\ \beta & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},
\]

where $\alpha$ and $\beta$ are non-zero complex numbers, find its eigenvalues and 
eigenvectors. Find the respective conditions for (a) the eigenvalues to be real and (b) the 
eigenvectors to be orthogonal. Show that the conditions are jointly satisfied if 
and only if A is Hermitian. 

8.13 Using the Gram-Schmidt procedure: 

(a) construct an orthonormal set of vectors from the following: 

\[
x_1 = (0\ 0\ 1\ 1)^{\mathrm{T}}, \quad x_2 = (1\ 0\ -1\ 0)^{\mathrm{T}}, \quad
x_3 = (1\ 2\ 0\ 2)^{\mathrm{T}}, \quad x_4 = (2\ 1\ 1\ 1)^{\mathrm{T}};
\]

(b) find an orthonormal basis, within a four-dimensional Euclidean space, for 
the subspace spanned by the three vectors $(1\ 2\ 0\ 0)^{\mathrm{T}}$, $(3\ -1\ 2\ 0)^{\mathrm{T}}$ 
and $(0\ 0\ 2\ 1)^{\mathrm{T}}$. 

8.14 If a unitary matrix U is written as A + iB, where A and B are Hermitian with 
non-degenerate eigenvalues, show the following: 

(a) A and B commute; 

(b) A 2 + B 2 = I ; 

(c) The eigenvectors of A are also eigenvectors of B ; 

(d) The eigenvalues of U have unit modulus (as is necessary for any unitary 
matrix). 

8.15 Determine which of the matrices below are mutually commuting, and, for those 
that are, demonstrate that they have a complete set of eigenfunctions in common : 



8.16 Find the eigenvalues and a set of eigenvectors of the matrix 



Verify that its eigenvectors are mutually orthogonal. 

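8.17 Find three real orthogonal column matrices, each of which is a simultaneous 
eigenvector of 

\[
A = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}
\qquad \text{and} \qquad
B = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}.
\]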


8.18 Use the results of the first worked example in section 8.14 to evaluate, without 
repeated matrix multiplication, the expression $A^6x$, where $x = (2\ 4\ -1)^{\mathrm{T}}$ and 
A is the matrix given in the example. 

8.19 Given that A is a real symmetric matrix with normalised eigenvectors $e^i$, obtain 
the coefficients $\alpha_i$ involved when column matrix x, which is the solution of 

\[
Ax - \mu x = v, \tag{$*$}
\]

is expanded as $x = \sum_i \alpha_i e^i$. Here $\mu$ is a given constant and v is a given column 
matrix. 

(a) Solve ($*$) when 

\[
A = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix},
\]

$\mu = 2$ and $v = (1\ 2\ 3)^{\mathrm{T}}$. 

(b) Would ($*$) have a solution if $\mu = 1$ and (i) $v = (1\ 2\ 3)^{\mathrm{T}}$, (ii) $v = (2\ 2\ 3)^{\mathrm{T}}$? 

8.20 Demonstrate that the matrix 



is defective, i.e. does not have three linearly independent eigenvectors, by showing 
the following: 

(a) its eigenvalues are degenerate and, in fact, all equal; 

(b) any eigenvector has the form $(\mu\ \ (3\mu - 2\nu)\ \ \nu)^{\mathrm{T}}$; 

(c) if two pairs of values, $\mu_1, \nu_1$ and $\mu_2, \nu_2$, define two independent eigenvectors 
$v_1$ and $v_2$ then any third similarly defined eigenvector $v_3$ can be written as a 
linear combination of $v_1$ and $v_2$, i.e. 

\[
v_3 = av_1 + bv_2,
\]

where 

\[
a = \frac{\mu_3\nu_2 - \mu_2\nu_3}{\mu_1\nu_2 - \mu_2\nu_1} \qquad \text{and} \qquad
b = \frac{\mu_1\nu_3 - \mu_3\nu_1}{\mu_1\nu_2 - \mu_2\nu_1}.
\]

Illustrate (c) using the example $(\mu_1, \nu_1) = (1, 1)$, $(\mu_2, \nu_2) = (1, 2)$ and $(\mu_3, \nu_3) = (0, 1)$. 

Show further that any matrix of the form 

\[
\begin{pmatrix} 2 & 0 & 0 \\ 6n - 6 & 4 - 2n & 4 - 4n \\ 3 - 3n & n - 1 & 2n \end{pmatrix}
\]

is defective, with the same eigenvalues and eigenvectors as A. 

8.21 By finding the eigenvectors of the Hermitian matrix 

\[
H = \begin{pmatrix} 10 & 3i \\ -3i & 2 \end{pmatrix},
\]

construct a unitary matrix U such that $U^\dagger HU = \Lambda$, where $\Lambda$ is a real diagonal 
matrix. 

8.22 Use the stationary properties of quadratic forms to determine the maximum and 
minimum values taken by the expression 

\[
Q = 5x^2 + 4y^2 + 4z^2 + 2xz + 2xy
\]

on the unit sphere $x^2 + y^2 + z^2 = 1$. For what values of x, y, z do they occur? 

8.23 Given that the matrix 



has two eigenvectors of the form $(1\ y\ 1)^{\mathrm{T}}$, use the stationary property of the 
expression $J(x) = x^{\mathrm{T}}Ax/(x^{\mathrm{T}}x)$ to obtain the corresponding eigenvalues. Deduce 
the third eigenvalue. 

8.24 Find the lengths of the semi-axes of the ellipse 

\[
73x^2 + 12xy + 52y^2 = 100,
\]

and determine its orientation. 

8.25 The equation of a particular conic section is 

\[
Q = 8x_1^2 + 8x_2^2 - 6x_1x_2 = 110.
\]

Determine the type of conic section this represents, the orientation of its principal 
axes, and relevant lengths in the directions of these axes. 

8.26 Show that the quadratic surface 

5.x 2 + 11 y 2 + 5 z 2 — lOyz + 2xz — 10 xy = 4 

is an ellipsoid with semi-axes of lengths 2, 1 and 0.5. Find the direction of its 
longest axis. 

8.27 Find the direction of the axis of symmetry of the quadratic surface

\[ 7x^2 + 7y^2 + 7z^2 - 20yz - 20xz + 20xy = 3. \]

8.28 Find the eigenvalues, and sufficient of the eigenvectors, of the following matrices
to be able to describe the quadratic surfaces associated with them.

\[
\text{(a)}\ \begin{pmatrix} 5 & 1 & -1 \\ 1 & 5 & 1 \\ -1 & 1 & 5 \end{pmatrix}, \quad
\text{(b)}\ \begin{pmatrix} 1 & 2 & 2 \\ 2 & 1 & 2 \\ 2 & 2 & 1 \end{pmatrix}, \quad
\text{(c)}\ \begin{pmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{pmatrix}.
\]

8.29 (a) Rearrange the result $\mathsf{A}' = \mathsf{S}^{-1}\mathsf{A}\mathsf{S}$ of section 8.16 to express the original
matrix $\mathsf{A}$ in terms of the unitary matrix $\mathsf{S}$ and the diagonal matrix $\mathsf{A}'$. Hence
show how to construct a matrix $\mathsf{A}$ that has given eigenvalues and given
(orthogonal) column matrices as its eigenvectors.

(b) Find the matrix with eigenvectors $(1\ \ 2\ \ 1)^{\mathrm T}$, $(1\ \ {-1}\ \ 1)^{\mathrm T}$ and $(1\ \ 0\ \ {-1})^{\mathrm T}$,
and corresponding eigenvalues $\lambda$, $\mu$ and $\nu$.

(c) Try a particular case, say $\lambda = 3$, $\mu = -2$ and $\nu = 1$, and verify by explicit
solution that the matrix so found does have these eigenvalues.

8.30 Find an orthogonal transformation that takes the quadratic form

\[ Q \equiv -x_1^2 - 2x_2^2 - x_3^2 + 8x_2x_3 + 6x_1x_3 + 8x_1x_2 \]

into the form

\[ \mu_1 y_1^2 + \mu_2 y_2^2 - 4y_3^2, \]

and determine $\mu_1$ and $\mu_2$ (see section 8.17).

8.31 One method of determining the nullity (and hence the rank) of an $M \times N$ matrix
$\mathsf{A}$ is as follows.

• Write down an augmented transpose of $\mathsf{A}$, by adding on the right an $N \times N$
unit matrix and thus producing an $N \times (M + N)$ array $\mathsf{B}$.

• Subtract a suitable multiple of the first row of $\mathsf{B}$ from each of the other lower
rows so as to make $B_{i1} = 0$ for $i > 1$.

• Subtract a suitable multiple of the second row (or the uppermost row that
does not start with $M$ zero values) from each of the other lower rows so as to
make $B_{i2} = 0$ for $i > 2$.

• Continue in this way until all remaining rows have zeroes in the first $M$ places.
The number of such rows is equal to the nullity of $\mathsf{A}$ and the $N$ rightmost
entries of these rows are the components of vectors that span the null space.
They can be made orthogonal if they are not so already.

Use this method to show that the nullity of

\[
\mathsf{A} = \begin{pmatrix}
-1 & 3 & 2 & 7 \\
3 & 10 & -6 & 17 \\
-1 & -2 & 2 & -3 \\
2 & 3 & -4 & 4 \\
4 & 0 & -8 & -4
\end{pmatrix}
\]

is 2 and that an orthogonal base for the null space of $\mathsf{A}$ is provided by any two
column matrices of the form $(2 + \alpha_i\ \ {-2\alpha_i}\ \ 1\ \ \alpha_i)^{\mathrm T}$ for which the $\alpha_i$ ($i = 1, 2$)
are real and satisfy $6\alpha_1\alpha_2 + 2(\alpha_1 + \alpha_2) + 5 = 0$.
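The row-reduction procedure described in exercise 8.31 can be sketched in a few lines of code. The following is a minimal illustration, not a prescribed solution: the function name `null_space_by_augmented_transpose` is hypothetical, exact rational arithmetic is used to avoid rounding, and a row interchange is added (an assumption going slightly beyond the stated steps) so that the uppermost usable row is always the one subtracted.

```python
from fractions import Fraction

def null_space_by_augmented_transpose(a):
    """Return row vectors spanning the null space of the M x N matrix `a`."""
    m, n = len(a), len(a[0])
    # B = [A^T | I_N]: row j holds column j of A followed by the j-th unit row.
    b = [[Fraction(a[i][j]) for i in range(m)]
         + [Fraction(int(j == k)) for k in range(n)]
         for j in range(n)]
    top = 0  # index of the uppermost row not yet used as a pivot
    for col in range(m):
        pivot = next((r for r in range(top, n) if b[r][col] != 0), None)
        if pivot is None:
            continue  # this column is already zero in all remaining rows
        b[top], b[pivot] = b[pivot], b[top]
        for r in range(top + 1, n):
            factor = b[r][col] / b[top][col]
            b[r] = [x - factor * y for x, y in zip(b[r], b[top])]
        top += 1
    # Rows whose first M entries vanish carry a null-space vector on the right.
    return [row[m:] for row in b if all(x == 0 for x in row[:m])]

# The 5 x 4 matrix of exercise 8.31.
A = [[-1, 3, 2, 7],
     [3, 10, -6, 17],
     [-1, -2, 2, -3],
     [2, 3, -4, 4],
     [4, 0, -8, -4]]
basis = null_space_by_augmented_transpose(A)
```

Each row of the array always has the form $((\mathsf{A}\mathsf{v})^{\mathrm T} \mid \mathsf{v}^{\mathrm T})$ for some $\mathsf{v}$, which is why a row with $M$ leading zeros delivers a null vector in its last $N$ entries.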
8.32 Do the following sets of equations have non-zero solutions? If so, find them.

(a) $3x + 2y + z = 0$, $x - 3y + 2z = 0$, $2x + y + 3z = 0$.

(b) $2x = b(y + z)$, $x = 2a(y - z)$, $x = (6a - b)y - (6a + b)z$.

8.33 Solve the simultaneous equations

\[
\begin{aligned}
2x + 3y + z &= 11, \\
x + y + z &= 6, \\
5x - y + 10z &= 34.
\end{aligned}
\]

8.34 Solve the following simultaneous equations for $x_1$, $x_2$ and $x_3$, using matrix
methods:

\[
\begin{aligned}
x_1 + 2x_2 + 3x_3 &= 1, \\
3x_1 + 4x_2 + 5x_3 &= 2, \\
x_1 + 3x_2 + 4x_3 &= 3.
\end{aligned}
\]

8.35 Show that the following equations have solutions only if $\eta = 1$ or 2, and find
them in these cases:

\[
\begin{aligned}
x + y + z &= 1, \\
x + 2y + 4z &= \eta, \\
x + 4y + 10z &= \eta^2.
\end{aligned}
\]


8.36 Find the condition(s) on $\alpha$ such that the simultaneous equations

\[
\begin{aligned}
x_1 + \alpha x_2 &= 1, \\
x_1 - x_2 + 3x_3 &= -1, \\
2x_1 - 2x_2 + \alpha x_3 &= -2
\end{aligned}
\]

have (a) exactly one solution, (b) no solutions, or (c) an infinite number of
solutions; give all solutions where they exist.

8.37 Make an LU decomposition of the matrix



and hence solve $\mathsf{A}\mathsf{x} = \mathsf{b}$, where (i) $\mathsf{b} = (21\ \ 9\ \ 28)^{\mathrm T}$, (ii) $\mathsf{b} = (21\ \ 7\ \ 22)^{\mathrm T}$.

8.38 Make an LU decomposition of the matrix



Hence solve $\mathsf{A}\mathsf{x} = \mathsf{b}$ for (i) $\mathsf{b} = (-4\ \ 1\ \ 8\ \ {-5})^{\mathrm T}$, (ii) $\mathsf{b} = (-10\ \ 0\ \ {-3}\ \ {-24})^{\mathrm T}$.

Deduce that $\det \mathsf{A} = -160$ and confirm this by direct calculation.

8.39 Use the Cholesky separation method to determine whether the following matrices
are positive definite. For each that is, determine the corresponding lower diagonal
matrix $\mathsf{L}$:



8.40 Find the equation satisfied by the squares of the singular values of the matrix
associated with the following over-determined set of equations:

\[
\begin{aligned}
2x + 3y + z &= 0 \\
x - y - z &= 1 \\
2x + y &= 0 \\
2y + z &= -2.
\end{aligned}
\]

Show that one of the singular values is close to zero. Determine the two larger
singular values by an appropriate iteration process and the smallest by indirect
calculation.

8.41 Find the SVD of



showing that the singular values are $\sqrt{3}$ and 1.

8.42 Find the SVD form of the matrix



Hence find the best solution $\mathsf{x}$ to the equation $\mathsf{A}\mathsf{x} = \mathsf{b}$ when (i) $\mathsf{b} = (6\ \ {-39}\ \ 15\ \ 18)^{\mathrm T}$,
(ii) $\mathsf{b} = (9\ \ {-42}\ \ 15\ \ 15)^{\mathrm T}$, showing that (i) has an exact solution,
but that the best solution to (ii) has a residual of $\sqrt{18}$.

8.43 Four experimental measurements of particular combinations of three physical
variables, $x$, $y$ and $z$, gave the following inconsistent results:

\[
\begin{aligned}
13x + 22y - 13z &= 4, \\
10x - 8y - 10z &= 44, \\
10x - 8y - 10z &= 47, \\
9x - 18y - 9z &= 72.
\end{aligned}
\]

Find the SVD best values for $x$, $y$ and $z$. Identify the null space of $\mathsf{A}$ and hence
obtain the general SVD solution.


8.20 Hints and answers

8.1 (a) False. $\mathsf{O}_N$, the $N \times N$ null matrix, is not non-singular.

(b) False. Consider the sum of $\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ and $\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$.

(c) True.

(d) True.

(e) False. Consider $b_n = a_n + a_n$ for which $\sum_{n=0}^{N} |b_n|^2 = 4 \neq 1$, or note that there
is no zero vector with unit norm.

(f) True.

(g) False. Consider the two series defined by

\[ a_0 = \tfrac12, \quad a_n = 2(-\tfrac12)^n \ \text{for } n \geq 1; \qquad b_n = -(-\tfrac12)^n \ \text{for } n \geq 0. \]

The series that is the sum of $\{a_n\}$ and $\{b_n\}$ does not have alternating signs
and so closure does not hold.

8.2 (a) $abc + 2fgh - af^2 - bg^2 - ch^2$, (b) 0, (c) $ab(ab - cd)$.

8.3 (a) $x = a$, $b$ or $c$; (b) $x = -1$, equation is linear in $x$.

8.4 (a) iv, v, vii, x; (b) i, vi, ix, x. 

8.6 (b) QA- UiGijUj) 1 ' 2 , (J2ij ViHtjVj ) 1/2 , JA Wt \ 

1 ( 3/2 -1/2 -1/2 \ 

(c) H = — —1/2 3/2 —1/2 . (d) (i) M 1 , (ii) cos l (p/Ma ) where 

a y -1/2 -1/2 3/2 ) 

M = a [3(p 2 + q 2 + r 2 )/ 2 — qr — pr — pq] 1/2 . 

8.7 (d) S = ( tan( ° 0/2) — tan(0/2) ^ (e) Note that (| + K)(| — K ) = | — K 2 = 

(l-K)(l + K). 

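8.8 (b) Note that $|-\mathsf{A}| = (-1)^3|\mathsf{A}|$.

8.9 (b) $32i\mathsf{A}$.

8.10 $\mathsf{S}(\mathbf{a})\mathsf{S}(\mathbf{b}) - \mathsf{S}(\mathbf{b})\mathsf{S}(\mathbf{a}) = 2i\mathsf{S}(\mathbf{a} \times \mathbf{b})$ and equals zero only if $\mathbf{a} \times \mathbf{b} = \mathbf{0}$.

8.11 $a = b\cos\gamma + c\cos\beta$, and cyclic permutations; $a^2 = b^2 + c^2 - 2bc\cos\alpha$, and cyclic
permutations.

8.12 $\lambda = 1$, $(0\ \ 0\ \ 1)^{\mathrm T}$;

$\lambda = 1 + (\alpha\beta)^{1/2}$, $(\alpha^{1/2}\ \ \beta^{1/2}\ \ 0)^{\mathrm T}$;

$\lambda = 1 - (\alpha\beta)^{1/2}$, $(\alpha^{1/2}\ \ {-\beta^{1/2}}\ \ 0)^{\mathrm T}$;

(a) $\alpha\beta$ real and $> 0$; (b) $|\alpha| = |\beta|$.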

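8.13 (a) $2^{-1/2}(0\ \ 0\ \ 1\ \ 1)^{\mathrm T}$, $6^{-1/2}(2\ \ 0\ \ {-1}\ \ 1)^{\mathrm T}$,

$39^{-1/2}(-1\ \ 6\ \ {-1}\ \ 1)^{\mathrm T}$, $13^{-1/2}(2\ \ 1\ \ 2\ \ {-2})^{\mathrm T}$.

(b) $5^{-1/2}(1\ \ 2\ \ 0\ \ 0)^{\mathrm T}$, $(345)^{-1/2}(14\ \ {-7}\ \ 10\ \ 0)^{\mathrm T}$,

$(18\,285)^{-1/2}(-56\ \ 28\ \ 98\ \ 69)^{\mathrm T}$.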

8.14 (a) Use UlP = U ; (b) use UU 1 ’ = I; (c) apply the result of subsection 8.13.5 to 
give the eigenvalue for U as 2 + ip; (d) apply result (b) to eigenvector u of U to 
deduce that 2 2 + /r = 1. 

8.15 C does not commute with the others; A, B and D have (1 — 2) T and (2 1 ) T as 

common eigenvectors. 

8.16 2=1, (1 1 3) t ; 

2 = 3 + VI5, (5 + VT5 7 + 2^15 -4+vT5) T ; 

8.17 For A : (1 0 — 1) T , (1 oq 1) T , (1 a, 1) T . 

For B : ( 1 1 if, (Pi Vi -pi-yif,(p 2 yi -pi-yif- 
The a,-, /?, and y,- are arbitrary. 

Simultaneous and orthogonal: (1 0 — 1) T , (1 1 1) T , (1 —2 1) T . 

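8.18 Express $\mathsf{x}$ as a linear combination of the eigenvectors of $\mathsf{A}$ and use the fact that
$\mathsf{A}^n\mathsf{x} = \lambda^n\mathsf{x}$ for an eigenvector; $\mathsf{x} = 3\mathsf{x}^{(1)} - \mathsf{x}^{(2)}$; $\mathsf{A}^6\mathsf{x} = (-537\ \ 921\ \ 729)^{\mathrm T}$.

8.19 $a_j = (\mathbf{v} \cdot \mathbf{e}^{j*})/(\lambda_j - \mu)$, where $\lambda_j$ is the eigenvalue corresponding to $\mathbf{e}^j$.

(a) $\mathsf{x} = (2\ \ 1\ \ 3)^{\mathrm T}$.

(b) Since $\mu$ is equal to one of $\mathsf{A}$'s eigenvalues $\lambda_j$, the equation only has a solution
if $\mathbf{v} \cdot \mathbf{e}^{j*} = 0$; (i) no solution; (ii) $\mathsf{x} = (1\ \ 1\ \ 3/2)^{\mathrm T}$.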

8.20 (a) All eigenvalues equal 2; (c) a = —1, b = 1. 

8.21 $\mathsf{U} = (10)^{-1/2}(1, 3i; 3i, 1)$, $\mathsf{\Lambda} = (1, 0; 0, 11)$.

8.22 Maximum equal to 6 at $\pm(2, 1, 1)/\sqrt{6}$; minimum equal to 3 at $\pm(1, -1, -1)/\sqrt{3}$.

8.23 $J = (2y^2 - 4y + 4)/(y^2 + 2)$ with stationary values at $y = \pm\sqrt{2}$ and corresponding
eigenvalues $2 \mp \sqrt{2}$. From the trace property of $\mathsf{A}$, the third eigenvalue equals 2.

8.24 The eigenvalues, after making the RHS unity, are 1/4 and 1, corresponding to
semi-axis lengths of 2 and 1. The major axis makes an angle $\tan^{-1}(-4/3)$ with
the positive $x$-axis.

8.25 Ellipse; $\theta = \pi/4$, $a = \sqrt{22}$; $\theta = 3\pi/4$, $b = \sqrt{10}$.

8.26 The eigenvector corresponding to the smallest eigenvalue is in the direction
$(1, 1, 1)/\sqrt{3}$.

8.27 The direction of the eigenvector having the non-repeated eigenvalue is
$(1, 1, -1)/\sqrt{3}$.

8.28 (a) Eigenvalues 6, 6, 3; an ellipsoid with circular cross-section of radius $r$,
say, perpendicular to the direction $(1, -1, 1)/\sqrt{3}$, and with semi-axis in that
direction of $\sqrt{2}\,r$.

(b) Eigenvalues 5, $-1$, $-1$; a hyperboloid of revolution about an axis in the
direction $(1, 1, 1)/\sqrt{3}$, the two halves of the hyperboloid being asymptotic to
that cone of semi-angle $\tan^{-1}\sqrt{5}$ that passes through the origin and also has
its axis in that direction.

(c) Eigenvalues 6, 0, 0; a pair of parallel planes, equidistant from the origin and
with their normals in the directions $\pm(1, 2, 1)/\sqrt{6}$.

8.29 (a) $\mathsf{A} = \mathsf{S}\mathsf{A}'\mathsf{S}^\dagger$, where $\mathsf{S}$ is the matrix whose columns are the eigenvectors of the
matrix $\mathsf{A}$ to be constructed, and $\mathsf{A}' = \operatorname{diag}(\lambda, \mu, \nu)$.

(b) $\mathsf{A} = \tfrac16(\lambda + 2\mu + 3\nu,\ 2\lambda - 2\mu,\ \lambda + 2\mu - 3\nu;\ 2\lambda - 2\mu,\ 4\lambda + 2\mu,\ 2\lambda - 2\mu;$
$\lambda + 2\mu - 3\nu,\ 2\lambda - 2\mu,\ \lambda + 2\mu + 3\nu)$.

(c) $\tfrac13(1, 5, -2;\ 5, 4, 5;\ -2, 5, 1)$.

8.30 $y_1 = (x_1 + x_2 + x_3)/\sqrt{3}$, $y_2 = (x_1 - 2x_2 + x_3)/\sqrt{6}$, $y_3 = (-x_1 + x_3)/\sqrt{2}$;
$\mu_1 = 6$, $\mu_2 = -6$.

8.31 The null space is spanned by $(2\ \ 0\ \ 1\ \ 0)^{\mathrm T}$ and $(1\ \ {-2}\ \ 0\ \ 1)^{\mathrm T}$.

8.32 (a) No, $|\mathsf{A}| = -24 \neq 0$; (b) yes, $x : y : z = 4ab : 4a + b : 4a - b$.

8.33 $x = 3$, $y = 1$, $z = 2$.

8.34 $x_1 = -3/2$, $x_2 = 7/2$, $x_3 = -3/2$.

8.35 $\eta = 1$, $x = 1 + 2z$, $y = -3z$; $\eta = 2$, $x = 2z$, $y = 1 - 3z$.

8.36 (a) $\alpha \neq 6$, $\alpha \neq -1$; $x_1 = (1 - \alpha)/(1 + \alpha)$, $x_2 = 2/(1 + \alpha)$, $x_3 = 0$.

(b) $\alpha = -1$. (c) $\alpha = 6$; $x_1 = 1 - 6\beta$, $x_2 = \beta$, $x_3 = (7\beta - 2)/3$ for any $\beta$.

8.37 $\mathsf{L} = (1, 0, 0;\ \tfrac13, 1, 0;\ \tfrac23, 3, 1)$, $\mathsf{U} = (3, 6, 9;\ 0, -2, 2;\ 0, 0, 4)$.

(i) $\mathsf{x} = (-1\ \ 1\ \ 2)^{\mathrm T}$. (ii) $\mathsf{x} = (-3\ \ 2\ \ 2)^{\mathrm T}$.

8.38 $\mathsf{L} = (1, 0, 0, 0;\ \tfrac12, 1, 0, 0;\ \tfrac52, \tfrac{21}{11}, 1, 0;\ \tfrac32, -\tfrac{3}{11}, -\tfrac{12}{7}, 1)$;

$\mathsf{U} = (2, -3, 1, 3;\ 0, \tfrac{11}{2}, -\tfrac72, -\tfrac92;\ 0, 0, \tfrac{35}{11}, \tfrac{1}{11};\ 0, 0, 0, -\tfrac{32}{7})$.

(i) $\mathsf{x} = (2\ \ {-1}\ \ 4\ \ {-5})^{\mathrm T}$. (ii) $\mathsf{x} = (-1\ \ 1\ \ 4\ \ {-3})^{\mathrm T}$.

8.39 $\mathsf{A}$ is not positive definite, as the calculated value of $L_{33}$ is not real and positive.
$\mathsf{B} = \mathsf{L}\mathsf{L}^{\mathrm T}$, where the non-zero elements of $\mathsf{L}$ are
$L_{11} = \sqrt{5}$, $L_{31} = \sqrt{3/5}$, $L_{22} = \sqrt{3}$, $L_{33} = \sqrt{12/5}$.

8.40 $\lambda^3 - 27\lambda^2 + 121\lambda - 3 = 0$. Find the two larger roots for $\lambda$ using the rearrangement
method described in subsection 28.1.1 and the smallest one using the property of
the product of the roots. The singular values are 4.6190, 2.3748 and 0.1579.

8.41 $\mathsf{A}^\dagger\mathsf{A} = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$.

8.42 The singular values are $18\sqrt{6}$, 18, $12\sqrt{3}$.

(i) $\mathsf{x} = (1\ \ 1\ \ 2)^{\mathrm T}$ with all four equations exactly satisfied.

(ii) $\mathsf{x} = \tfrac{1}{36}(40\ \ 37\ \ 74)^{\mathrm T}$, giving a residual column matrix $(-1\ \ 2\ \ 2\ \ 3)^{\mathrm T}$.

8.43 The singular values are $12\sqrt{6}$, 0, $18\sqrt{3}$ and the calculated best solution is $x =
1.71$, $y = -1.94$, $z = -1.71$. The null space is the line $x = z$, $y = 0$ and the general
SVD solution is $x = 1.71 + \lambda$, $y = -1.94$, $z = -1.71 + \lambda$.

Normal modes


Any student of the physical sciences will encounter the subject of oscillations on 
many occasions and in a wide variety of circumstances, for example the voltage 
and current oscillations in an electric circuit, the vibrations of a mechanical 
structure and the internal motions of molecules. The matrices studied in the 
previous chapter provide a particularly simple way to approach what may appear, 
at first glance, to be difficult physical problems. 

We will consider only systems for which a position-dependent potential exists,
i.e., the potential energy of the system in any particular configuration depends
upon the coordinates of the configuration, which need not be lengths, however;
the potential must not depend upon the time derivatives (generalised velocities) of
these coordinates. So, for example, the potential $-q\mathbf{v}\cdot\mathbf{A}$ used in the Lagrangian
description of a charged particle in an electromagnetic field is excluded. A
further restriction that we place is that the potential has a local minimum at
the equilibrium point; physically, this is a necessary and sufficient condition for
stable equilibrium. By suitably defining the origin of the potential, we may take
its value at the equilibrium point as zero.

We denote the coordinates chosen to describe a configuration of the system
by $q_i$, $i = 1, 2, \ldots, N$. The $q_i$ need not be distances; some could be angles, for
example. For convenience we can define the $q_i$ so that they are all zero at the
equilibrium point. The instantaneous velocities of various parts of the system will
depend upon the time derivatives of the $q_i$, denoted by $\dot q_i$. For small oscillations
the velocities will be linear in the $\dot q_i$ and consequently the total kinetic energy $T$
will be quadratic in them, and will include cross terms of the form $\dot q_i\dot q_j$ with
$i \neq j$. The general expression for $T$ can be written as the quadratic form

\[ T = \sum_i \sum_j a_{ij}\,\dot q_i\dot q_j = \dot{\mathsf q}^{\mathrm T}\mathsf{A}\,\dot{\mathsf q}, \tag{9.1} \]

where $\dot{\mathsf q}$ is the column vector $(\dot q_1\ \ \dot q_2\ \ \cdots\ \ \dot q_N)^{\mathrm T}$ and the $N \times N$ matrix $\mathsf{A}$
is real and may be chosen to be symmetric. Furthermore, $\mathsf{A}$, like any matrix
corresponding to a kinetic energy, is positive definite (more strictly positive
semi-definite); that is, whatever real values the $\dot q_i$ take, the quadratic form (9.1) has a
value $\geq 0$.

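Turning now to the potential energy, we may write its value for a configuration
$\mathsf{q}$ by means of a Taylor expansion about the origin $\mathsf{q} = \mathsf{0}$,

\[ V(\mathsf{q}) = V(\mathsf{0}) + \sum_i \frac{\partial V(\mathsf{0})}{\partial q_i}\,q_i
   + \frac12 \sum_i \sum_j \frac{\partial^2 V(\mathsf{0})}{\partial q_i\,\partial q_j}\,q_i q_j + \cdots. \]

However, we have chosen $V(\mathsf{0}) = 0$ and, since the origin is an equilibrium point,
there is no force there and $\partial V(\mathsf{0})/\partial q_i = 0$. Consequently, to second order in the
$q_i$ we also have a quadratic form, but in the coordinates rather than in their time
derivatives:

\[ V = \sum_i \sum_j b_{ij}\,q_i q_j = \mathsf{q}^{\mathrm T}\mathsf{B}\,\mathsf{q}, \tag{9.2} \]

where $\mathsf{B}$ is, or can be made, symmetric. In this case, and in general, the requirement
that the potential is a minimum means that the potential matrix $\mathsf{B}$, like the kinetic
energy matrix $\mathsf{A}$, is real and positive definite.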


9.1 Typical oscillatory systems 

We now introduce particular examples, although the results of this section are 
general, given the above restrictions and the reader will find it easy to apply the 
results to many other instances. 

Consider first a uniform rod of mass $M$ and length $l$, attached by a light string
also of length $l$ to a fixed point $P$ and executing small oscillations in a vertical
plane. We choose as coordinates the angles $\theta_1$ and $\theta_2$ shown, with exaggerated
magnitude, in figure 9.1. In terms of these coordinates the centre of gravity of the
rod has, to first order in the $\theta_i$, a velocity component in the $x$-direction equal to
$l\dot\theta_1 + \tfrac12 l\dot\theta_2$ and in the $y$-direction equal to zero. Adding in the rotational kinetic
energy of the rod about its centre of gravity we obtain, to second order in the $\dot\theta_i$,

\[
\begin{aligned}
T &\approx \tfrac12 Ml^2\left(\dot\theta_1^2 + \tfrac14\dot\theta_2^2 + \dot\theta_1\dot\theta_2\right) + \tfrac{1}{24}Ml^2\dot\theta_2^2 \\
  &= \tfrac16 Ml^2\left(3\dot\theta_1^2 + 3\dot\theta_1\dot\theta_2 + \dot\theta_2^2\right)
   = \tfrac{1}{12}Ml^2\,\dot{\mathsf q}^{\mathrm T}\begin{pmatrix} 6 & 3 \\ 3 & 2 \end{pmatrix}\dot{\mathsf q},
\end{aligned}
\tag{9.3}
\]

where $\dot{\mathsf q}^{\mathrm T} = (\dot\theta_1\ \ \dot\theta_2)$. The potential energy is given by

\[ V = Mlg\left[(1 - \cos\theta_1) + \tfrac12(1 - \cos\theta_2)\right] \tag{9.4} \]

so that

\[ V \approx \tfrac14 Mlg\left(2\theta_1^2 + \theta_2^2\right)
   = \tfrac{1}{12}Mlg\,\mathsf{q}^{\mathrm T}\begin{pmatrix} 6 & 0 \\ 0 & 3 \end{pmatrix}\mathsf{q}, \tag{9.5} \]

where $g$ is the acceleration due to gravity and $\mathsf{q} = (\theta_1\ \ \theta_2)^{\mathrm T}$; (9.5) is valid to
second order in the $\theta_i$.

Figure 9.1 A uniform rod of length $l$ attached to the fixed point $P$ by a light
string of the same length: (a) the general coordinate system; (b) approximation
to the normal mode with lower frequency; (c) approximation to the mode with
higher frequency.

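With these expressions for $T$ and $V$ we now apply the conservation of energy,

\[ \frac{d}{dt}(T + V) = 0, \tag{9.6} \]

assuming that there are no external forces other than gravity. In matrix form
(9.6) becomes

\[ \frac{d}{dt}\left(\dot{\mathsf q}^{\mathrm T}\mathsf{A}\dot{\mathsf q} + \mathsf{q}^{\mathrm T}\mathsf{B}\mathsf{q}\right)
   = \ddot{\mathsf q}^{\mathrm T}\mathsf{A}\dot{\mathsf q} + \dot{\mathsf q}^{\mathrm T}\mathsf{A}\ddot{\mathsf q}
   + \dot{\mathsf q}^{\mathrm T}\mathsf{B}\mathsf{q} + \mathsf{q}^{\mathrm T}\mathsf{B}\dot{\mathsf q} = 0, \]

which, using $\mathsf{A} = \mathsf{A}^{\mathrm T}$ and $\mathsf{B} = \mathsf{B}^{\mathrm T}$, gives

\[ 2\dot{\mathsf q}^{\mathrm T}(\mathsf{A}\ddot{\mathsf q} + \mathsf{B}\mathsf{q}) = 0. \]

We will assume, although it is not clear that this gives the only possible solution,
that the above equation implies that the coefficient of each $\dot q_i$ is separately zero.
Hence

\[ \mathsf{A}\ddot{\mathsf q} + \mathsf{B}\mathsf{q} = \mathsf{0}. \tag{9.7} \]

For a rigorous derivation Lagrange's equations should be used, as in chapter 22.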
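The passage from the differentiated energy to the factored form relies only on the symmetry of the two matrices. As a sketch in the notation of (9.1) and (9.2), each scalar term may be replaced by its own transpose:

```latex
\begin{align*}
\ddot{\mathsf q}^{\mathrm T}\mathsf{A}\dot{\mathsf q}
  &= \bigl(\ddot{\mathsf q}^{\mathrm T}\mathsf{A}\dot{\mathsf q}\bigr)^{\mathrm T}
   = \dot{\mathsf q}^{\mathrm T}\mathsf{A}^{\mathrm T}\ddot{\mathsf q}
   = \dot{\mathsf q}^{\mathrm T}\mathsf{A}\ddot{\mathsf q}, \\
\mathsf{q}^{\mathrm T}\mathsf{B}\dot{\mathsf q}
  &= \dot{\mathsf q}^{\mathrm T}\mathsf{B}^{\mathrm T}\mathsf{q}
   = \dot{\mathsf q}^{\mathrm T}\mathsf{B}\mathsf{q}, \\
\frac{d}{dt}(T + V)
  &= 2\dot{\mathsf q}^{\mathrm T}\mathsf{A}\ddot{\mathsf q} + 2\dot{\mathsf q}^{\mathrm T}\mathsf{B}\mathsf{q}
   = 2\dot{\mathsf q}^{\mathrm T}\bigl(\mathsf{A}\ddot{\mathsf q} + \mathsf{B}\mathsf{q}\bigr) = 0.
\end{align*}
```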

Now we search for sets of coordinates $\mathsf{q}$ that all oscillate with the same period,
i.e. the total motion repeats itself exactly after a finite interval. Solutions of this
form will satisfy

\[ \mathsf{q} = \mathsf{x}\cos\omega t; \tag{9.8} \]

the relative values of the elements of $\mathsf{x}$ in such a solution will indicate how each
coordinate is involved in this special motion. In general there will be $N$ values
of $\omega$ if the matrices $\mathsf{A}$ and $\mathsf{B}$ are $N \times N$ and these values are known as normal
frequencies or eigenfrequencies.

Putting (9.8) into (9.7) yields

\[ -\omega^2\mathsf{A}\mathsf{x} + \mathsf{B}\mathsf{x} = (\mathsf{B} - \omega^2\mathsf{A})\mathsf{x} = \mathsf{0}. \tag{9.9} \]


Our work in section 8.18 showed that this can have non-trivial solutions only if

\[ |\mathsf{B} - \omega^2\mathsf{A}| = 0. \tag{9.10} \]

This is a form of characteristic equation for $\mathsf{B}$, except that the unit matrix $\mathsf{I}$ has
been replaced by $\mathsf{A}$. It has the more familiar form if a choice of coordinates is
made in which the kinetic energy $T$ is a simple sum of squared terms, i.e. it has
been diagonalised, and the scale of the new coordinates is then chosen to make
each diagonal element unity.

However, even in the present case, (9.10) can be solved to yield $\omega_k^2$ for $k =
1, 2, \ldots, N$, where $N$ is the order of $\mathsf{A}$ and $\mathsf{B}$. The values of $\omega_k$ can be used
with (9.9) to find the corresponding column vector $\mathsf{x}^k$ and the initial (stationary)
physical configuration that, on release, will execute motion with period $2\pi/\omega_k$.

In equation (8.76) we showed that the eigenvectors of a real symmetric matrix
were, except in the case of degeneracy of the eigenvalues, mutually orthogonal.
In the present situation an analogous, but not identical, result holds. It is shown
in section 9.3 that if $\mathsf{x}^1$ and $\mathsf{x}^2$ are two eigenvectors satisfying (9.9) for different
values of $\omega^2$ then they are orthogonal in the sense that

\[ (\mathsf{x}^2)^{\mathrm T}\mathsf{A}\mathsf{x}^1 = 0 \quad\text{and}\quad (\mathsf{x}^2)^{\mathrm T}\mathsf{B}\mathsf{x}^1 = 0. \]

The direct 'scalar product' $(\mathsf{x}^2)^{\mathrm T}\mathsf{x}^1$, formally equal to $(\mathsf{x}^2)^{\mathrm T}\mathsf{I}\,\mathsf{x}^1$, is not, in general,
equal to zero.

Returning to the suspended rod, we find from (9.10)

\[ \left|\,\frac{Mlg}{12}\begin{pmatrix} 6 & 0 \\ 0 & 3 \end{pmatrix}
   - \frac{\omega^2 Ml^2}{12}\begin{pmatrix} 6 & 3 \\ 3 & 2 \end{pmatrix}\right| = 0. \]

Writing $\omega^2 l/g = \lambda$, this becomes

\[ \begin{vmatrix} 6 - 6\lambda & -3\lambda \\ -3\lambda & 3 - 2\lambda \end{vmatrix} = 0
   \quad\Rightarrow\quad \lambda^2 - 10\lambda + 6 = 0, \]

which has roots $\lambda = 5 \pm \sqrt{19}$. Thus we find that the two normal frequencies are
given by $\omega_1 = (0.641\,g/l)^{1/2}$ and $\omega_2 = (9.359\,g/l)^{1/2}$. Putting the lower of the two
values for $\omega^2$, namely $(5 - \sqrt{19})g/l$, into (9.9) shows that for this mode

\[ x_1 : x_2 = 3(5 - \sqrt{19}) : 6(\sqrt{19} - 4) = 1.923 : 2.153. \]

This corresponds to the case where the rod and string are almost straight out, i.e.
they almost form a simple pendulum. Similarly it may be shown that the higher
frequency corresponds to a solution where the string and rod are moving with
opposite phase and $x_1 : x_2 = 9.359 : -16.718$. The two situations are shown in
figure 9.1.
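The numbers quoted for the suspended rod can be reproduced with a short calculation. The sketch below is illustrative only: the helper name `normal_modes_2x2` is hypothetical, and the matrices are the dimensionless parts of (9.3) and (9.5), so that the returned values are $\lambda = \omega^2 l/g$.

```python
import math

# Dimensionless matrices for the suspended rod, taken from (9.3) and (9.5):
# A = (M l^2 / 12) * A0 and B = (M l g / 12) * B0, with lam = omega^2 l / g.
A0 = [[6.0, 3.0], [3.0, 2.0]]
B0 = [[6.0, 0.0], [0.0, 3.0]]

def normal_modes_2x2(a, b):
    """Solve det(b - lam*a) = 0 for a 2x2 pair; return [(lam, (x1, x2)), ...]."""
    # det(b - lam*a) = c2*lam^2 + c1*lam + c0
    c2 = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    c1 = -(a[0][0] * b[1][1] + a[1][1] * b[0][0]
           - a[0][1] * b[1][0] - a[1][0] * b[0][1])
    c0 = b[0][0] * b[1][1] - b[0][1] * b[1][0]
    disc = math.sqrt(c1 * c1 - 4.0 * c2 * c0)
    modes = []
    for lam in sorted(((-c1 - disc) / (2 * c2), (-c1 + disc) / (2 * c2))):
        # The first row of (b - lam*a) x = 0 fixes the ratio x1 : x2.
        x1 = -(b[0][1] - lam * a[0][1])
        x2 = b[0][0] - lam * a[0][0]
        modes.append((lam, (x1, x2)))
    return modes

modes = normal_modes_2x2(A0, B0)
```

For the lower mode the two components have the same sign (string and rod swing together); for the higher mode they have opposite signs, in line with the ratios stated in the text.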

In connection with quadratic forms it was shown in section 8.17 how to make 
a change of coordinates such that the matrix for a particular form becomes 
diagonal. In exercise 9.6 a method is developed for diagonalising simultaneously 
two quadratic forms (though the transformation matrix may not be orthogonal). 
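If this process is carried out for $\mathsf{A}$ and $\mathsf{B}$ in a general system undergoing stable
oscillations, the kinetic and potential energies in the new variables $\eta_i$ take the
forms

\[ T = \sum_i \mu_i\dot\eta_i^2 = \dot{\boldsymbol\eta}^{\mathrm T}\mathsf{M}\dot{\boldsymbol\eta},
   \qquad \mathsf{M} = \operatorname{diag}(\mu_1, \mu_2, \ldots, \mu_N), \tag{9.11} \]

\[ V = \sum_i \nu_i\eta_i^2 = \boldsymbol\eta^{\mathrm T}\mathsf{N}\boldsymbol\eta,
   \qquad \mathsf{N} = \operatorname{diag}(\nu_1, \nu_2, \ldots, \nu_N), \tag{9.12} \]

and the equations of motion are the uncoupled equations

\[ \mu_i\ddot\eta_i + \nu_i\eta_i = 0, \qquad i = 1, 2, \ldots, N. \tag{9.13} \]

Clearly a simple renormalisation of the $\eta_i$ can be made that reduces all the $\mu_i$
in (9.11) to unity. When this is done the variables so formed are called normal
coordinates and equations (9.13) the normal equations.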
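The renormalisation mentioned here can be written out explicitly. In the following sketch the symbol $\xi_i$ for the rescaled variable is introduced purely for illustration; it does not appear in the text.

```latex
\begin{align*}
\xi_i &= \mu_i^{1/2}\,\eta_i
  &&\Rightarrow\quad T = \sum_i \dot\xi_i^{\,2}, \qquad V = \sum_i \frac{\nu_i}{\mu_i}\,\xi_i^{2}, \\
\ddot\xi_i + \omega_i^2\,\xi_i &= 0
  &&\text{with}\quad \omega_i^2 = \frac{\nu_i}{\mu_i}, \qquad i = 1, 2, \ldots, N.
\end{align*}
```

Each normal coordinate therefore executes simple harmonic motion at its own eigenfrequency, independently of the others.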

When a system is executing one of these simple harmonic motions it is said to
be in a normal mode, and once started in such a mode it will repeat its motion
exactly after each interval of $2\pi/\omega_i$. Any arbitrary motion of the system may
be written as a superposition of the normal modes, and each component mode
will execute harmonic motion with the corresponding eigenfrequency; however,
unless by chance the eigenfrequencies are in integer relationship, the system will
never return to its initial configuration after any finite time interval.

As a second example we will consider a number of masses coupled together by 
springs. For this type of situation the potential and kinetic energies are automat- 
ically quadratic functions of the coordinates and their derivatives, provided the 
elastic limits of the springs are not exceeded, and the oscillations do not have to 
be vanishingly small for the analysis to be valid. 


►Find the normal frequencies and modes of oscillation of three particles of masses $m$, $\mu m$,
$m$ connected in that order in a straight line by two equal light springs of force constant $k$.
(This arrangement could serve as a model for some linear molecules, e.g. CO$_2$.)


The situation is shown in figure 9.2; the coordinates of the particles, $x_1$, $x_2$, $x_3$, are
measured from their equilibrium positions, at which the springs are neither extended nor
compressed.

The kinetic energy of the system is simply

\[ T = \tfrac12 m\left(\dot x_1^2 + \mu\,\dot x_2^2 + \dot x_3^2\right), \]

whilst the potential energy stored in the springs is

\[ V = \tfrac12 k\left[(x_2 - x_1)^2 + (x_3 - x_2)^2\right]. \]

The kinetic- and potential-energy symmetric matrices are thus

\[ \mathsf{A} = \frac{m}{2}\begin{pmatrix} 1 & 0 & 0 \\ 0 & \mu & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad
   \mathsf{B} = \frac{k}{2}\begin{pmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{pmatrix}. \]

To find the normal frequencies we have to solve $|\mathsf{B} - \omega^2\mathsf{A}| = 0$. Thus, writing
$m\omega^2/k = \lambda$, we have

\[ \begin{vmatrix} 1 - \lambda & -1 & 0 \\ -1 & 2 - \mu\lambda & -1 \\ 0 & -1 & 1 - \lambda \end{vmatrix} = 0, \]

which leads to $\lambda = 0$, 1 or $1 + 2/\mu$. The corresponding eigenvectors are respectively

\[ \mathsf{x}^1 = \frac{1}{\sqrt3}\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \qquad
   \mathsf{x}^2 = \frac{1}{\sqrt2}\begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}, \qquad
   \mathsf{x}^3 = \frac{1}{\sqrt{2 + (4/\mu^2)}}\begin{pmatrix} 1 \\ -2/\mu \\ 1 \end{pmatrix}. \]

Figure 9.2 Three masses $m$, $\mu m$ and $m$ connected by two equal light springs
of force constant $k$.

Figure 9.3 The normal modes of the masses and springs of a linear molecule
such as CO$_2$. (a) $\omega^2 = 0$; (b) $\omega^2 = k/m$; (c) $\omega^2 = [(\mu + 2)/\mu](k/m)$.


The physical motions associated with these normal modes are illustrated in figure 9.3.
The first, with $\lambda = \omega = 0$ and all the $x_i$ equal, merely describes bodily translation of the
whole system, with no (i.e. zero-frequency) internal oscillations.

In the second solution the central particle remains stationary, $x_2 = 0$, whilst the other
two oscillate with equal amplitudes in antiphase with each other. This motion, which has
frequency $\omega = (k/m)^{1/2}$, is illustrated in figure 9.3(b).

The final and most complicated of the three modes has frequency $\omega = \{[(\mu +
2)/\mu](k/m)\}^{1/2}$, and involves a motion of the central particle which is in antiphase with
that of the two outer ones and which has an amplitude $2/\mu$ times as great. In this motion
(see figure 9.3(c)) the two springs are compressed and extended in turn. We also note
that in the second and third normal modes the centre of mass of the molecule remains
stationary. ◄
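The three eigenpairs of this worked example can be checked by direct substitution into $(\mathsf{B} - \omega^2\mathsf{A})\mathsf{x} = \mathsf{0}$. The sketch below uses exact rational arithmetic; the function name `residual` and the value $\mu = 3/4$ are illustrative assumptions (the latter is roughly the carbon-to-oxygen mass ratio), and the eigenvectors are left unnormalised.

```python
from fractions import Fraction

def residual(lam, x, mu):
    """Return (B0 - lam*A0) x, where A = (m/2) A0, B = (k/2) B0, lam = m w^2 / k."""
    a0 = [[1, 0, 0], [0, mu, 0], [0, 0, 1]]
    b0 = [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]
    return [sum((b0[i][j] - lam * a0[i][j]) * x[j] for j in range(3))
            for i in range(3)]

mu = Fraction(3, 4)  # illustrative value only (roughly the C : O mass ratio)
modes = [
    (Fraction(0), [1, 1, 1]),          # bodily translation
    (Fraction(1), [1, 0, -1]),         # central particle at rest
    (1 + 2 / mu, [1, -2 / mu, 1]),     # central particle in antiphase
]
residuals = [residual(lam, x, mu) for lam, x in modes]
# m*x1 + mu*m*x2 + m*x3, in units of m, for each mode
centre_of_mass = [x[0] + mu * x[1] + x[2] for _, x in modes]
```

All three residual vectors vanish, and the centre-of-mass displacement is zero for the second and third modes but not for the translation.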


9.2 Symmetry and normal modes 

It will have been noticed that the system in the above example has an obvious
symmetry under the interchange of coordinates 1 and 3: the matrices $\mathsf{A}$ and $\mathsf{B}$,
the equations of motion and the normal modes illustrated in figure 9.3 are all
unaltered by the interchange of $x_1$ and $-x_3$. This reflects the more general result
that for each physical symmetry possessed by a system, there is at least one
normal mode with the same symmetry.

The general question of the relationship between the symmetries possessed by 
a physical system and those of its normal modes will be taken up more formally 
in chapter 25 where the representation theory of groups is considered. However, 
we can show here how an appreciation of a system’s symmetry properties will 
sometimes allow its normal modes to be guessed (and then verified), something 
that is particularly helpful if the number of coordinates involved is greater than 
two and the corresponding eigenvalue equation (9.10) is a cubic or higher-degree 
polynomial equation. 

Consider the problem of determining the normal modes of a system consisting
of four equal masses $M$ at the corners of a square of side $2L$, each pair
of masses being connected by a light spring of modulus $k$ that is unstretched
in the equilibrium situation. As shown in figure 9.4, we introduce Cartesian
coordinates $x_n, y_n$, with $n = 1, 2, 3, 4$, for the positions of the masses and
denote their displacements from their equilibrium positions $\mathbf{R}_n$ by $\mathbf{q}_n = x_n\mathbf{i} + y_n\mathbf{j}$.
Thus

\[ \mathbf{r}_n = \mathbf{R}_n + \mathbf{q}_n \quad\text{with}\quad \mathbf{R}_n = \pm L\mathbf{i} \pm L\mathbf{j}. \]

The coordinates for the system are thus $x_1, y_1, x_2, \ldots, y_4$ and the kinetic
energy matrix $\mathsf{A}$ is given trivially by $M\mathsf{I}_8$, where $\mathsf{I}_8$ is the $8 \times 8$ identity
matrix.

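The potential energy matrix $\mathsf{B}$ is much more difficult to calculate and involves,
for each pair of values $m, n$, evaluating the quadratic approximation to the
expression

\[ b_{mn} = \tfrac12 k\left(|\mathbf{r}_m - \mathbf{r}_n| - |\mathbf{R}_m - \mathbf{R}_n|\right)^2. \]

Expressing each $\mathbf{r}_i$ in terms of $\mathbf{q}_i$ and $\mathbf{R}_i$ and remembering that $|\mathbf{R}_m - \mathbf{R}_n| \gg
|\mathbf{q}_m - \mathbf{q}_n|$, we obtain $b_{mn}$ $(= b_{nm})$:

\[
\begin{aligned}
b_{mn} &= \tfrac12 k\left[\,|(\mathbf{R}_m - \mathbf{R}_n) + (\mathbf{q}_m - \mathbf{q}_n)| - |\mathbf{R}_m - \mathbf{R}_n|\,\right]^2 \\
 &= \tfrac12 k\left\{\left[\,|\mathbf{R}_m - \mathbf{R}_n|^2 + 2(\mathbf{q}_m - \mathbf{q}_n)\cdot(\mathbf{R}_m - \mathbf{R}_n) + |\mathbf{q}_m - \mathbf{q}_n|^2\right]^{1/2} - |\mathbf{R}_m - \mathbf{R}_n|\right\}^2 \\
 &= \tfrac12 k\,|\mathbf{R}_m - \mathbf{R}_n|^2\left\{\left[1 + \frac{2(\mathbf{q}_m - \mathbf{q}_n)\cdot(\mathbf{R}_m - \mathbf{R}_n)}{|\mathbf{R}_m - \mathbf{R}_n|^2} + \cdots\right]^{1/2} - 1\right\}^2 \\
 &\approx \tfrac12 k\left\{\frac{(\mathbf{q}_m - \mathbf{q}_n)\cdot(\mathbf{R}_m - \mathbf{R}_n)}{|\mathbf{R}_m - \mathbf{R}_n|}\right\}^2.
\end{aligned}
\]

This final expression is readily interpretable as the potential energy stored in the
spring when it is extended by an amount equal to the component, along the
equilibrium direction of the spring, of the relative displacement of its two ends.

Figure 9.4 The arrangement of four equal masses and six equal springs
discussed in the text. The coordinate systems $x_n, y_n$ for $n = 1, 2, 3, 4$ measure
the displacements of the masses from their equilibrium positions.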

Applying this result to each spring in turn gives the following expressions for 
the elements of the potential matrix. 


    m   n   $2b_{mn}/k$
    1   2   $(x_1 - x_2)^2$
    1   3   $(y_1 - y_3)^2$
    1   4   $\tfrac12(-x_1 + x_4 + y_1 - y_4)^2$
    2   3   $\tfrac12(x_2 - x_3 + y_2 - y_3)^2$
    2   4   $(y_2 - y_4)^2$
    3   4   $(x_3 - x_4)^2$

The potential matrix is thus constructed as

\[
\mathsf{B} = \frac{k}{4}\begin{pmatrix}
 3 & -1 & -2 &  0 &  0 &  0 & -1 &  1 \\
-1 &  3 &  0 &  0 &  0 & -2 &  1 & -1 \\
-2 &  0 &  3 &  1 & -1 & -1 &  0 &  0 \\
 0 &  0 &  1 &  3 & -1 & -1 &  0 & -2 \\
 0 &  0 & -1 & -1 &  3 &  1 & -2 &  0 \\
 0 & -2 & -1 & -1 &  1 &  3 &  0 &  0 \\
-1 &  1 &  0 &  0 & -2 &  0 &  3 & -1 \\
 1 & -1 &  0 & -2 &  0 &  0 & -1 &  3
\end{pmatrix}.
\]
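The construction of the potential matrix from the individual spring energies can be mechanised. The sketch below is a hedged illustration, not the text's own procedure: the function name `potential_matrix` is hypothetical, and the numbering of the corners (mass 1 at $(-L, L)$, mass 2 at $(L, L)$, mass 3 at $(-L, -L)$, mass 4 at $(L, -L)$) is inferred from the tabulated spring energies rather than read from figure 9.4. It accumulates, for every spring, the square of the component of the relative displacement along the spring.

```python
import math

# Equilibrium positions in units of L; the numbering is an inference from the
# tabulated spring energies, not taken from the figure itself.
R = {1: (-1, 1), 2: (1, 1), 3: (-1, -1), 4: (1, -1)}

def potential_matrix():
    """Return the 8x8 matrix 4B/k in the coordinate order x1, y1, ..., x4, y4."""
    mat = [[0.0] * 8 for _ in range(8)]
    for m in range(1, 5):
        for n in range(m + 1, 5):
            dx, dy = R[m][0] - R[n][0], R[m][1] - R[n][1]
            norm = math.hypot(dx, dy)
            ux, uy = dx / norm, dy / norm  # unit vector along the spring
            # Coefficients c such that the extension is approximately c . q
            c = [0.0] * 8
            c[2 * (m - 1)], c[2 * (m - 1) + 1] = ux, uy
            c[2 * (n - 1)], c[2 * (n - 1) + 1] = -ux, -uy
            # b_mn = (k/2) (c . q)^2, so 4 b_mn / k contributes 2 c c^T.
            for i in range(8):
                for j in range(8):
                    mat[i][j] += 2.0 * c[i] * c[j]
    return mat

four_b_over_k = [[round(v) for v in row] for row in potential_matrix()]
```

Rounding to integers only removes floating-point noise from the factors of $1/\sqrt2$ on the diagonal springs; the result can be compared entry by entry with the matrix displayed above.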


To solve the eigenvalue equation $|\mathsf{B} - \lambda\mathsf{A}| = 0$ directly would mean solving
an eighth-degree polynomial equation. Fortunately, we can exploit intuition and
the symmetries of the system to obtain the eigenvectors and corresponding
eigenvalues without such labour.

Firstly, we know that bodily translation of the whole system, without any
internal vibration, must be possible and that there will be two independent
solutions of this form, corresponding to translations in the $x$- and $y$-directions.
The eigenvector for the first of these (written in row form to save space) is

\[ \mathsf{x}^{(1)} = (1\ \ 0\ \ 1\ \ 0\ \ 1\ \ 0\ \ 1\ \ 0)^{\mathrm T}. \]

Evaluation of $\mathsf{B}\mathsf{x}^{(1)}$ gives

\[ \mathsf{B}\mathsf{x}^{(1)} = (0\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0)^{\mathrm T}, \]

showing that $\mathsf{x}^{(1)}$ is a solution of $(\mathsf{B} - \omega^2\mathsf{A})\mathsf{x} = \mathsf{0}$ corresponding to the eigenvalue
$\omega^2 = 0$, whatever form $\mathsf{A}\mathsf{x}$ may take. Similarly,

\[ \mathsf{x}^{(2)} = (0\ \ 1\ \ 0\ \ 1\ \ 0\ \ 1\ \ 0\ \ 1)^{\mathrm T} \]

is a second eigenvector corresponding to the eigenvalue $\omega^2 = 0$.

The next intuitive solution, again involving no internal vibrations, and, therefore,
expected to correspond to $\omega^2 = 0$, is pure rotation of the whole system
about its centre. In this mode each mass moves perpendicularly to the line joining
its position to the centre, and so the relevant eigenvector is

\[ \mathsf{x}^{(3)} = \frac{1}{\sqrt8}(1\ \ 1\ \ 1\ \ {-1}\ \ {-1}\ \ 1\ \ {-1}\ \ {-1})^{\mathrm T}. \]

It is easily verified that $\mathsf{B}\mathsf{x}^{(3)} = \mathsf{0}$, thus confirming both the eigenvector and the
corresponding eigenvalue. The three non-oscillatory normal modes are illustrated
in diagrams (a)-(c) of figure 9.5.

We now come to solutions that do involve real internal oscillations, and,
because of the four-fold symmetry of the system, we expect one of them to be a
mode in which all the masses move along radial lines, the so-called 'breathing
mode'. Expressing this motion in coordinate form gives as the fourth eigenvector

\[ \mathsf{x}^{(4)} = \frac{1}{\sqrt8}(-1\ \ 1\ \ 1\ \ 1\ \ {-1}\ \ {-1}\ \ 1\ \ {-1})^{\mathrm T}. \]

Evaluation of $\mathsf{B}\mathsf{x}^{(4)}$ yields

\[ \mathsf{B}\mathsf{x}^{(4)} = \frac{k}{4\sqrt8}(-8\ \ 8\ \ 8\ \ 8\ \ {-8}\ \ {-8}\ \ 8\ \ {-8})^{\mathrm T} = 2k\,\mathsf{x}^{(4)}, \]

i.e. a multiple of $\mathsf{x}^{(4)}$, confirming that it is indeed an eigenvector. Further, since
$\mathsf{A}\mathsf{x}^{(4)} = M\mathsf{x}^{(4)}$, it follows from $(\mathsf{B} - \omega^2\mathsf{A})\mathsf{x} = \mathsf{0}$ that $\omega^2 = 2k/M$ for this normal
mode. Diagram (d) of the figure illustrates the corresponding motions of the four
masses.

Figure 9.5 The displacements and frequencies of the eight normal modes of
the system shown in figure 9.4: (a) $\omega^2 = 0$; (b) $\omega^2 = 0$; (c) $\omega^2 = 0$;
(d) $\omega^2 = 2k/M$; (e)-(h) $\omega^2 = k/M$. Modes (a), (b) and (c) are not true oscillations:
(a) and (b) are purely translational whilst (c) is one of bodily rotation.
Mode (d), the 'breathing mode', has the highest frequency and the remaining
four, (e)-(h), of lower frequency, are degenerate.

As the next step in exploiting the symmetry properties of the system we
note that, because of its reflection symmetry in the $x$-axis, the system is invariant
under the double interchange of $y_1$ with $-y_3$ and $y_2$ with $-y_4$. This leads us to
try an eigenvector of the form

\[ \mathsf{x}^{(5)} = (0\ \ \alpha\ \ 0\ \ \beta\ \ 0\ \ {-\alpha}\ \ 0\ \ {-\beta})^{\mathrm T}. \]

Substituting this trial vector into $(\mathsf{B} - \omega^2\mathsf{A})\mathsf{x} = \mathsf{0}$ gives, of course, eight
simultaneous equations for $\alpha$ and $\beta$, but they are all equivalent to just two, namely

\[ \alpha + \beta = 0, \qquad 5\alpha + \beta = \frac{4M\omega^2}{k}\,\alpha; \]

these have the solution $\alpha = -\beta$ and $\omega^2 = k/M$. The latter thus gives the frequency
of the mode with eigenvector

\[ \mathsf{x}^{(5)} = (0\ \ 1\ \ 0\ \ {-1}\ \ 0\ \ {-1}\ \ 0\ \ 1)^{\mathrm T}. \]

Note that, in this mode, when the spring joining masses 1 and 3 is most stretched,
the one joining masses 2 and 4 is at its most compressed. Similarly, based on
reflection symmetry in the $y$-axis,

\[ \mathsf{x}^{(6)} = (1\ \ 0\ \ {-1}\ \ 0\ \ {-1}\ \ 0\ \ 1\ \ 0)^{\mathrm T} \]

can be shown to be an eigenvector corresponding to the same frequency. These
two modes are sketched in diagrams (e) and (f) of figure 9.5.

This accounts for six of the expected eight modes, and the other two could be
found by considering motions that are symmetric about both diagonals of the
square or are invariant under successive reflections in the $x$- and $y$-axes. However,
since $\mathsf{A}$ is a multiple of the unit matrix, and since we know that $(\mathsf{x}^{(j)})^{\mathrm T}\mathsf{A}\mathsf{x}^{(i)} = 0$ if
$i \neq j$, we can find the two remaining eigenvectors more easily by requiring them
to be orthogonal to each of those found so far.

Let us take the next (seventh) eigenvector, $\mathsf{x}^{(7)}$, to be given by

\[ \mathsf{x}^{(7)} = (a\ \ b\ \ c\ \ d\ \ e\ \ f\ \ g\ \ h)^{\mathrm T}. \]

Then orthogonality with each of the $\mathsf{x}^{(n)}$ for $n = 1, 2, \ldots, 6$ yields six equations
satisfied by the unknowns $a, b, \ldots, h$. As the reader may verify, they can be reduced
to the six simple equations

\[
\begin{aligned}
a + g &= 0, & d + f &= 0, & a + f &= d + g, \\
b + h &= 0, & c + e &= 0, & b + c &= e + h.
\end{aligned}
\]

With six homogeneous equations for eight unknowns, effectively separated into
two groups of four, we may pick one in each group arbitrarily. Taking $a = b = 1$
gives $d = e = 1$ and $c = f = g = h = -1$ as a solution. Substitution of

\[ \mathsf{x}^{(7)} = (1\ \ 1\ \ {-1}\ \ 1\ \ 1\ \ {-1}\ \ {-1}\ \ {-1})^{\mathrm T} \]

into the eigenvalue equation checks that it is an eigenvector and shows that the
corresponding eigenfrequency is given by $\omega^2 = k/M$.

We now have the eigenvectors for seven of the eight normal modes and the
eighth can be found by making it simultaneously orthogonal to each of the other
seven. It is left to the reader to show (or verify) that the final solution is

\[ \mathsf{x}^{(8)} = (1\ \ {-1}\ \ 1\ \ 1\ \ {-1}\ \ {-1}\ \ {-1}\ \ 1)^{\mathrm T} \]

and that this mode has the same frequency as three of the other modes. The
general topic of the degeneracy of normal modes is discussed in chapter 25. The
movements associated with the final two modes are shown in diagrams (g) and
(h) of figure 9.5; this figure summarises all eight normal modes and frequencies.
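The eigenvector claims of this section lend themselves to a direct numerical check. The sketch below is illustrative: the names `B4`, `MODES` and `eigen_multiple` are hypothetical, the mode shapes are entered without their normalisation factors, and the eighth vector is the one fixed (up to sign and scale) by orthogonality to the other seven. The matrix is $4\mathsf{B}/k$, so with $\mathsf{A} = M\mathsf{I}_8$ as in the text a multiple $c$ corresponds to $\omega^2 = ck/(4M)$.

```python
from fractions import Fraction

# The matrix 4B/k; coordinate order x1, y1, x2, y2, x3, y3, x4, y4.
B4 = [
    [3, -1, -2, 0, 0, 0, -1, 1],
    [-1, 3, 0, 0, 0, -2, 1, -1],
    [-2, 0, 3, 1, -1, -1, 0, 0],
    [0, 0, 1, 3, -1, -1, 0, -2],
    [0, 0, -1, -1, 3, 1, -2, 0],
    [0, -2, -1, -1, 1, 3, 0, 0],
    [-1, 1, 0, 0, -2, 0, 3, -1],
    [1, -1, 0, -2, 0, 0, -1, 3],
]

# Unnormalised mode shapes x(1) ... x(8).
MODES = [
    [1, 0, 1, 0, 1, 0, 1, 0],          # translation along x
    [0, 1, 0, 1, 0, 1, 0, 1],          # translation along y
    [1, 1, 1, -1, -1, 1, -1, -1],      # bodily rotation
    [-1, 1, 1, 1, -1, -1, 1, -1],      # breathing mode
    [0, 1, 0, -1, 0, -1, 0, 1],
    [1, 0, -1, 0, -1, 0, 1, 0],
    [1, 1, -1, 1, 1, -1, -1, -1],
    [1, -1, 1, 1, -1, -1, -1, 1],
]

def eigen_multiple(mat, x):
    """Return c if mat x = c x holds exactly, otherwise None."""
    y = [sum(row[j] * x[j] for j in range(len(x))) for row in mat]
    k = next(i for i, v in enumerate(x) if v != 0)
    c = Fraction(y[k], x[k])
    return c if all(yi == c * xi for yi, xi in zip(y, x)) else None

def dot(u, v):
    """Ordinary scalar product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

multiples = [eigen_multiple(B4, x) for x in MODES]
```

The multiples come out as 0 for the three rigid-body motions, 8 for the breathing mode and 4 for the four degenerate modes, and the eight vectors are mutually orthogonal in the ordinary sense because $\mathsf{A}$ is a multiple of the unit matrix.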

Although this example has been lengthy to write out, we have seen that the 
actual calculations are quite simple and provide the full solution to what is 
formally a matrix eigenvalue equation involving $8 \times 8$ matrices. It should be
noted that our exploitation of the intrinsic symmetries of the system played a 
crucial part in finding the correct eigenvectors for the various normal modes. 


9.3 Rayleigh-Ritz method 

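We conclude this chapter with a discussion of the Rayleigh-Ritz method for
estimating the eigenfrequencies of an oscillating system. We recall from the
introduction to this chapter that for a system undergoing small oscillations the
potential and kinetic energy are given by

\[ V = \mathsf{q}^{\mathrm T}\mathsf{B}\mathsf{q} \quad\text{and}\quad T = \dot{\mathsf q}^{\mathrm T}\mathsf{A}\dot{\mathsf q}, \]

where the components of $\mathsf{q}$ are the coordinates chosen to represent the configuration
of the system and $\mathsf{A}$ and $\mathsf{B}$ are symmetric matrices (or may be chosen to be
such). We also recall from (9.9) that the normal modes $\mathsf{x}^i$ and the eigenfrequencies
$\omega_i$ are given by

\[ (\mathsf{B} - \omega_i^2\mathsf{A})\mathsf{x}^i = \mathsf{0}. \tag{9.14} \]

It may be shown that the eigenvectors $\mathsf{x}^i$ corresponding to different normal modes
are linearly independent and so form a complete set. Thus, any coordinate vector
$\mathsf{x}$ can be written $\mathsf{x} = \sum_j c_j\mathsf{x}^j$. We now consider the value of the generalised
quadratic form

\[ \lambda(\mathsf{x}) = \frac{\mathsf{x}^{\mathrm T}\mathsf{B}\mathsf{x}}{\mathsf{x}^{\mathrm T}\mathsf{A}\mathsf{x}}
   = \frac{\sum_m (\mathsf{x}^m)^{\mathrm T} c_m^*\,\mathsf{B}\sum_i c_i\mathsf{x}^i}
          {\sum_j (\mathsf{x}^j)^{\mathrm T} c_j^*\,\mathsf{A}\sum_k c_k\mathsf{x}^k}, \]
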

which, since both numerator and denominator are positive definite, is itself non- 
negative. Equation (9.14) can be used to replace Bx', with the result that 


= E„,(x" l ) T <„AE,ffl, 2 DX / 

£/ xJ ) Tc } A Ek c fcx' t 


E»,(x" , ) T G,E,« 2 c,Ax i 
E/(x-O t c}a J2k c fcX fc 


(9.15) 


Now the eigenvectors $\mathbf{x}^i$ obtained by solving $(\mathsf{B} - \omega^2 \mathsf{A})\mathbf{x} = \mathbf{0}$ are not mutually
orthogonal unless either $\mathsf{A}$ or $\mathsf{B}$ is a multiple of the unit matrix. However, it may
be shown that they do possess the desirable properties

$$(\mathbf{x}^j)^T \mathsf{A}\,\mathbf{x}^i = 0 \quad\text{and}\quad (\mathbf{x}^j)^T \mathsf{B}\,\mathbf{x}^i = 0 \quad\text{if } i \neq j. \qquad (9.16)$$


This result is proved as follows. From (9.14) it is clear that, for general $i$ and $j$,

$$(\mathbf{x}^j)^T (\mathsf{B} - \omega_i^2 \mathsf{A})\,\mathbf{x}^i = 0. \qquad (9.17)$$

But, by taking the transpose of (9.14) with $i$ replaced by $j$ and recalling that $\mathsf{A}$
and $\mathsf{B}$ are real and symmetric, we obtain

$$(\mathbf{x}^j)^T (\mathsf{B} - \omega_j^2 \mathsf{A}) = \mathbf{0}.$$

Forming the scalar product of this with $\mathbf{x}^i$ and subtracting the result from (9.17)
gives

$$(\omega_j^2 - \omega_i^2)\,(\mathbf{x}^j)^T \mathsf{A}\,\mathbf{x}^i = 0.$$

Thus, for $i \neq j$ and non-degenerate eigenvalues $\omega_i^2$ and $\omega_j^2$, we have that
$(\mathbf{x}^j)^T \mathsf{A}\,\mathbf{x}^i = 0$, and substituting this into (9.17) immediately establishes the
corresponding result for $(\mathbf{x}^j)^T \mathsf{B}\,\mathbf{x}^i$. Clearly, if either $\mathsf{A}$ or $\mathsf{B}$ is a multiple of the unit
matrix then the eigenvectors are mutually orthogonal in the normal sense. The
orthogonality relations (9.16) are re-derived and extended in exercise 9.6.
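As an illustrative numerical check of (9.16), the sketch below uses the matrices A and B of the oscillating rod of section 9.1 (quoted in the worked example later in this section), in units with M = l = g = 1. The helper name `quad` and the choice of normalisation x_1 = 1 for each eigenvector are assumptions made only for this illustration.

```python
import math

# Rod matrices of section 9.1 in units with M = l = g = 1.
A = [[6 / 12, 3 / 12], [3 / 12, 2 / 12]]
B = [[6 / 12, 0.0], [0.0, 3 / 12]]


def quad(u, M, v):
    """Return the bilinear form u^T M v for two-component vectors."""
    return sum(u[i] * M[i][j] * v[j] for i in range(2) for j in range(2))


# For these matrices det(B - lam*A) = 0 reduces to lam^2 - 10*lam + 6 = 0.
lams = [5 - math.sqrt(19), 5 + math.sqrt(19)]

# The first row of (B - lam*A) x = 0 fixes the ratio x2/x1 = 2(1 - lam)/lam.
vecs = [(1.0, 2 * (1 - lam) / lam) for lam in lams]

a_form = quad(vecs[0], A, vecs[1])                 # vanishes, as (9.16) requires
b_form = quad(vecs[0], B, vecs[1])                 # vanishes, as (9.16) requires
plain_dot = sum(p * q for p, q in zip(vecs[0], vecs[1]))  # does not vanish

print(a_form, b_form, plain_dot)
```

The two eigenvectors are orthogonal with respect to A and B but not in the ordinary sense, because neither matrix is a multiple of the unit matrix.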

Using the first of the relationships (9.16) to simplify (9.15), we find that

$$\lambda(\mathbf{x}) = \frac{\sum_i |c_i|^2 \omega_i^2\,(\mathbf{x}^i)^T \mathsf{A}\,\mathbf{x}^i}{\sum_k |c_k|^2\,(\mathbf{x}^k)^T \mathsf{A}\,\mathbf{x}^k}. \qquad (9.18)$$

Now, if $\omega_0$ is the lowest eigenfrequency then $\omega_i^2 \geq \omega_0^2$ for all $i$ and, further, since
$(\mathbf{x}^i)^T \mathsf{A}\,\mathbf{x}^i \geq 0$ for all $i$, the numerator of (9.18) is $\geq \omega_0^2 \sum_i |c_i|^2 (\mathbf{x}^i)^T \mathsf{A}\,\mathbf{x}^i$. Hence

$$\lambda(\mathbf{x}) \equiv \frac{\mathbf{x}^T \mathsf{B}\,\mathbf{x}}{\mathbf{x}^T \mathsf{A}\,\mathbf{x}} \geq \omega_0^2, \qquad (9.19)$$

for any $\mathbf{x}$ whatsoever (whether $\mathbf{x}$ is an eigenvector or not). Thus we are able to
estimate the lowest eigenfrequency of the system by evaluating $\lambda$ for a variety
of vectors $\mathbf{x}$, the components of which, it will be recalled, give the ratios of the
coordinate amplitudes. This is sometimes a useful approach if many coordinates
are involved and direct solution for the eigenvalues is not possible.

An additional result is that the maximum eigenfrequency $\omega_m$ may also be
estimated. It is obvious that if we replace the statement '$\omega_i^2 \geq \omega_0^2$ for all $i$' by
'$\omega_i^2 \leq \omega_m^2$ for all $i$', then $\lambda(\mathbf{x}) \leq \omega_m^2$ for any $\mathbf{x}$. Thus $\lambda(\mathbf{x})$ always lies between
the squares of the lowest and highest eigenfrequencies of the system. Furthermore, $\lambda(\mathbf{x})$ has a
stationary value, equal to $\omega_k^2$, when $\mathbf{x}$ is the $k$th eigenvector (see subsection 8.17.1).
►Estimate the eigenfrequencies of the oscillating rod of section 9.1.

Firstly we recall that

$$\mathsf{A} = \frac{Ml^2}{12}\begin{pmatrix} 6 & 3 \\ 3 & 2 \end{pmatrix} \quad\text{and}\quad \mathsf{B} = \frac{Mlg}{12}\begin{pmatrix} 6 & 0 \\ 0 & 3 \end{pmatrix}.$$

Physical intuition suggests that the slower mode will have a configuration approximating
that of a simple pendulum (figure 9.1), in which $\theta_1 = \theta_2$, and so we use this as a trial
vector. Taking $\mathbf{x} = (\theta \;\; \theta)^T$,

$$\lambda(\mathbf{x}) = \frac{\mathbf{x}^T \mathsf{B}\,\mathbf{x}}{\mathbf{x}^T \mathsf{A}\,\mathbf{x}} = \frac{3Mlg\theta^2/4}{7Ml^2\theta^2/6} = \frac{9g}{14l} = 0.643\,\frac{g}{l},$$

and we conclude from (9.19) that the lower (angular) frequency is $\leq (0.643g/l)^{1/2}$. We
have already seen on p. 325 that the true answer is $(0.641g/l)^{1/2}$ and so we have come
very close to it.

Next we turn to the higher frequency. Here, a typical pattern of oscillation is not so
obvious but, rather preempting the answer, we try $\theta_2 = -2\theta_1$; we then obtain $\lambda = 9g/l$
and so conclude that the higher eigenfrequency is $\geq (9g/l)^{1/2}$. We have already seen that the
exact answer is $(9.359g/l)^{1/2}$ and so again we have come close to it. ◄
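The estimates in this example can be reproduced with a few lines of code. The following is a minimal sketch, not part of the original text: the function name `rayleigh` is an assumption, the common factors Ml²/12 and Mlg/12 are dropped so that λ is measured in units of g/l, and the exact values 5 ± √19 come from solving det(B − λA) = 0 for these 2 × 2 matrices.

```python
import math

A = [[6.0, 3.0], [3.0, 2.0]]  # kinetic-energy matrix, times M*l^2/12
B = [[6.0, 0.0], [0.0, 3.0]]  # potential-energy matrix, times M*l*g/12


def rayleigh(x):
    """Return lambda(x) = x^T B x / x^T A x in units of g/l."""
    num = sum(x[i] * B[i][j] * x[j] for i in range(2) for j in range(2))
    den = sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))
    return num / den


# Exact eigenvalues: roots of lam^2 - 10*lam + 6 = 0.
lam_low, lam_high = 5 - math.sqrt(19), 5 + math.sqrt(19)

est_low = rayleigh((1.0, 1.0))    # trial vector theta_1 = theta_2
est_high = rayleigh((1.0, -2.0))  # trial vector theta_2 = -2 theta_1
print(est_low, lam_low)    # upper bound on the lower eigenvalue
print(est_high, lam_high)  # lower bound on the higher eigenvalue

# Scanning trial directions shows lambda(x) never leaves [lam_low, lam_high].
scan = [rayleigh((math.cos(math.radians(k)), math.sin(math.radians(k))))
        for k in range(180)]
print(min(scan), max(scan))
```

Any trial vector gives a value between the two exact eigenvalues; a well-chosen one lands close to an end of the interval.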


A simplified version of the Rayleigh-Ritz method may be used to estimate the
eigenvalues of a symmetric (or in general Hermitian) matrix $\mathsf{B}$, the eigenvectors
of which will be mutually orthogonal. By repeating the calculations leading to
(9.18), $\mathsf{A}$ being replaced by the unit matrix $\mathsf{I}$, it is easily verified that if

$$\lambda(\mathbf{x}) = \frac{\mathbf{x}^T \mathsf{B}\,\mathbf{x}}{\mathbf{x}^T \mathbf{x}}$$

is evaluated for any vector $\mathbf{x}$ then

$$\lambda_1 \leq \lambda(\mathbf{x}) \leq \lambda_m,$$

where $\lambda_1, \lambda_2, \ldots, \lambda_m$ are the eigenvalues of $\mathsf{B}$ in order of increasing size. A similar
result holds for Hermitian matrices.
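The bound can be explored numerically. The sketch below is only an illustration under stated assumptions: it uses the 3 × 3 tridiagonal matrix that appears in exercise 9.9, whose extreme eigenvalues 2 − √2 and 2 + √2 are quoted in the hints, and evaluates λ(x) for randomly chosen trial vectors (the seed and sample size are arbitrary).

```python
import math
import random

B = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]


def rayleigh(x):
    """Return lambda(x) = x^T B x / x^T x for a real 3-component vector."""
    num = sum(x[i] * B[i][j] * x[j] for i in range(3) for j in range(3))
    den = sum(xi * xi for xi in x)
    return num / den


lam_min, lam_max = 2 - math.sqrt(2), 2 + math.sqrt(2)

random.seed(0)  # fixed seed so that the run is reproducible
values = [rayleigh([random.uniform(-1, 1) for _ in range(3)])
          for _ in range(1000)]
print(lam_min, min(values), max(values), lam_max)

# An exact eigenvector reproduces its eigenvalue.
exact_low = rayleigh([1.0, math.sqrt(2), 1.0])
print(exact_low)
```

Every sampled value lies inside the interval, and the interval ends are attained only for the corresponding eigenvectors.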


9.4 Exercises 

9.1 Three coupled pendulums swing perpendicularly to the horizontal line containing
their points of suspension, and the following equations of motion are satisfied:

$$-m\ddot{x}_1 = cmx_1 + d(x_1 - x_2),$$
$$-M\ddot{x}_2 = cMx_2 + d(x_2 - x_1) + d(x_2 - x_3),$$
$$-m\ddot{x}_3 = cmx_3 + d(x_3 - x_2),$$

where $x_1$, $x_2$ and $x_3$ are measured from the equilibrium points, $m$, $M$ and $m$
are the masses of the pendulum bobs and $c$ and $d$ are positive constants. Find
the normal frequencies of the system and sketch the corresponding patterns of
oscillation. What happens as $d \to 0$ or $d \to \infty$?

9.2 A double pendulum, smoothly pivoted at A, consists of two light rigid rods, AB
and BC, each of length $l$, which are smoothly jointed at B and carry masses $m$ and
$\alpha m$ at B and C respectively. The pendulum makes small oscillations in one plane
under gravity; at time $t$, AB and BC make angles $\theta(t)$ and $\phi(t)$ respectively with
the downward vertical. Find quadratic expressions for the kinetic and potential
energies of the system and hence show that the normal modes have angular
frequencies given by

$$\omega^2 = \frac{g}{l}\left[1 + \alpha \pm \sqrt{\alpha(1 + \alpha)}\right].$$

For $\alpha = 1/3$, show that in one of the normal modes the mid-point of BC does
not move during the motion.

9.3 Continue the worked example modelling a linear molecule discussed at the end
of section 9.1, for the case in which $\mu = 2$.

(a) Show that the eigenvectors derived there have the expected orthogonality
properties with respect to both $\mathsf{A}$ and $\mathsf{B}$.

(b) For the situation in which the atoms are released from rest with initial
displacements $x_1 = 2\epsilon$, $x_2 = -\epsilon$ and $x_3 = 0$, determine their subsequent
motions and maximum displacements.

9.4 Consider the circuit consisting of three equal capacitors and two different
inductors shown in the figure. For charges $Q_i$ on the capacitors and currents $I_i$ through
the components, write down Kirchhoff's law for the total voltage change around
each of two complete circuit loops. Note that, to within an unimportant constant,
the conservation of current implies that $Q_3 = Q_1 - Q_2$ and hence express the loop
equations in the form given in (9.7), namely

$$\mathsf{A}\ddot{\mathbf{Q}} + \mathsf{B}\mathbf{Q} = \mathbf{0}.$$

[Figure for exercise 9.4: the circuit, with charges $Q_1$, $Q_2$, $Q_3$ on the three capacitors $C$
and currents $I_1$, $I_2$ through the inductors $L_1$, $L_2$.]

Use this to show that the normal frequencies of the circuit are given by

$$\omega^2 = \frac{1}{CL_1L_2}\left[L_1 + L_2 \pm (L_1^2 + L_2^2 - L_1L_2)^{1/2}\right].$$

Obtain the same matrices and result by finding the total energy stored in the
various capacitors (typically $Q^2/(2C)$) and in the inductors (typically $LI^2/2$).

For the special case $L_1 = L_2 = L$ determine the relevant eigenvectors and so
describe the patterns of current flow in the circuit.

9.5 It is shown in physics and engineering textbooks that circuits containing
capacitors and inductors can be analysed by replacing a capacitor of capacitance $C$ by a
'complex impedance' $1/(i\omega C)$ and an inductor of inductance $L$ by an impedance
$i\omega L$, where $\omega$ is the angular frequency of the currents flowing and $i^2 = -1$.

Use this approach and Kirchhoff's circuit laws to analyse the circuit shown in
the figure and obtain three linear equations governing the currents $I_1$, $I_2$ and $I_3$.
Show that the only possible frequencies of self-sustaining currents satisfy either
(a) $\omega^2 LC = 1$ or (b) $3\omega^2 LC = 1$. Find the corresponding current patterns and,
in each case, by identifying parts of the circuit in which no current flows, draw
an equivalent circuit that contains only one capacitor and one inductor.

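9.6 The simultaneous reduction to diagonal form of two real symmetric quadratic forms.

Consider the two real symmetric quadratic forms $\mathbf{u}^T \mathsf{A}\mathbf{u}$ and $\mathbf{u}^T \mathsf{B}\mathbf{u}$, where $\mathbf{u}^T$
stands for the row matrix $(x \;\; y \;\; z)$, and denote by $\mathbf{u}^n$ those column matrices
that satisfy

$$\mathsf{B}\mathbf{u}^n = \lambda_n \mathsf{A}\mathbf{u}^n, \qquad (E9.1)$$

in which $n$ is a label and the $\lambda_n$ are real, non-zero and all different.

(a) By multiplying (E9.1) on the left by $(\mathbf{u}^m)^T$ and the transpose of the
corresponding equation for $\mathbf{u}^m$ on the right by $\mathbf{u}^n$, show that $(\mathbf{u}^m)^T \mathsf{A}\mathbf{u}^n = 0$ for
$n \neq m$.

(b) By noting that $\mathsf{A}\mathbf{u}^n = (\lambda_n)^{-1}\mathsf{B}\mathbf{u}^n$, deduce that $(\mathbf{u}^m)^T \mathsf{B}\mathbf{u}^n = 0$ for $m \neq n$.

It can be shown that the $\mathbf{u}^n$ are linearly independent; the next step is to
construct a matrix $\mathsf{P}$ whose columns are the vectors $\mathbf{u}^n$.

(c) Make a change of variables $\mathbf{u} = \mathsf{P}\mathbf{v}$ such that $\mathbf{u}^T \mathsf{A}\mathbf{u}$ becomes $\mathbf{v}^T \mathsf{C}\mathbf{v}$, and $\mathbf{u}^T \mathsf{B}\mathbf{u}$
becomes $\mathbf{v}^T \mathsf{D}\mathbf{v}$. Show that $\mathsf{C}$ and $\mathsf{D}$ are diagonal by showing that $c_{ij} = 0$ if
$i \neq j$ and similarly for $d_{ij}$.

Thus $\mathbf{u} = \mathsf{P}\mathbf{v}$ or $\mathbf{v} = \mathsf{P}^{-1}\mathbf{u}$ reduces both quadratics to diagonal form.

To summarise, the method is as follows:

(a) find the $\lambda_n$ that allow (E9.1) a non-zero solution, by solving $|\mathsf{B} - \lambda\mathsf{A}| = 0$;

(b) for each $\lambda_n$ construct $\mathbf{u}^n$;

(c) construct the non-singular matrix $\mathsf{P}$ whose columns are the vectors $\mathbf{u}^n$;

(d) make the change of variable $\mathbf{u} = \mathsf{P}\mathbf{v}$.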

9.7 (It is recommended that the reader does not attempt this question until exercise 9.6
has been studied.)

If, in the pendulum system studied in section 9.1, the string is replaced by a
second rod identical to the first then the expressions for the kinetic energy $T$ and
the potential energy $V$ become (to second order in the $\theta_i$)

$$T \approx Ml^2\left(\tfrac{8}{3}\dot{\theta}_1^2 + 2\dot{\theta}_1\dot{\theta}_2 + \tfrac{2}{3}\dot{\theta}_2^2\right),$$
$$V \approx Mgl\left(\tfrac{3}{2}\theta_1^2 + \tfrac{1}{2}\theta_2^2\right).$$

Determine the normal frequencies of the system and find new variables $\xi$ and $\eta$
that will reduce these two expressions to diagonal form, i.e. to

$$a_1\dot{\xi}^2 + a_2\dot{\eta}^2 \quad\text{and}\quad b_1\xi^2 + b_2\eta^2.$$
9.8 (It is recommended that the reader does not attempt this question until exercise 9.6
has been studied.)

Find a real linear transformation that simultaneously reduces the quadratic
forms

$$3x^2 + 5y^2 + 5z^2 + 2yz + 6zx - 2xy,$$
$$5x^2 + 12y^2 + 8yz + 4zx$$

to diagonal form.

9.9 Three particles of mass $m$ are attached to a light horizontal string having fixed
ends, the string being thus divided into four equal portions of length $a$ each
under a tension $T$. Show that for small transverse vibrations the amplitudes $x_i$
of the normal modes satisfy $\mathsf{B}\mathbf{x} = (ma\omega^2/T)\mathbf{x}$, where $\mathsf{B}$ is the matrix

$$\begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix}.$$

Estimate the lowest and highest eigenfrequencies using trial vectors $(3 \;\; 4 \;\; 3)^T$
and $(3 \;\; -4 \;\; 3)^T$. Use also the exact vectors $(1 \;\; \sqrt{2} \;\; 1)^T$ and $(1 \;\; -\sqrt{2} \;\; 1)^T$
and compare the results.

9.10 Use the Rayleigh-Ritz method to estimate the lowest oscillation frequency of a 
heavy chain of N links, each of length a (= L/N), which hangs freely from one 
end. (Try simple calculable configurations such as all links but one vertical, or 
all links collinear, etc.) 


9.5 Hints and answers 

9.1 See figure 9.6. 

9.2 K.E. $= \tfrac{1}{2}ml^2[(1 + \alpha)\dot{\theta}^2 + \alpha\dot{\phi}^2 + 2\alpha\dot{\theta}\dot{\phi}]$; P.E. $= \tfrac{1}{2}mgl[(1 + \alpha)\theta^2 + \alpha\phi^2]$. For
$\alpha = 1/3$ and $\omega = (2g/l)^{1/2}$, $\phi = -2\theta$ and the mid-point of BC remains vertically
below A.

9.3 (b) $x_1 = \epsilon(\cos\omega t + \cos\sqrt{2}\,\omega t)$, $x_2 = -\epsilon\cos\sqrt{2}\,\omega t$, $x_3 = \epsilon(-\cos\omega t + \cos\sqrt{2}\,\omega t)$.
At various times the three displacements will reach $2\epsilon$, $\epsilon$, $2\epsilon$ respectively. For
example, $x_1$ can be written as $2\epsilon\cos[(\sqrt{2} - 1)\omega t/2]\cos[(\sqrt{2} + 1)\omega t/2]$, i.e. an oscillation
of angular frequency $(\sqrt{2} + 1)\omega/2$ and modulated amplitude $2\epsilon\cos[(\sqrt{2} - 1)\omega t/2]$;
the amplitude will reach $2\epsilon$ after a time $\approx 4\pi/[\omega(\sqrt{2} - 1)]$.

9.4 Taking separate loops in the left-hand and right-hand sides of the diagram
the relevant matrices are $\mathsf{A} = (L_1, 0;\, 0, L_2)$ and $\mathsf{B} = (2C^{-1}, -C^{-1};\, -C^{-1}, 2C^{-1})$.
Whatever the loop choice, $\omega^2$ must satisfy $L_1L_2C^2\omega^4 - 2(L_1 + L_2)C\omega^2 + 3 = 0$,
which leads to the stated result. The energy stored in the central capacitor is
$(Q_1 - Q_2)^2/(2C)$. If $L_1 = L_2 = L$ then one mode has $\omega^2 = (LC)^{-1}$ and no current
flows through the central capacitor. The other mode has $\omega^2 = 3(LC)^{-1}$; in this
mode equal currents $I$ (one clockwise, one anticlockwise) flow in the two loops
and therefore the current through the central capacitor is $2I$.

9.5 As the circuit loops contain no voltage sources the equations are homogeneous
and so for a non-trivial solution the determinant of coefficients must vanish.

(a) $I_1 = 0$, $I_2 = -I_3$; no current in PQ; capacitance $C/2$ and inductance $2L$.

(b) $I_1 = -2I_2 = -2I_3$; no current in TU; capacitance $3C/2$ and inductance $2L$.

9.6 (a) Obtain $(\lambda_n - \lambda_m)(\mathbf{u}^m)^T \mathsf{A}\mathbf{u}^n = 0$; (c) $c_{ij} = (\mathsf{P}^T\mathsf{A}\mathsf{P})_{ij} = (\mathsf{P}^T)_{ik}A_{kl}P_{lj} =
u^i_k A_{kl} u^j_l = (\mathbf{u}^i)^T \mathsf{A}\mathbf{u}^j = 0$ for $i \neq j$.

Figure 9.6 The normal modes, as viewed from above, of the coupled pendulums
in exercise 9.1.

9.7 $\omega = (2.634g/l)^{1/2}$ or $(0.3661g/l)^{1/2}$; $\theta_1 = \xi + \eta$, $\theta_2 = 1.431\xi - 2.097\eta$.

9.8 $\lambda = -1, 2, 4$; $x = 2\xi - 2\eta + 2\chi$, $y = \xi + \eta + \chi$, $z = -3\xi + \eta - \chi$.

9.9 Estimated, $10/17 < Ma\omega^2/T < 58/17$; exact, $2 - \sqrt{2} \leq Ma\omega^2/T \leq 2 + \sqrt{2}$.

9.10 The collinear case gives the best estimate, m 2 < 6n 2 g/(4n 2 a) as 3g/(2 /). 
Vector calculus


In chapter 7 we discussed the algebra of vectors, and in chapter 8 we considered 
how to transform one vector into another using a linear operator. In this chapter 
and the next we discuss the calculus of vectors, i.e. the differentiation and 
integration both of vectors describing particular bodies, such as the velocity of 
a particle, and of vector fields, in which a vector is defined as a function of the 
coordinates throughout some volume (one-, two- or three-dimensional). Since the 
aim of this chapter is to develop methods for handling multi-dimensional physical 
situations, we will assume throughout that the functions with which we have to 
deal have sufficiently amenable mathematical properties, in particular that they 
are continuous and differentiable. 


10.1 Differentiation of vectors 


Let us consider a vector $\mathbf{a}$ that is a function of a scalar variable $u$. By this
we mean that with each value of $u$ we associate a vector $\mathbf{a}(u)$. For example, in
Cartesian coordinates $\mathbf{a}(u) = a_x(u)\mathbf{i} + a_y(u)\mathbf{j} + a_z(u)\mathbf{k}$, where $a_x(u)$, $a_y(u)$ and $a_z(u)$
are scalar functions of $u$ and are the components of the vector $\mathbf{a}(u)$ in the x-, y-
and z-directions respectively. We note that if $\mathbf{a}(u)$ is continuous at some point
$u = u_0$ then this implies that each of the Cartesian components $a_x(u)$, $a_y(u)$ and
$a_z(u)$ is also continuous there.

Let us consider the derivative of the vector function $\mathbf{a}(u)$ with respect to $u$.
The derivative of a vector function is defined in a similar manner to the ordinary
derivative of a scalar function $f(x)$ given in chapter 2. The small change in
the vector $\mathbf{a}(u)$ resulting from a small change $\Delta u$ in the value of $u$ is given by
$\Delta\mathbf{a} = \mathbf{a}(u + \Delta u) - \mathbf{a}(u)$ (see figure 10.1). The derivative of $\mathbf{a}(u)$ with respect to $u$ is
defined to be

$$\frac{d\mathbf{a}}{du} = \lim_{\Delta u \to 0} \frac{\mathbf{a}(u + \Delta u) - \mathbf{a}(u)}{\Delta u}, \qquad (10.1)$$

assuming that the limit exists, in which case $\mathbf{a}(u)$ is said to be differentiable at
that point. Note that $d\mathbf{a}/du$ is also a vector, which is not, in general, parallel to
$\mathbf{a}(u)$. In Cartesian coordinates, the derivative of the vector $\mathbf{a}(u) = a_x\mathbf{i} + a_y\mathbf{j} + a_z\mathbf{k}$
is given by

$$\frac{d\mathbf{a}}{du} = \frac{da_x}{du}\mathbf{i} + \frac{da_y}{du}\mathbf{j} + \frac{da_z}{du}\mathbf{k}.$$

Figure 10.1 A small change in a vector $\mathbf{a}(u)$ resulting from a small change
in $u$.

Perhaps the simplest application of the above is to finding the velocity and
acceleration of a particle in classical mechanics. If the time-dependent position
vector of the particle with respect to the origin in Cartesian coordinates is given
by $\mathbf{r}(t) = x(t)\mathbf{i} + y(t)\mathbf{j} + z(t)\mathbf{k}$ then the velocity of the particle is given by the vector

$$\mathbf{v}(t) = \frac{d\mathbf{r}}{dt} = \frac{dx}{dt}\mathbf{i} + \frac{dy}{dt}\mathbf{j} + \frac{dz}{dt}\mathbf{k}.$$

The direction of the velocity vector is along the tangent to the path $\mathbf{r}(t)$ at the
instantaneous position of the particle, and its magnitude $|\mathbf{v}(t)|$ is equal to the
speed of the particle. The acceleration of the particle is given in a similar manner
by

$$\mathbf{a}(t) = \frac{d\mathbf{v}}{dt} = \frac{d^2x}{dt^2}\mathbf{i} + \frac{d^2y}{dt^2}\mathbf{j} + \frac{d^2z}{dt^2}\mathbf{k}.$$

Figure 10.2 Unit basis vectors for two-dimensional Cartesian and plane polar
coordinates.

►The position vector of a particle at time $t$ in Cartesian coordinates is given by $\mathbf{r}(t) =
2t^2\mathbf{i} + (3t - 2)\mathbf{j} + (3t^2 - 1)\mathbf{k}$. Find the speed of the particle at $t = 1$ and the component of
its acceleration in the direction $\mathbf{s} = \mathbf{i} + 2\mathbf{j} + \mathbf{k}$.

The velocity and acceleration of the particle are given by

$$\mathbf{v}(t) = \frac{d\mathbf{r}}{dt} = 4t\,\mathbf{i} + 3\mathbf{j} + 6t\,\mathbf{k},$$
$$\mathbf{a}(t) = \frac{d\mathbf{v}}{dt} = 4\mathbf{i} + 6\mathbf{k}.$$

The speed of the particle at $t = 1$ is simply

$$|\mathbf{v}(1)| = \sqrt{4^2 + 3^2 + 6^2} = \sqrt{61}.$$

The acceleration of the particle is constant (i.e. independent of $t$), and its component in
the direction $\mathbf{s}$ is given by

$$a_s = \mathbf{a} \cdot \hat{\mathbf{s}} = \frac{(4\mathbf{i} + 6\mathbf{k}) \cdot (\mathbf{i} + 2\mathbf{j} + \mathbf{k})}{\sqrt{1^2 + 2^2 + 1^2}} = \frac{5\sqrt{6}}{3}. \;◄$$
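The result of this example can be checked numerically by applying the limit definition (10.1) with a small but finite step. The sketch below is an illustration only; the helper names `derivative`, `velocity` and `acceleration` and the step size h are assumptions, and central differences are used in place of the one-sided quotient in (10.1).

```python
import math


def r(t):
    """Position vector r(t) = 2t^2 i + (3t - 2) j + (3t^2 - 1) k as a tuple."""
    return (2 * t**2, 3 * t - 2, 3 * t**2 - 1)


def derivative(f, t, h=1e-4):
    """Central-difference derivative of a vector function, component by component."""
    return tuple((p - m) / (2 * h) for p, m in zip(f(t + h), f(t - h)))


def velocity(t):
    return derivative(r, t)


def acceleration(t):
    return derivative(velocity, t)


speed = math.sqrt(sum(c * c for c in velocity(1.0)))

s = (1.0, 2.0, 1.0)
a = acceleration(1.0)
a_s = sum(ai * si for ai, si in zip(a, s)) / math.sqrt(sum(si * si for si in s))

print(speed, math.sqrt(61))         # numerical and exact speed at t = 1
print(a_s, 5 * math.sqrt(6) / 3)    # numerical and exact component along s
```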

Note that in the case discussed above $\mathbf{i}$, $\mathbf{j}$ and $\mathbf{k}$ are fixed, time-independent
basis vectors. This may not be true of basis vectors in general; when we are
not using Cartesian coordinates the basis vectors themselves must also be
differentiated. We discuss basis vectors for non-Cartesian coordinate systems in
detail in section 10.10. Nevertheless, as a simple example, let us now consider
two-dimensional plane polar coordinates $\rho$, $\phi$.

Referring to figure 10.2, imagine holding $\phi$ fixed and moving radially outwards,
i.e. in the direction of increasing $\rho$. Let us denote the unit vector in this direction
by $\hat{\mathbf{e}}_\rho$. Similarly, imagine keeping $\rho$ fixed and moving around a circle of fixed radius
in the direction of increasing $\phi$. Let us denote the unit vector tangent to the circle
by $\hat{\mathbf{e}}_\phi$. The two vectors $\hat{\mathbf{e}}_\rho$ and $\hat{\mathbf{e}}_\phi$ are the basis vectors for this two-dimensional
coordinate system, just as $\mathbf{i}$ and $\mathbf{j}$ are basis vectors for two-dimensional Cartesian
coordinates. All these basis vectors are shown in figure 10.2.

An important difference between the two sets of basis vectors is that, while
$\mathbf{i}$ and $\mathbf{j}$ are constant in magnitude and direction, the vectors $\hat{\mathbf{e}}_\rho$ and $\hat{\mathbf{e}}_\phi$ have
constant magnitudes but their directions change as $\rho$ and $\phi$ vary. Therefore,
when calculating the derivative of a vector written in polar coordinates we must
also differentiate the basis vectors. One way of doing this is to express $\hat{\mathbf{e}}_\rho$ and $\hat{\mathbf{e}}_\phi$
in terms of $\mathbf{i}$ and $\mathbf{j}$. From figure 10.2, we see that

$$\hat{\mathbf{e}}_\rho = \cos\phi\,\mathbf{i} + \sin\phi\,\mathbf{j},$$
$$\hat{\mathbf{e}}_\phi = -\sin\phi\,\mathbf{i} + \cos\phi\,\mathbf{j}.$$

Since $\mathbf{i}$ and $\mathbf{j}$ are constant vectors, we find that the derivatives of the basis vectors
$\hat{\mathbf{e}}_\rho$ and $\hat{\mathbf{e}}_\phi$ with respect to $t$ are given by

$$\frac{d\hat{\mathbf{e}}_\rho}{dt} = -\sin\phi\,\frac{d\phi}{dt}\,\mathbf{i} + \cos\phi\,\frac{d\phi}{dt}\,\mathbf{j} = \dot{\phi}\,\hat{\mathbf{e}}_\phi, \qquad (10.2)$$
$$\frac{d\hat{\mathbf{e}}_\phi}{dt} = -\cos\phi\,\frac{d\phi}{dt}\,\mathbf{i} - \sin\phi\,\frac{d\phi}{dt}\,\mathbf{j} = -\dot{\phi}\,\hat{\mathbf{e}}_\rho, \qquad (10.3)$$

where the overdot is the conventional notation for differentiation with respect to
time.

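►The position vector of a particle in plane polar coordinates is $\mathbf{r}(t) = \rho(t)\hat{\mathbf{e}}_\rho$. Find
expressions for the velocity and acceleration of the particle in these coordinates.

Using result (10.4) below, the velocity of the particle is given by

$$\mathbf{v}(t) = \dot{\mathbf{r}}(t) = \dot{\rho}\,\hat{\mathbf{e}}_\rho + \rho\,\dot{\hat{\mathbf{e}}}_\rho = \dot{\rho}\,\hat{\mathbf{e}}_\rho + \rho\dot{\phi}\,\hat{\mathbf{e}}_\phi,$$

where we have used (10.2). In a similar way its acceleration is given by

$$\mathbf{a}(t) = \frac{d}{dt}\left(\dot{\rho}\,\hat{\mathbf{e}}_\rho + \rho\dot{\phi}\,\hat{\mathbf{e}}_\phi\right)$$
$$= \ddot{\rho}\,\hat{\mathbf{e}}_\rho + \dot{\rho}\,\dot{\hat{\mathbf{e}}}_\rho + \rho\dot{\phi}\,\dot{\hat{\mathbf{e}}}_\phi + \rho\ddot{\phi}\,\hat{\mathbf{e}}_\phi + \dot{\rho}\dot{\phi}\,\hat{\mathbf{e}}_\phi$$
$$= \ddot{\rho}\,\hat{\mathbf{e}}_\rho + \dot{\rho}(\dot{\phi}\,\hat{\mathbf{e}}_\phi) + \rho\dot{\phi}(-\dot{\phi}\,\hat{\mathbf{e}}_\rho) + \rho\ddot{\phi}\,\hat{\mathbf{e}}_\phi + \dot{\rho}\dot{\phi}\,\hat{\mathbf{e}}_\phi$$
$$= (\ddot{\rho} - \rho\dot{\phi}^2)\,\hat{\mathbf{e}}_\rho + (\rho\ddot{\phi} + 2\dot{\rho}\dot{\phi})\,\hat{\mathbf{e}}_\phi.$$

Here we have used (10.2) and (10.3). ◄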
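The polar expression for the acceleration can be compared with a direct Cartesian calculation. The sketch below is a hedged illustration: the particular motion ρ(t) = 1 + t², φ(t) = 2t, the sample time and the finite-difference step are arbitrary assumptions chosen so that the derivatives are easy to write down by hand.

```python
import math


def rho(t):
    return 1.0 + t * t   # assumed radial motion: rho_dot = 2t, rho_ddot = 2


def phi(t):
    return 2.0 * t       # assumed angular motion: phi_dot = 2, phi_ddot = 0


def position(t):
    """Cartesian components of r = rho * e_rho."""
    return (rho(t) * math.cos(phi(t)), rho(t) * math.sin(phi(t)))


def second_derivative(f, t, h=1e-3):
    """Central second difference of a vector function, component by component."""
    return tuple((p - 2 * c + m) / (h * h)
                 for p, c, m in zip(f(t + h), f(t), f(t - h)))


t = 0.7
ax, ay = second_derivative(position, t)
c, s = math.cos(phi(t)), math.sin(phi(t))

# Project the Cartesian acceleration onto e_rho = (c, s) and e_phi = (-s, c).
a_rho_numeric = ax * c + ay * s
a_phi_numeric = -ax * s + ay * c

# Components predicted by the formula derived in the example.
a_rho_formula = 2.0 - rho(t) * 2.0**2
a_phi_formula = rho(t) * 0.0 + 2 * (2 * t) * 2.0

print(a_rho_numeric, a_rho_formula)
print(a_phi_numeric, a_phi_formula)
```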


10.1.1 Differentiation of composite vector expressions 


In composite vector expressions each of the vectors or scalars involved may be
a function of some scalar variable $u$, as we have seen. The derivatives of such
expressions are easily found using the definition (10.1) and the rules of ordinary
differential calculus. They may be summarised by the following, in which we
assume that $\mathbf{a}$ and $\mathbf{b}$ are differentiable vector functions of a scalar $u$ and that $\phi$
is a differentiable scalar function of $u$:

$$\frac{d}{du}(\phi\mathbf{a}) = \phi\frac{d\mathbf{a}}{du} + \frac{d\phi}{du}\mathbf{a}, \qquad (10.4)$$
$$\frac{d}{du}(\mathbf{a} \cdot \mathbf{b}) = \mathbf{a} \cdot \frac{d\mathbf{b}}{du} + \frac{d\mathbf{a}}{du} \cdot \mathbf{b}, \qquad (10.5)$$
$$\frac{d}{du}(\mathbf{a} \times \mathbf{b}) = \mathbf{a} \times \frac{d\mathbf{b}}{du} + \frac{d\mathbf{a}}{du} \times \mathbf{b}; \qquad (10.6)$$

the order of the factors in the terms on the RHS of (10.6) is, of course, just as
important as it is in the original vector product.


►A particle of mass $m$ with position vector $\mathbf{r}$ relative to some origin $O$ experiences a force
$\mathbf{F}$, which produces a torque (moment) $\mathbf{T} = \mathbf{r} \times \mathbf{F}$ about $O$. The angular momentum of the
particle about $O$ is given by $\mathbf{L} = \mathbf{r} \times m\mathbf{v}$, where $\mathbf{v}$ is the particle's velocity. Show that the
rate of change of angular momentum is equal to the applied torque.

The rate of change of angular momentum is given by

$$\frac{d\mathbf{L}}{dt} = \frac{d}{dt}(\mathbf{r} \times m\mathbf{v}).$$

Using (10.6) we obtain

$$\frac{d\mathbf{L}}{dt} = \frac{d\mathbf{r}}{dt} \times m\mathbf{v} + \mathbf{r} \times \frac{d}{dt}(m\mathbf{v})$$
$$= \mathbf{v} \times m\mathbf{v} + \mathbf{r} \times \frac{d}{dt}(m\mathbf{v})$$
$$= \mathbf{0} + \mathbf{r} \times \mathbf{F} = \mathbf{T},$$

where in the last line we use Newton's second law, namely $\mathbf{F} = d(m\mathbf{v})/dt$. ◄

If a vector $\mathbf{a}$ is a function of a scalar variable $s$ that is itself a function of $u$, so
that $s = s(u)$, then the chain rule (see subsection 2.1.3) gives

$$\frac{d\mathbf{a}(s)}{du} = \frac{ds}{du}\frac{d\mathbf{a}}{ds}. \qquad (10.7)$$

The derivatives of more complicated vector expressions may be found by repeated
application of the above equations.

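One further useful result can be derived by considering the derivative

$$\frac{d}{du}(\mathbf{a} \cdot \mathbf{a}) = 2\mathbf{a} \cdot \frac{d\mathbf{a}}{du};$$

since $\mathbf{a} \cdot \mathbf{a} = a^2$, where $a = |\mathbf{a}|$, we see that

$$\mathbf{a} \cdot \frac{d\mathbf{a}}{du} = 0 \quad\text{if } a \text{ is constant.} \qquad (10.8)$$

In other words, if a vector $\mathbf{a}(u)$ has a constant magnitude as $u$ varies then it is
perpendicular to the vector $d\mathbf{a}/du$.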


10.1.2 Differential of a vector 

As a final note on the differentiation of vectors, we can also define the differential
of a vector, in a similar way to that of a scalar in ordinary differential calculus.
In the definition of the vector derivative (10.1), we used the notion of a small
change $\Delta\mathbf{a}$ in a vector $\mathbf{a}(u)$ resulting from a small change $\Delta u$ in its argument. In
the limit $\Delta u \to 0$, the change in $\mathbf{a}$ becomes infinitesimally small, and we denote it
by the differential $d\mathbf{a}$. From (10.1) we see that the differential is given by

$$d\mathbf{a} = \frac{d\mathbf{a}}{du}\,du. \qquad (10.9)$$



Note that the differential of a vector is also a vector. As an example, the
infinitesimal change in the position vector of a particle in an infinitesimal time $dt$ is

$$d\mathbf{r} = \frac{d\mathbf{r}}{dt}\,dt = \mathbf{v}\,dt,$$

where $\mathbf{v}$ is the particle's velocity.


10.2 Integration of vectors 

The integration of a vector (or of an expression involving vectors that may itself 
be either a vector or scalar) with respect to a scalar u can be regarded as the 
inverse of differentiation. We must remember, however, that 


(i) the integral has the same nature (vector or scalar) as the integrand, 

(ii) the constant of integration for indefinite integrals must be of the same 
nature as the integral. 


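For example, if $\mathbf{a}(u) = d[\mathbf{A}(u)]/du$ then the indefinite integral of $\mathbf{a}(u)$ is given by

$$\int \mathbf{a}(u)\,du = \mathbf{A}(u) + \mathbf{b},$$

where $\mathbf{b}$ is a constant vector. The definite integral of $\mathbf{a}(u)$ from $u = u_1$ to $u = u_2$
is given by

$$\int_{u_1}^{u_2} \mathbf{a}(u)\,du = \mathbf{A}(u_2) - \mathbf{A}(u_1).$$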


Figure 10.3 The unit tangent $\hat{\mathbf{t}}$, normal $\hat{\mathbf{n}}$ and binormal $\hat{\mathbf{b}}$ to the space curve $C$
at a particular point $P$.

►A small particle of mass $m$ orbits a much larger mass $M$ centred at the origin $O$. According
to Newton's law of gravitation, the position vector $\mathbf{r}$ of the small mass obeys the differential
equation

$$m\frac{d^2\mathbf{r}}{dt^2} = -\frac{GMm}{r^2}\,\hat{\mathbf{r}}.$$

Show that the vector $\mathbf{r} \times d\mathbf{r}/dt$ is a constant of the motion.

Forming the vector product of the differential equation with $\mathbf{r}$, we obtain

$$\mathbf{r} \times \frac{d^2\mathbf{r}}{dt^2} = -\frac{GM}{r^2}\,\mathbf{r} \times \hat{\mathbf{r}}.$$

Since $\mathbf{r}$ and $\hat{\mathbf{r}}$ are collinear, $\mathbf{r} \times \hat{\mathbf{r}} = \mathbf{0}$ and therefore we have

$$\mathbf{r} \times \frac{d^2\mathbf{r}}{dt^2} = \mathbf{0}. \qquad (10.10)$$

However,

$$\frac{d}{dt}\left(\mathbf{r} \times \frac{d\mathbf{r}}{dt}\right) = \mathbf{r} \times \frac{d^2\mathbf{r}}{dt^2} + \frac{d\mathbf{r}}{dt} \times \frac{d\mathbf{r}}{dt} = \mathbf{0},$$

since the first term is zero by (10.10), and the second is zero because it is the vector product
of two parallel (in this case identical) vectors. Integrating, we obtain the required result

$$\mathbf{r} \times \frac{d\mathbf{r}}{dt} = \mathbf{c}, \qquad (10.11)$$

where $\mathbf{c}$ is a constant vector.

As a further point of interest we may note that in an infinitesimal time $dt$ the change
in the position vector of the small mass is $d\mathbf{r}$ and the element of area swept out by the
position vector of the particle is simply $dA = \tfrac{1}{2}|\mathbf{r} \times d\mathbf{r}|$. Dividing both sides of this equation
by $dt$, we conclude that

$$\frac{dA}{dt} = \frac{1}{2}\left|\mathbf{r} \times \frac{d\mathbf{r}}{dt}\right| = \frac{|\mathbf{c}|}{2},$$

and that the physical interpretation of the above result (10.11) is that the position vector $\mathbf{r}$
of the small mass sweeps out equal areas in equal times. This result is in fact valid for
motion under any force that acts along the line joining the two particles. ◄
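The constancy of r × dr/dt can be observed in a simple numerical orbit. The sketch below is illustrative only: it takes GM = 1, confines the motion to the xy-plane (so that only the z-component of r × v is non-zero), and uses a velocity-Verlet (leapfrog) integrator with arbitrarily chosen initial conditions and step size. That scheme is an assumption made here because each velocity update is parallel to r and each position update is parallel to v, mirroring the two vanishing cross products in the argument above.

```python
GM = 1.0  # gravitational parameter in arbitrary units (assumption)


def accel(x, y):
    """Inverse-square acceleration -GM r_hat / r^2 in the plane."""
    r3 = (x * x + y * y) ** 1.5
    return (-GM * x / r3, -GM * y / r3)


def orbit(x, y, vx, vy, dt=1e-3, steps=20000):
    """Integrate the orbit and return the history of L_z = x*vy - y*vx."""
    ax, ay = accel(x, y)
    history = [x * vy - y * vx]
    for _ in range(steps):
        vx += 0.5 * dt * ax          # half kick, parallel to r
        vy += 0.5 * dt * ay
        x += dt * vx                 # drift, parallel to v
        y += dt * vy
        ax, ay = accel(x, y)
        vx += 0.5 * dt * ax          # second half kick, parallel to new r
        vy += 0.5 * dt * ay
        history.append(x * vy - y * vx)
    return history


Lz = orbit(1.0, 0.0, 0.0, 1.2)       # bound, non-circular orbit
print(min(Lz), max(Lz))              # both stay at the initial value 1.2
```

Half of this conserved quantity is the rate dA/dt at which area is swept out.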


10.3 Space curves 

In the previous section we mentioned that the velocity vector of a particle is a 
tangent to the curve in space along which the particle moves. We now give a more 
complete discussion of curves in space and also of the geometrical interpretation 
of the vector derivative. 

A curve $C$ in space can be described by the vector $\mathbf{r}(u)$ joining the origin $O$ of
a coordinate system to a point on the curve (see figure 10.3). As the parameter $u$
varies, the end-point of the vector moves along the curve. In Cartesian coordinates,

$$\mathbf{r}(u) = x(u)\mathbf{i} + y(u)\mathbf{j} + z(u)\mathbf{k},$$

where $x = x(u)$, $y = y(u)$ and $z = z(u)$ are the parametric equations of the curve.
This parametric representation can be very useful, particularly in mechanics when
the parameter may be the time $t$. We can, however, also represent a space curve
by $y = f(x)$, $z = g(x)$, which can be easily converted into the above parametric
form by setting $u = x$, so that

$$\mathbf{r}(u) = u\,\mathbf{i} + f(u)\,\mathbf{j} + g(u)\,\mathbf{k}.$$

Alternatively, a space curve can be represented in the form $F(x, y, z) = 0$,
$G(x, y, z) = 0$, where each equation represents a surface and the curve is the
intersection of the two surfaces.

A curve may sometimes be described in parametric form by the vector $\mathbf{r}(s)$,
where the parameter $s$ is the arc length along the curve measured from a fixed
point. Even when the curve is expressed in terms of some other parameter, it is
straightforward to find the arc length between any two points on the curve. For
the curve described by $\mathbf{r}(u)$, let us consider an infinitesimal vector displacement

$$d\mathbf{r} = dx\,\mathbf{i} + dy\,\mathbf{j} + dz\,\mathbf{k}$$

along the curve. The square of the infinitesimal distance moved is then given by

$$(ds)^2 = d\mathbf{r} \cdot d\mathbf{r} = (dx)^2 + (dy)^2 + (dz)^2,$$

from which it can be shown that

$$\left(\frac{ds}{du}\right)^2 = \frac{d\mathbf{r}}{du} \cdot \frac{d\mathbf{r}}{du}.$$

Therefore, the arc length between two points on the curve $\mathbf{r}(u)$, given by $u = u_1$
and $u = u_2$, is

$$s = \int_{u_1}^{u_2} \sqrt{\frac{d\mathbf{r}}{du} \cdot \frac{d\mathbf{r}}{du}}\;du. \qquad (10.12)$$


►A curve lying in the xy-plane is given by $y = y(x)$, $z = 0$. Using (10.12), show that the
arc length along the curve between $x = a$ and $x = b$ is given by $s = \int_a^b \sqrt{1 + y'^2}\,dx$, where
$y' = dy/dx$.

Let us first represent the curve in parametric form by setting $u = x$, so that

$$\mathbf{r}(u) = u\,\mathbf{i} + y(u)\,\mathbf{j}.$$

Differentiating with respect to $u$, we find

$$\frac{d\mathbf{r}}{du} = \mathbf{i} + \frac{dy}{du}\,\mathbf{j},$$

from which we obtain

$$\frac{d\mathbf{r}}{du} \cdot \frac{d\mathbf{r}}{du} = 1 + \left(\frac{dy}{du}\right)^2.$$

Therefore, remembering that $u = x$, from (10.12) the arc length between $x = a$ and $x = b$
is given by

$$s = \int_a^b \sqrt{\frac{d\mathbf{r}}{du} \cdot \frac{d\mathbf{r}}{du}}\;du = \int_a^b \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\;dx.$$

This result was derived using more elementary methods in chapter 2. ◄
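As a concrete illustration of this formula, the sketch below evaluates the arc length of the curve y = cosh x between x = 0 and x = 1, for which √(1 + y'²) = cosh x and the exact answer is sinh 1. The choice of curve, the function name `arc_length` and the use of composite Simpson quadrature are assumptions made for the example.

```python
import math


def arc_length(dy_dx, a, b, n=1000):
    """Simpson estimate of the integral of sqrt(1 + y'^2) from a to b (n even)."""
    h = (b - a) / n

    def integrand(x):
        return math.sqrt(1.0 + dy_dx(x) ** 2)

    total = integrand(a) + integrand(b)
    for k in range(1, n):
        weight = 4 if k % 2 else 2   # Simpson weights 4, 2, 4, ..., 4
        total += weight * integrand(a + k * h)
    return total * h / 3


# For y = cosh x the derivative is y' = sinh x.
s = arc_length(math.sinh, 0.0, 1.0)
print(s, math.sinh(1.0))   # numerical and exact arc length
```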


If a curve $C$ is described by $\mathbf{r}(u)$ then, by considering figures 10.1 and 10.3, we
see that, at any given point on the curve, $d\mathbf{r}/du$ is a vector tangent to $C$ at that
point, in the direction of increasing $u$. In the special case where the parameter $u$
is the arc length $s$ along the curve then $d\mathbf{r}/ds$ is a unit tangent vector to $C$ and is
denoted by $\hat{\mathbf{t}}$.

The rate at which the unit tangent $\hat{\mathbf{t}}$ changes with respect to $s$ is given by
$d\hat{\mathbf{t}}/ds$, and its magnitude is defined as the curvature $\kappa$ of the curve $C$ at a given
point,

$$\kappa = \left|\frac{d\hat{\mathbf{t}}}{ds}\right| = \left|\frac{d^2\mathbf{r}}{ds^2}\right|.$$

We can also define the quantity $\rho = 1/\kappa$, which is called the radius of curvature.

Since $\hat{\mathbf{t}}$ is of constant (unit) magnitude, it follows from (10.8) that it is
perpendicular to $d\hat{\mathbf{t}}/ds$. The unit vector in the direction perpendicular to $\hat{\mathbf{t}}$ is denoted
by $\hat{\mathbf{n}}$ and is called the principal normal at the point. We therefore have

$$\frac{d\hat{\mathbf{t}}}{ds} = \kappa\,\hat{\mathbf{n}}. \qquad (10.13)$$

The unit vector $\hat{\mathbf{b}} = \hat{\mathbf{t}} \times \hat{\mathbf{n}}$, which is perpendicular to the plane containing $\hat{\mathbf{t}}$
and $\hat{\mathbf{n}}$, is called the binormal to $C$. The vectors $\hat{\mathbf{t}}$, $\hat{\mathbf{n}}$ and $\hat{\mathbf{b}}$ form a right-handed
rectangular coordinate system (or triad) at any given point on $C$ (see figure 10.3).
As $s$ changes so that the point of interest moves along $C$, the triad of vectors also
changes.

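The rate at which $\hat{\mathbf{b}}$ changes with respect to $s$ is given by $d\hat{\mathbf{b}}/ds$ and is a
measure of the torsion $\tau$ of the curve at any given point. Since $\hat{\mathbf{b}}$ is of constant
magnitude, from (10.8) it is perpendicular to $d\hat{\mathbf{b}}/ds$. We may further show that
$d\hat{\mathbf{b}}/ds$ is also perpendicular to $\hat{\mathbf{t}}$, as follows. By definition $\hat{\mathbf{b}} \cdot \hat{\mathbf{t}} = 0$, which on
differentiating yields

$$0 = \frac{d}{ds}\left(\hat{\mathbf{b}} \cdot \hat{\mathbf{t}}\right) = \frac{d\hat{\mathbf{b}}}{ds} \cdot \hat{\mathbf{t}} + \hat{\mathbf{b}} \cdot \frac{d\hat{\mathbf{t}}}{ds}
 = \frac{d\hat{\mathbf{b}}}{ds} \cdot \hat{\mathbf{t}} + \hat{\mathbf{b}} \cdot \kappa\hat{\mathbf{n}}
 = \frac{d\hat{\mathbf{b}}}{ds} \cdot \hat{\mathbf{t}},$$

where we have used the fact that $\hat{\mathbf{b}} \cdot \hat{\mathbf{n}} = 0$. Hence, since $d\hat{\mathbf{b}}/ds$ is perpendicular
to both $\hat{\mathbf{b}}$ and $\hat{\mathbf{t}}$, we must have $d\hat{\mathbf{b}}/ds \propto \hat{\mathbf{n}}$. The constant of proportionality is $-\tau$,
so we finally obtain

$$\frac{d\hat{\mathbf{b}}}{ds} = -\tau\,\hat{\mathbf{n}}. \qquad (10.14)$$

Taking the dot product of each side with $\hat{\mathbf{n}}$, we see that the torsion of a curve is
given by

$$\tau = -\hat{\mathbf{n}} \cdot \frac{d\hat{\mathbf{b}}}{ds}.$$

We may also define the quantity $\sigma = 1/\tau$, which is called the radius of torsion.
Finally, we consider the derivative $d\hat{\mathbf{n}}/ds$. Since $\hat{\mathbf{n}} = \hat{\mathbf{b}} \times \hat{\mathbf{t}}$ we have

$$\frac{d\hat{\mathbf{n}}}{ds} = \frac{d\hat{\mathbf{b}}}{ds} \times \hat{\mathbf{t}} + \hat{\mathbf{b}} \times \frac{d\hat{\mathbf{t}}}{ds}
 = -\tau\,\hat{\mathbf{n}} \times \hat{\mathbf{t}} + \hat{\mathbf{b}} \times \kappa\hat{\mathbf{n}}
 = \tau\,\hat{\mathbf{b}} - \kappa\,\hat{\mathbf{t}}. \qquad (10.15)$$

In summary, $\hat{\mathbf{t}}$, $\hat{\mathbf{n}}$ and $\hat{\mathbf{b}}$ and their derivatives with respect to $s$ are related to one
another by the relations (10.13), (10.14) and (10.15), the Frenet-Serret formulae,

$$\frac{d\hat{\mathbf{t}}}{ds} = \kappa\,\hat{\mathbf{n}}, \qquad \frac{d\hat{\mathbf{n}}}{ds} = \tau\,\hat{\mathbf{b}} - \kappa\,\hat{\mathbf{t}}, \qquad \frac{d\hat{\mathbf{b}}}{ds} = -\tau\,\hat{\mathbf{n}}. \qquad (10.16)$$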
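The Frenet-Serret formulae can be checked numerically on a simple curve. The sketch below assumes a circular helix r = (a cos u, a sin u, bu) with a = 2 and b = 1, for which the arc length is s = u√(a² + b²) and the unit tangent can be written down directly; the remaining vectors and their derivatives are obtained by central differences. The helper names and the step size are assumptions, and the expected values κ = a/(a² + b²) and τ = b/(a² + b²) follow from applying (10.13) and (10.14) to this helix by hand.

```python
import math

a, b = 2.0, 1.0          # helix radius and pitch parameter (assumed values)
c = math.hypot(a, b)     # ds/du for the helix r = (a cos u, a sin u, b u)


def tangent(s):
    """Unit tangent dr/ds of the helix, written in terms of arc length s."""
    u = s / c
    return (-a * math.sin(u) / c, a * math.cos(u) / c, b / c)


def diff(f, s, h=1e-4):
    """Central-difference derivative of a vector function of s."""
    return tuple((p - m) / (2 * h) for p, m in zip(f(s + h), f(s - h)))


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def normal(s):
    """Principal normal: the direction of dt/ds, as in (10.13)."""
    dt = diff(tangent, s)
    k = math.sqrt(dot(dt, dt))
    return tuple(x / k for x in dt)


def binormal(s):
    return cross(tangent(s), normal(s))


s0 = 0.3
dt0 = diff(tangent, s0)
kappa = math.sqrt(dot(dt0, dt0))                 # curvature from (10.13)
tau = -dot(normal(s0), diff(binormal, s0))       # torsion from (10.14)

dn = diff(normal, s0)                            # left-hand side of (10.15)
rhs = tuple(tau * bi - kappa * ti
            for bi, ti in zip(binormal(s0), tangent(s0)))

print(kappa, a / c**2)
print(tau, b / c**2)
print(max(abs(x - y) for x, y in zip(dn, rhs)))  # residual of (10.15)
```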


►Show that the acceleration of a particle travelling along a trajectory $\mathbf{r}(t)$ is given by

$$\mathbf{a}(t) = \frac{dv}{dt}\,\hat{\mathbf{t}} + \frac{v^2}{\rho}\,\hat{\mathbf{n}},$$

where $v$ is the speed of the particle, $\hat{\mathbf{t}}$ is the unit tangent to the trajectory, $\hat{\mathbf{n}}$ is its principal
normal and $\rho$ is its radius of curvature.

The velocity of the particle is given by

$$\mathbf{v}(t) = \frac{d\mathbf{r}}{dt} = \frac{d\mathbf{r}}{ds}\frac{ds}{dt} = \frac{ds}{dt}\,\hat{\mathbf{t}},$$

where $ds/dt$ is the speed of the particle, which we denote by $v$, and $\hat{\mathbf{t}}$ is the unit vector
tangent to the trajectory. Writing the velocity as $\mathbf{v} = v\hat{\mathbf{t}}$, and differentiating once more
with respect to time $t$, we obtain

$$\mathbf{a}(t) = \frac{d\mathbf{v}}{dt} = \frac{dv}{dt}\,\hat{\mathbf{t}} + v\,\frac{d\hat{\mathbf{t}}}{dt};$$

but we note that

$$\frac{d\hat{\mathbf{t}}}{dt} = \frac{ds}{dt}\frac{d\hat{\mathbf{t}}}{ds} = v\kappa\,\hat{\mathbf{n}} = \frac{v}{\rho}\,\hat{\mathbf{n}}.$$

Therefore, we have

$$\mathbf{a}(t) = \frac{dv}{dt}\,\hat{\mathbf{t}} + \frac{v^2}{\rho}\,\hat{\mathbf{n}}.$$

This shows that in addition to an acceleration $dv/dt$ along the tangent to the particle's
trajectory, there is also an acceleration $v^2/\rho$ in the direction of the principal normal. The
latter is often called the centripetal acceleration. ◄
Finally, we note that a curve $\mathbf{r}(u)$ representing the trajectory of a particle may
sometimes be given in terms of some parameter $u$ that is not necessarily equal to
the time $t$ but is functionally related to it in some way. In this case the velocity
of the particle is given by

$$\mathbf{v} = \frac{d\mathbf{r}}{dt} = \frac{d\mathbf{r}}{du}\frac{du}{dt}.$$

Differentiating again with respect to time gives the acceleration as

$$\mathbf{a} = \frac{d\mathbf{v}}{dt} = \frac{d}{dt}\left(\frac{d\mathbf{r}}{du}\frac{du}{dt}\right) = \frac{d^2\mathbf{r}}{du^2}\left(\frac{du}{dt}\right)^2 + \frac{d\mathbf{r}}{du}\frac{d^2u}{dt^2}.$$


10.4 Vector functions of several arguments 


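The concept of the derivative of a vector is easily extended to cases where the
vectors (or scalars) are functions of more than one independent scalar variable,
$u_1, u_2, \ldots, u_n$. In this case, the results of subsection 10.1.1 are still valid, except
that the derivatives become partial derivatives $\partial\mathbf{a}/\partial u_i$ defined as in ordinary
differential calculus. For example, in Cartesian coordinates,

\[
\frac{\partial\mathbf{a}}{\partial u} = \frac{\partial a_x}{\partial u}\,\mathbf{i} + \frac{\partial a_y}{\partial u}\,\mathbf{j} + \frac{\partial a_z}{\partial u}\,\mathbf{k}.
\]

In particular, (10.7) generalises to the chain rule of partial differentiation discussed
in section 5.5. If $\mathbf{a} = \mathbf{a}(u_1, u_2, \ldots, u_n)$ and each of the $u_i$ is also a function
$u_i(v_1, v_2, \ldots, v_n)$ of the variables $v_i$ then, generalising (5.17),

\[
\frac{\partial\mathbf{a}}{\partial v_i}
= \frac{\partial\mathbf{a}}{\partial u_1}\frac{\partial u_1}{\partial v_i}
+ \frac{\partial\mathbf{a}}{\partial u_2}\frac{\partial u_2}{\partial v_i}
+ \cdots
+ \frac{\partial\mathbf{a}}{\partial u_n}\frac{\partial u_n}{\partial v_i}
= \sum_{j=1}^{n}\frac{\partial\mathbf{a}}{\partial u_j}\frac{\partial u_j}{\partial v_i}.
\tag{10.17}
\]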


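A special case of this rule arises when $\mathbf{a}$ is an explicit function of some variable
$v$, as well as of scalars $u_1, u_2, \ldots, u_n$ that are themselves functions of $v$; then we
have

\[
\frac{d\mathbf{a}}{dv} = \frac{\partial\mathbf{a}}{\partial v} + \sum_{j=1}^{n}\frac{\partial\mathbf{a}}{\partial u_j}\frac{\partial u_j}{\partial v}.
\tag{10.18}
\]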


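We may also extend the concept of the differential of a vector given in (10.9)
to vectors dependent on several variables $u_1, u_2, \ldots, u_n$:

\[
d\mathbf{a} = \frac{\partial\mathbf{a}}{\partial u_1}\,du_1 + \frac{\partial\mathbf{a}}{\partial u_2}\,du_2 + \cdots + \frac{\partial\mathbf{a}}{\partial u_n}\,du_n
= \sum_{j=1}^{n}\frac{\partial\mathbf{a}}{\partial u_j}\,du_j.
\tag{10.19}
\]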


As an example, the infinitesimal change in an electric field $\mathbf{E}$ in moving from a
position $\mathbf{r}$ to a neighbouring one $\mathbf{r} + d\mathbf{r}$ is given by

\[
d\mathbf{E} = \frac{\partial\mathbf{E}}{\partial x}\,dx + \frac{\partial\mathbf{E}}{\partial y}\,dy + \frac{\partial\mathbf{E}}{\partial z}\,dz.
\tag{10.20}
\]

Figure 10.4 The tangent plane $T$ to a surface $S$ at a particular point $P$;
$u = c_1$ and $v = c_2$ are the coordinate curves, shown by dotted lines, that pass
through $P$. The broken line shows some particular parametric curve $\mathbf{r} = \mathbf{r}(\lambda)$
lying in the surface.


10.5 Surfaces 

A surface $S$ in space can be described by the vector $\mathbf{r}(u, v)$ joining the origin $O$ of
a coordinate system to a point on the surface (see figure 10.4). As the parameters
$u$ and $v$ vary, the end-point of the vector moves over the surface. This is very
similar to the parametric representation $\mathbf{r}(u)$ of a curve, discussed in section 10.3,
but with the important difference that we require two parameters to describe a
surface, whereas we need only one to describe a curve.

In Cartesian coordinates the surface is given by 

r(u,v) = x(u,v) i + y(u,v) j + z(u,v) k, 

where $x = x(u, v)$, $y = y(u, v)$ and $z = z(u, v)$ are the parametric equations of the
surface. We can also represent a surface by $z = f(x, y)$ or $g(x, y, z) = 0$. Either
of these representations can be converted into the parametric form in a similar
manner to that used for equations of curves. For example, if $z = f(x, y)$ then by
setting $u = x$ and $v = y$ the surface can be represented in parametric form by

\[
\mathbf{r}(u, v) = u\,\mathbf{i} + v\,\mathbf{j} + f(u, v)\,\mathbf{k}.
\]

Any curve $\mathbf{r}(\lambda)$, where $\lambda$ is a parameter, on the surface $S$ can be represented
by a pair of equations relating the parameters $u$ and $v$, for example $u = f(\lambda)$
and $v = g(\lambda)$. A parametric representation of the curve can easily be found by
straightforward substitution, i.e. $\mathbf{r}(\lambda) = \mathbf{r}(u(\lambda), v(\lambda))$. Using (10.17) for the case
where the vector is a function of a single variable $\lambda$ so that the LHS becomes a
total derivative, the tangent to the curve $\mathbf{r}(\lambda)$ at any point is given by

\[
\frac{d\mathbf{r}}{d\lambda} = \frac{\partial\mathbf{r}}{\partial u}\frac{du}{d\lambda} + \frac{\partial\mathbf{r}}{\partial v}\frac{dv}{d\lambda}.
\tag{10.21}
\]


The two curves $u$ = constant and $v$ = constant passing through any point $P$
on $S$ are called coordinate curves. For the curve $u$ = constant, for example, we
have $du/d\lambda = 0$, and so from (10.21) its tangent vector is in the direction $\partial\mathbf{r}/\partial v$.
Similarly, the tangent vector to the curve $v$ = constant is in the direction $\partial\mathbf{r}/\partial u$.

If the surface is smooth then at any point $P$ on $S$ the vectors $\partial\mathbf{r}/\partial u$ and
$\partial\mathbf{r}/\partial v$ are linearly independent and define the tangent plane $T$ at the point $P$ (see
figure 10.4). A vector normal to the surface at $P$ is given by

\[
\mathbf{n} = \frac{\partial\mathbf{r}}{\partial u} \times \frac{\partial\mathbf{r}}{\partial v}.
\tag{10.22}
\]


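In the neighbourhood of $P$, an infinitesimal vector displacement $d\mathbf{r}$ is written

\[
d\mathbf{r} = \frac{\partial\mathbf{r}}{\partial u}\,du + \frac{\partial\mathbf{r}}{\partial v}\,dv.
\]

The element of area at $P$, an infinitesimal parallelogram whose sides are the
coordinate curves, has magnitude

\[
dS = \left|\frac{\partial\mathbf{r}}{\partial u}\,du \times \frac{\partial\mathbf{r}}{\partial v}\,dv\right|
= \left|\frac{\partial\mathbf{r}}{\partial u} \times \frac{\partial\mathbf{r}}{\partial v}\right| du\,dv = |\mathbf{n}|\,du\,dv.
\tag{10.23}
\]

Thus the total area of the surface is

\[
A = \iint_R \left|\frac{\partial\mathbf{r}}{\partial u} \times \frac{\partial\mathbf{r}}{\partial v}\right| du\,dv = \iint_R |\mathbf{n}|\,du\,dv,
\tag{10.24}
\]

where $R$ is the region in the $uv$-plane corresponding to the range of parameter
values that define the surface.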


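► Find the element of area on the surface of a sphere of radius $a$, and hence calculate the
total surface area of the sphere.

We can represent a point $\mathbf{r}$ on the surface of the sphere in terms of the two parameters $\theta$
and $\phi$:

\[
\mathbf{r}(\theta, \phi) = a\sin\theta\cos\phi\,\mathbf{i} + a\sin\theta\sin\phi\,\mathbf{j} + a\cos\theta\,\mathbf{k},
\]

where $\theta$ and $\phi$ are the polar and azimuthal angles respectively. At any point $P$, vectors
tangent to the coordinate curves $\theta$ = constant and $\phi$ = constant are

\[
\begin{aligned}
\frac{\partial\mathbf{r}}{\partial\theta} &= a\cos\theta\cos\phi\,\mathbf{i} + a\cos\theta\sin\phi\,\mathbf{j} - a\sin\theta\,\mathbf{k}, \\
\frac{\partial\mathbf{r}}{\partial\phi} &= -a\sin\theta\sin\phi\,\mathbf{i} + a\sin\theta\cos\phi\,\mathbf{j}.
\end{aligned}
\]

A normal $\mathbf{n}$ to the surface at this point is then given by

\[
\mathbf{n} = \frac{\partial\mathbf{r}}{\partial\theta} \times \frac{\partial\mathbf{r}}{\partial\phi}
= \begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k} \\
a\cos\theta\cos\phi & a\cos\theta\sin\phi & -a\sin\theta \\
-a\sin\theta\sin\phi & a\sin\theta\cos\phi & 0
\end{vmatrix}
= a^2\sin\theta\,(\sin\theta\cos\phi\,\mathbf{i} + \sin\theta\sin\phi\,\mathbf{j} + \cos\theta\,\mathbf{k}),
\]

which has a magnitude of $a^2\sin\theta$. Therefore, the element of area at $P$ is, from (10.23),

\[
dS = a^2\sin\theta\,d\theta\,d\phi,
\]

and the total surface area of the sphere is given by

\[
A = \int_0^{\pi} d\theta \int_0^{2\pi} d\phi\; a^2\sin\theta = 4\pi a^2.
\]

This familiar result can, of course, be proved by much simpler methods! ◄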
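The area formula (10.24) lends itself to a direct numerical check. The following is a minimal sketch, not the method of the text: it estimates $|\partial\mathbf{r}/\partial\theta \times \partial\mathbf{r}/\partial\phi|$ by central differences and sums it with a midpoint rule. The function names, grid sizes and the sample radius 1.5 are assumptions made for illustration.

```python
import math

def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def sphere_point(theta, phi, a):
    """Parametric point r(theta, phi) on a sphere of radius a."""
    return (a * math.sin(theta) * math.cos(phi),
            a * math.sin(theta) * math.sin(phi),
            a * math.cos(theta))

def area_element(theta, phi, a, h=1e-6):
    """|dr/dtheta x dr/dphi|, with both tangents from central differences."""
    r_theta = tuple((p - m) / (2 * h) for p, m in
                    zip(sphere_point(theta + h, phi, a), sphere_point(theta - h, phi, a)))
    r_phi = tuple((p - m) / (2 * h) for p, m in
                  zip(sphere_point(theta, phi + h, a), sphere_point(theta, phi - h, a)))
    n = cross(r_theta, r_phi)
    return math.sqrt(sum(c * c for c in n))

def sphere_area(a, n_theta=200, n_phi=8):
    """Midpoint-rule estimate of the double integral in (10.24)."""
    d_theta = math.pi / n_theta
    d_phi = 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            total += area_element(theta, phi, a) * d_theta * d_phi
    return total

estimate = sphere_area(1.5)
exact = 4 * math.pi * 1.5**2
print(estimate, exact)
```

The same `sphere_area` structure would work for any parametrised surface if `sphere_point` and the parameter ranges were swapped out, which is the practical appeal of (10.24).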


10.6 Scalar and vector fields 

We now turn to the case where a particular scalar or vector quantity is defined 
not just at a point in space but continuously as a field throughout some region 
of space R (which is often the whole space). Although the concept of a field is 
valid for spaces with an arbitrary number of dimensions, in the remainder of this 
chapter we will restrict our attention to the familiar three-dimensional case. A 
scalar field $\phi(x, y, z)$ associates a scalar with each point in $R$, while a vector field
$\mathbf{a}(x, y, z)$ associates a vector with each point. In what follows, we will assume that
the variation in the scalar or vector field from point to point is both continuous 
and differentiable in R. 

Simple examples of scalar fields include the pressure at each point in a fluid 
and the electrostatic potential at each point in space in the presence of an electric 
charge. Vector fields relating to the same physical systems are the velocity vector 
in a fluid (giving the local speed and direction of the flow) and the electric field. 

With the study of continuously varying scalar and vector fields there arises the 
need to consider their derivatives and also the integration of field quantities along 
lines, over surfaces and throughout volumes in the field. We defer the discussion 
of line, surface and volume integrals until the next chapter, and in the remainder 
of this chapter we concentrate on the definition of vector differential operators 
and their properties. 


10.7 Vector operators 

Certain differential operations may be performed on scalar and vector fields 
and have wide-ranging applications in the physical sciences. The most important 
operations are those of finding the gradient of a scalar field and the divergence 
and curl of a vector field. It is usual to define these operators from a strictly
mathematical point of view, as we do below. In the following chapter, however, we
will discuss their geometrical definitions, which rely on the concept of integrating
vector quantities along lines and over surfaces.

Central to all these differential operations is the vector operator $\nabla$, which is
called del (or sometimes nabla) and in Cartesian coordinates is defined by

\[
\nabla \equiv \mathbf{i}\,\frac{\partial}{\partial x} + \mathbf{j}\,\frac{\partial}{\partial y} + \mathbf{k}\,\frac{\partial}{\partial z}.
\tag{10.25}
\]


The form of this operator in non-Cartesian coordinate systems is discussed in 
sections 10.9 and 10.10. 


10.7.1 Gradient of a scalar field 

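The gradient of a scalar field $\phi(x, y, z)$ is defined by

\[
\operatorname{grad}\phi = \nabla\phi = \mathbf{i}\,\frac{\partial\phi}{\partial x} + \mathbf{j}\,\frac{\partial\phi}{\partial y} + \mathbf{k}\,\frac{\partial\phi}{\partial z}.
\tag{10.26}
\]

Clearly, $\nabla\phi$ is a vector field whose $x$-, $y$- and $z$-components are the first partial
derivatives of $\phi(x, y, z)$ with respect to $x$, $y$ and $z$ respectively. Also note that the
vector field $\nabla\phi$ should not be confused with the vector operator $\phi\nabla$, which has
components $(\phi\,\partial/\partial x,\ \phi\,\partial/\partial y,\ \phi\,\partial/\partial z)$.


► Find the gradient of the scalar field $\phi = xy^2z^3$.

From (10.26) the gradient of $\phi$ is given by

\[
\nabla\phi = y^2z^3\,\mathbf{i} + 2xyz^3\,\mathbf{j} + 3xy^2z^2\,\mathbf{k}. \quad ◄
\]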


The gradient of a scalar field $\phi$ has some interesting geometrical properties.
Let us first consider the problem of calculating the rate of change of $\phi$ in some
particular direction. For an infinitesimal vector displacement $d\mathbf{r}$, forming its scalar
product with $\nabla\phi$ we obtain

\[
\begin{aligned}
\nabla\phi \cdot d\mathbf{r} &= \left(\mathbf{i}\,\frac{\partial\phi}{\partial x} + \mathbf{j}\,\frac{\partial\phi}{\partial y} + \mathbf{k}\,\frac{\partial\phi}{\partial z}\right) \cdot (\mathbf{i}\,dx + \mathbf{j}\,dy + \mathbf{k}\,dz) \\
&= \frac{\partial\phi}{\partial x}\,dx + \frac{\partial\phi}{\partial y}\,dy + \frac{\partial\phi}{\partial z}\,dz \\
&= d\phi,
\end{aligned}
\tag{10.27}
\]

which is the infinitesimal change in $\phi$ in going from position $\mathbf{r}$ to $\mathbf{r} + d\mathbf{r}$. In
particular, if $\mathbf{r}$ depends on some parameter $u$ such that $\mathbf{r}(u)$ defines a space curve
then the total derivative of $\phi$ with respect to $u$ along the curve is simply

\[
\frac{d\phi}{du} = \nabla\phi \cdot \frac{d\mathbf{r}}{du}.
\tag{10.28}
\]

In the particular case where the parameter $u$ is the arc length $s$ along the curve,
the total derivative of $\phi$ with respect to $s$ along the curve is given by

\[
\frac{d\phi}{ds} = \nabla\phi \cdot \hat{\mathbf{t}},
\tag{10.29}
\]

where $\hat{\mathbf{t}}$ is the unit tangent to the curve at the given point, as discussed in
section 10.3.

Figure 10.5 Geometrical properties of $\nabla\phi$. $PQ$ gives the value of $d\phi/ds$ in
the direction $\mathbf{a}$.

In general, the rate of change of $\phi$ with respect to the distance $s$ in a particular
direction $\mathbf{a}$ is given by

\[
\frac{d\phi}{ds} = \nabla\phi \cdot \hat{\mathbf{a}}
\tag{10.30}
\]

and is called the directional derivative. Since $\hat{\mathbf{a}}$ is a unit vector we have

\[
\frac{d\phi}{ds} = |\nabla\phi|\cos\theta,
\]

where $\theta$ is the angle between $\hat{\mathbf{a}}$ and $\nabla\phi$ as shown in figure 10.5. Clearly $\nabla\phi$ lies
in the direction of the fastest increase in $\phi$, and $|\nabla\phi|$ is the largest possible value
of $d\phi/ds$. Similarly, the largest rate of decrease of $\phi$ is $d\phi/ds = -|\nabla\phi|$ in the
direction of $-\nabla\phi$.

► For the function $\phi = x^2y + yz$ at the point $(1, 2, -1)$, find its rate of change with distance
in the direction $\mathbf{a} = \mathbf{i} + 2\mathbf{j} + 3\mathbf{k}$. At this same point, what is the greatest possible rate of
change with distance and in which direction does it occur?

The gradient of $\phi$ is given by (10.26):

\[
\begin{aligned}
\nabla\phi &= 2xy\,\mathbf{i} + (x^2 + z)\,\mathbf{j} + y\,\mathbf{k} \\
&= 4\mathbf{i} + 2\mathbf{k} \quad \text{at the point } (1, 2, -1).
\end{aligned}
\]

The unit vector in the direction of $\mathbf{a}$ is $\hat{\mathbf{a}} = \frac{1}{\sqrt{14}}(\mathbf{i} + 2\mathbf{j} + 3\mathbf{k})$, so the rate of change of $\phi$
with distance $s$ in this direction is, using (10.30),

\[
\frac{d\phi}{ds} = \nabla\phi \cdot \hat{\mathbf{a}} = \frac{1}{\sqrt{14}}(4 + 6) = \frac{10}{\sqrt{14}}.
\]

From the above discussion, at the point $(1, 2, -1)$ $d\phi/ds$ will be greatest in the direction
of $\nabla\phi = 4\mathbf{i} + 2\mathbf{k}$ and has the value $|\nabla\phi| = \sqrt{20}$ in this direction. ◄
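The directional-derivative calculation above can be reproduced numerically. This is a small sketch under stated assumptions: the gradient is approximated by central differences (the helper `gradient` and its step size are illustrative, not part of the text), and the field, point and direction are those of the worked example.

```python
import math

def phi(x, y, z):
    """Scalar field from the worked example: phi = x^2 y + y z."""
    return x**2 * y + y * z

def gradient(f, point, h=1e-6):
    """Central-difference approximation to grad f at a point."""
    grads = []
    for axis in range(3):
        forward, backward = list(point), list(point)
        forward[axis] += h
        backward[axis] -= h
        grads.append((f(*forward) - f(*backward)) / (2 * h))
    return tuple(grads)

point = (1.0, 2.0, -1.0)
a = (1.0, 2.0, 3.0)

# Normalise a to obtain the unit vector a-hat used in (10.30).
a_norm = math.sqrt(sum(c * c for c in a))
a_hat = tuple(c / a_norm for c in a)

grad_phi = gradient(phi, point)
directional = sum(g * c for g, c in zip(grad_phi, a_hat))   # grad(phi) . a-hat
max_rate = math.sqrt(sum(g * g for g in grad_phi))          # |grad(phi)|

print(grad_phi, directional, max_rate)
```

The printed values should be close to $(4, 0, 2)$, $10/\sqrt{14}$ and $\sqrt{20}$ respectively, matching the analytic results.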


We can extend the above analysis to find the rate of change of a vector 
field (rather than a scalar field as above) in a particular direction. The scalar 
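differential operator $\hat{\mathbf{a}} \cdot \nabla$ can be shown to give the rate of change with distance
in the direction $\hat{\mathbf{a}}$ of the quantity (vector or scalar) on which it acts. In Cartesian
coordinates it may be written as

\[
\hat{\mathbf{a}} \cdot \nabla = a_x\,\frac{\partial}{\partial x} + a_y\,\frac{\partial}{\partial y} + a_z\,\frac{\partial}{\partial z}.
\tag{10.31}
\]

Thus we can write the infinitesimal change in an electric field in moving from $\mathbf{r}$
to $\mathbf{r} + d\mathbf{r}$ given in (10.20) as $d\mathbf{E} = (d\mathbf{r} \cdot \nabla)\mathbf{E}$.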

A second interesting geometrical property of $\nabla\phi$ may be found by considering
the surface defined by $\phi(x, y, z) = c$, where $c$ is some constant. If $\hat{\mathbf{t}}$ is a unit
tangent to this surface at some point then clearly $d\phi/ds = 0$ in this direction
and from (10.29) we have $\nabla\phi \cdot \hat{\mathbf{t}} = 0$. In other words, $\nabla\phi$ is a vector normal to
the surface $\phi(x, y, z) = c$ at every point, as shown in figure 10.5. If $\hat{\mathbf{n}}$ is a unit
normal to the surface in the direction of increasing $\phi(x, y, z)$, then the gradient is
sometimes written

\[
\nabla\phi \equiv \frac{\partial\phi}{\partial n}\,\hat{\mathbf{n}},
\tag{10.32}
\]

where $\partial\phi/\partial n \equiv |\nabla\phi|$ is the rate of change of $\phi$ in the direction $\hat{\mathbf{n}}$ and is called
the normal derivative.


► Find expressions for the equations of the tangent plane and the line normal to the surface
$\phi(x, y, z) = c$ at the point $P$ with coordinates $x_0, y_0, z_0$. Use the results to find the equations
of the tangent plane and the line normal to the surface of the sphere $\phi = x^2 + y^2 + z^2 = a^2$
at the point $(0, 0, a)$.

A vector normal to the surface $\phi(x, y, z) = c$ at the point $P$ is simply $\nabla\phi$ evaluated at that
point; we denote it by $\mathbf{n}_0$. If $\mathbf{r}_0$ is the position vector of the point $P$ relative to the origin,
and $\mathbf{r}$ is the position vector of any point on the tangent plane, then the vector equation of
the tangent plane is, from (7.41),

\[
(\mathbf{r} - \mathbf{r}_0) \cdot \mathbf{n}_0 = 0.
\]

Similarly, if $\mathbf{r}$ is the position vector of any point on the straight line passing through $P$
(with position vector $\mathbf{r}_0$) in the direction of the normal $\mathbf{n}_0$ then the vector equation of this
line is, from subsection 7.7.1,

\[
(\mathbf{r} - \mathbf{r}_0) \times \mathbf{n}_0 = \mathbf{0}.
\]

For the surface of the sphere $\phi = x^2 + y^2 + z^2 = a^2$,

\[
\begin{aligned}
\nabla\phi &= 2x\,\mathbf{i} + 2y\,\mathbf{j} + 2z\,\mathbf{k} \\
&= 2a\,\mathbf{k} \quad \text{at the point } (0, 0, a).
\end{aligned}
\]

Therefore the equation of the tangent plane to the sphere at this point is

\[
(\mathbf{r} - \mathbf{r}_0) \cdot 2a\,\mathbf{k} = 0.
\]

This gives $2a(z - a) = 0$ or $z = a$, as expected. The equation of the line normal to the
sphere at the point $(0, 0, a)$ is

\[
(\mathbf{r} - \mathbf{r}_0) \times 2a\,\mathbf{k} = \mathbf{0},
\]

which gives $2ay\,\mathbf{i} - 2ax\,\mathbf{j} = \mathbf{0}$ or $x = y = 0$, i.e. the $z$-axis, as expected. The tangent plane
and normal to the surface of the sphere at this point are shown in figure 10.6. ◄

Figure 10.6 The tangent plane and the normal to the surface of the sphere
$\phi = x^2 + y^2 + z^2 = a^2$ at the point $\mathbf{r}_0$ with coordinates $(0, 0, a)$.

Further properties of the gradient operation, which are analogous to those of
the ordinary derivative, are listed in subsection 10.8.1 and may be easily proved.
In addition to these, we note that the gradient operation also obeys the chain
rule as in ordinary differential calculus, i.e. if $\phi$ and $\psi$ are scalar fields in some
region $R$ then

\[
\nabla[\phi(\psi)] = \frac{\partial\phi}{\partial\psi}\,\nabla\psi.
\]


10.7.2 Divergence of a vector field 


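The divergence of a vector field $\mathbf{a}(x, y, z)$ is defined by

\[
\operatorname{div}\mathbf{a} = \nabla \cdot \mathbf{a} = \frac{\partial a_x}{\partial x} + \frac{\partial a_y}{\partial y} + \frac{\partial a_z}{\partial z},
\tag{10.33}
\]

where $a_x$, $a_y$ and $a_z$ are the $x$-, $y$- and $z$-components of $\mathbf{a}$. Clearly, $\nabla \cdot \mathbf{a}$ is a scalar
field. Any vector field $\mathbf{a}$ for which $\nabla \cdot \mathbf{a} = 0$ is said to be solenoidal.


► Find the divergence of the vector field $\mathbf{a} = x^2y^2\,\mathbf{i} + y^2z^2\,\mathbf{j} + x^2z^2\,\mathbf{k}$.

From (10.33) the divergence of $\mathbf{a}$ is given by

\[
\nabla \cdot \mathbf{a} = 2xy^2 + 2yz^2 + 2x^2z = 2(xy^2 + yz^2 + x^2z). \quad ◄
\]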
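The definition (10.33) translates directly into a finite-difference estimate. The sketch below is illustrative only: `partial` and `divergence` are hypothetical helper names, the evaluation point is chosen arbitrarily, and the field is the one from the worked example.

```python
def field(x, y, z):
    """Vector field from the worked example, returned as (a_x, a_y, a_z)."""
    return (x**2 * y**2, y**2 * z**2, x**2 * z**2)

def partial(f, component, axis, point, h=1e-5):
    """Central-difference estimate of d(f_component)/d(x_axis) at a point."""
    forward, backward = list(point), list(point)
    forward[axis] += h
    backward[axis] -= h
    return (f(*forward)[component] - f(*backward)[component]) / (2 * h)

def divergence(f, point):
    """Sum of the diagonal partial derivatives, as in (10.33)."""
    return sum(partial(f, i, i, point) for i in range(3))

point = (1.2, -0.7, 0.5)   # arbitrary sample point
x, y, z = point
numeric = divergence(field, point)
analytic = 2 * (x * y**2 + y * z**2 + x**2 * z)

print(numeric, analytic)
```

Each component here is quadratic in the variable being differentiated, so the central difference is exact up to rounding and the two printed values should agree closely.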


We will discuss fully the geometric definition of divergence and its physical 
meaning in the next chapter. For the moment, we merely note that the divergence 
can be considered as a quantitative measure of how much a vector field diverges 
(spreads out) or converges at any given point. For example, if we consider the 
vector field $\mathbf{v}(x, y, z)$ describing the local velocity at any point in a fluid then $\nabla \cdot \mathbf{v}$
is equal to the net rate of outflow of fluid per unit volume, evaluated at a point
(by letting a small volume at that point tend to zero).

Now if some vector field $\mathbf{a}$ is itself derived from a scalar field via $\mathbf{a} = \nabla\phi$ then
$\nabla \cdot \mathbf{a}$ has the form $\nabla \cdot \nabla\phi$ or, as it is usually written, $\nabla^2\phi$, where $\nabla^2$ (del squared)
is the scalar differential operator

\[
\nabla^2 \equiv \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}.
\tag{10.34}
\]

$\nabla^2\phi$ is called the Laplacian of $\phi$ and appears in several important partial differential
equations of mathematical physics, discussed in chapters 18 and 19.


► Find the Laplacian of the scalar field $\phi = xy^2z^3$.

From (10.34) the Laplacian of $\phi$ is given by

\[
\nabla^2\phi = \frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial y^2} + \frac{\partial^2\phi}{\partial z^2} = 2xz^3 + 6xy^2z. \quad ◄
\]
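As a cross-check on the Laplacian example, the operator (10.34) can be approximated by second central differences. The following is a sketch under assumptions: the helper name `laplacian`, the step size and the sample point are illustrative choices rather than anything prescribed by the text.

```python
def phi(x, y, z):
    """Scalar field from the worked example: phi = x y^2 z^3."""
    return x * y**2 * z**3

def laplacian(f, point, h=1e-3):
    """Sum of second central differences along x, y and z, as in (10.34)."""
    centre = f(*point)
    total = 0.0
    for axis in range(3):
        forward, backward = list(point), list(point)
        forward[axis] += h
        backward[axis] -= h
        total += (f(*forward) - 2.0 * centre + f(*backward)) / h**2
    return total

point = (0.8, -1.1, 1.3)   # arbitrary sample point
x, y, z = point
numeric = laplacian(phi, point)
analytic = 2 * x * z**3 + 6 * x * y**2 * z

print(numeric, analytic)
```

The $x$-term contributes nothing because $\phi$ is linear in $x$, which is why only two terms survive in the analytic expression.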
10.7.3 Curl of a vector field

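The curl of a vector field $\mathbf{a}(x, y, z)$ is defined by

\[
\operatorname{curl}\mathbf{a} = \nabla \times \mathbf{a}
= \left(\frac{\partial a_z}{\partial y} - \frac{\partial a_y}{\partial z}\right)\mathbf{i}
+ \left(\frac{\partial a_x}{\partial z} - \frac{\partial a_z}{\partial x}\right)\mathbf{j}
+ \left(\frac{\partial a_y}{\partial x} - \frac{\partial a_x}{\partial y}\right)\mathbf{k},
\]

where $a_x$, $a_y$ and $a_z$ are the $x$-, $y$- and $z$-components of $\mathbf{a}$. The RHS can be
written in a more memorable form as a determinant:

\[
\nabla \times \mathbf{a} =
\begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k} \\[2pt]
\dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \\[6pt]
a_x & a_y & a_z
\end{vmatrix},
\tag{10.35}
\]

where it is understood that, on expanding the determinant, the partial derivatives
in the second row act on the components of $\mathbf{a}$ in the third row. Clearly, $\nabla \times \mathbf{a}$
is itself a vector field. Any vector field $\mathbf{a}$ for which $\nabla \times \mathbf{a} = \mathbf{0}$ is said to be
irrotational.


► Find the curl of the vector field $\mathbf{a} = x^2y^2z^2\,\mathbf{i} + y^2z^2\,\mathbf{j} + x^2z^2\,\mathbf{k}$.

The curl of $\mathbf{a}$ is given by

\[
\nabla \times \mathbf{a} =
\begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k} \\[2pt]
\dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \\[6pt]
x^2y^2z^2 & y^2z^2 & x^2z^2
\end{vmatrix}
= -2\left[y^2z\,\mathbf{i} + (xz^2 - x^2y^2z)\,\mathbf{j} + x^2yz^2\,\mathbf{k}\right]. \quad ◄
\]

For a vector field $\mathbf{v}(x, y, z)$ describing the local velocity at any point in a fluid,
$\nabla \times \mathbf{v}$ is a measure of the angular velocity of the fluid in the neighbourhood of
that point. If a small paddle wheel were placed at various points in the fluid then
it would tend to rotate in regions where $\nabla \times \mathbf{v} \neq \mathbf{0}$, while it would not rotate in
regions where $\nabla \times \mathbf{v} = \mathbf{0}$.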

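\[
\begin{aligned}
\nabla(\phi + \psi) &= \nabla\phi + \nabla\psi \\
\nabla \cdot (\mathbf{a} + \mathbf{b}) &= \nabla \cdot \mathbf{a} + \nabla \cdot \mathbf{b} \\
\nabla \times (\mathbf{a} + \mathbf{b}) &= \nabla \times \mathbf{a} + \nabla \times \mathbf{b} \\
\nabla(\phi\psi) &= \phi\nabla\psi + \psi\nabla\phi \\
\nabla(\mathbf{a} \cdot \mathbf{b}) &= \mathbf{a} \times (\nabla \times \mathbf{b}) + \mathbf{b} \times (\nabla \times \mathbf{a}) + (\mathbf{a} \cdot \nabla)\mathbf{b} + (\mathbf{b} \cdot \nabla)\mathbf{a} \\
\nabla \cdot (\phi\mathbf{a}) &= \phi\nabla \cdot \mathbf{a} + \mathbf{a} \cdot \nabla\phi \\
\nabla \cdot (\mathbf{a} \times \mathbf{b}) &= \mathbf{b} \cdot (\nabla \times \mathbf{a}) - \mathbf{a} \cdot (\nabla \times \mathbf{b}) \\
\nabla \times (\phi\mathbf{a}) &= \nabla\phi \times \mathbf{a} + \phi\nabla \times \mathbf{a} \\
\nabla \times (\mathbf{a} \times \mathbf{b}) &= \mathbf{a}(\nabla \cdot \mathbf{b}) - \mathbf{b}(\nabla \cdot \mathbf{a}) + (\mathbf{b} \cdot \nabla)\mathbf{a} - (\mathbf{a} \cdot \nabla)\mathbf{b}
\end{aligned}
\]

Table 10.1 Vector operators acting on sums and products. The operator $\nabla$ is
defined in (10.25); $\phi$ and $\psi$ are scalar fields, $\mathbf{a}$ and $\mathbf{b}$ are vector fields.

Another insight into the physical interpretation of the curl operator is gained
by considering the vector field $\mathbf{v}$ describing the velocity at any point in a rigid
body rotating about some axis with angular velocity $\boldsymbol{\omega}$. If $\mathbf{r}$ is the position vector
of the point with respect to some origin on the axis of rotation then the velocity
of the point is given by $\mathbf{v} = \boldsymbol{\omega} \times \mathbf{r}$. Without any loss of generality, we may take
$\boldsymbol{\omega}$ to lie along the $z$-axis of our coordinate system, so that $\boldsymbol{\omega} = \omega\,\mathbf{k}$. The velocity
field is then $\mathbf{v} = -\omega y\,\mathbf{i} + \omega x\,\mathbf{j}$. The curl of this vector field is easily found to be

\[
\nabla \times \mathbf{v} =
\begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k} \\[2pt]
\dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \\[6pt]
-\omega y & \omega x & 0
\end{vmatrix}
= 2\omega\,\mathbf{k} = 2\boldsymbol{\omega}.
\tag{10.36}
\]

Therefore the curl of the velocity field is a vector equal to twice the angular
velocity vector of the rigid body about its axis of rotation. We give a full
geometrical discussion of the curl of a vector in the next chapter.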
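Both curl results of this subsection, the worked example and the rigid-body rotation, can be checked with a finite-difference curl. This is a minimal sketch: the helper names, the sample point and the angular speed `OMEGA = 1.5` are assumptions introduced for illustration.

```python
OMEGA = 1.5   # assumed angular speed for the rigid-body field

def example_field(x, y, z):
    """a = x^2 y^2 z^2 i + y^2 z^2 j + x^2 z^2 k from the worked example."""
    return (x**2 * y**2 * z**2, y**2 * z**2, x**2 * z**2)

def rigid_rotation(x, y, z):
    """v = omega x r for omega along the z-axis: v = -omega*y i + omega*x j."""
    return (-OMEGA * y, OMEGA * x, 0.0)

def partial(f, component, axis, point, h=1e-5):
    """Central-difference estimate of d(f_component)/d(x_axis) at a point."""
    forward, backward = list(point), list(point)
    forward[axis] += h
    backward[axis] -= h
    return (f(*forward)[component] - f(*backward)[component]) / (2 * h)

def curl(f, point):
    """Components of curl f, following the expansion of the determinant (10.35)."""
    d = lambda component, axis: partial(f, component, axis, point)
    return (d(2, 1) - d(1, 2),    # da_z/dy - da_y/dz
            d(0, 2) - d(2, 0),    # da_x/dz - da_z/dx
            d(1, 0) - d(0, 1))    # da_y/dx - da_x/dy

point = (0.9, -1.4, 0.6)   # arbitrary sample point
x, y, z = point
curl_example = curl(example_field, point)
expected_example = (-2 * y**2 * z,
                    -2 * (x * z**2 - x**2 * y**2 * z),
                    -2 * x**2 * y * z**2)
curl_rotation = curl(rigid_rotation, point)

print(curl_example, expected_example)
print(curl_rotation)   # should be close to (0, 0, 2 * OMEGA)
```

The rotation field gives the same curl at every point, reflecting the fact that a rigid body has a single angular velocity.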


10.8 Vector operator formulae 

In the same way as for ordinary vectors (chapter 7), for vector operators certain 
identities exist. In addition, we must consider various relations involving the 
action of vector operators on sums and products of scalar and vector fields. Some 
of these relations have been mentioned above, but we list all the most important 
ones here for convenience. The validity of these relations may be easily verified 
by direct calculation (a quick method of deriving them using tensor notation is 
given in chapter 21). 

Although some of the following vector relations are expressed in Cartesian 
coordinates, it may be proved that they are all independent of the choice of 
coordinate system. This is to be expected since grad, div and curl all have clear 
geometrical definitions, which are discussed more fully in the next chapter and 
which do not rely on any particular choice of coordinate system. 


10.8.1 Vector operators acting on sums and products 

Let $\phi$ and $\psi$ be scalar fields and $\mathbf{a}$ and $\mathbf{b}$ be vector fields. Assuming these fields
are differentiable, the action of grad, div and curl on various sums and products 
of them is presented in table 10.1. 

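These relations can be proved by direct calculation.

► Show that

\[
\nabla \times (\phi\mathbf{a}) = \nabla\phi \times \mathbf{a} + \phi\nabla \times \mathbf{a}.
\]

The $x$-component of the LHS is

\[
\begin{aligned}
\frac{\partial}{\partial y}(\phi a_z) - \frac{\partial}{\partial z}(\phi a_y)
&= \phi\frac{\partial a_z}{\partial y} + \frac{\partial\phi}{\partial y}a_z - \phi\frac{\partial a_y}{\partial z} - \frac{\partial\phi}{\partial z}a_y \\
&= \phi\left(\frac{\partial a_z}{\partial y} - \frac{\partial a_y}{\partial z}\right) + \left(\frac{\partial\phi}{\partial y}a_z - \frac{\partial\phi}{\partial z}a_y\right) \\
&= \phi(\nabla \times \mathbf{a})_x + (\nabla\phi \times \mathbf{a})_x,
\end{aligned}
\]

where, for example, $(\nabla\phi \times \mathbf{a})_x$ denotes the $x$-component of the vector $\nabla\phi \times \mathbf{a}$. Incorporating
the $y$- and $z$-components, which can be similarly found, we obtain the stated result. ◄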


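Some useful special cases of the relations in table 10.1 are worth noting. If $\mathbf{r}$ is
the position vector relative to some origin and $r = |\mathbf{r}|$, then

\[
\begin{aligned}
\nabla\phi(r) &= \frac{d\phi}{dr}\,\hat{\mathbf{r}}, \\
\nabla \cdot [\phi(r)\mathbf{r}] &= 3\phi(r) + r\frac{d\phi(r)}{dr}, \\
\nabla^2\phi(r) &= \frac{d^2\phi(r)}{dr^2} + \frac{2}{r}\frac{d\phi(r)}{dr}, \\
\nabla \times [\phi(r)\mathbf{r}] &= \mathbf{0}.
\end{aligned}
\]

These results may be proved straightforwardly using Cartesian coordinates but
far more simply using spherical polar coordinates, which are discussed in subsection
10.9.2. Particular cases of these results are

\[
\nabla r = \hat{\mathbf{r}}, \qquad \nabla \cdot \mathbf{r} = 3, \qquad \nabla \times \mathbf{r} = \mathbf{0},
\]

together with

\[
\nabla\left(\frac{1}{r}\right) = -\frac{\hat{\mathbf{r}}}{r^2}, \qquad
\nabla \cdot \left(\frac{\hat{\mathbf{r}}}{r^2}\right) = -\nabla^2\left(\frac{1}{r}\right) = 4\pi\delta(\mathbf{r}),
\]

where $\delta(\mathbf{r})$ is the Dirac delta function, discussed in chapter 13. The last equation is
important in the solution of certain partial differential equations and is discussed
further in chapter 18.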


10.8.2 Combinations of grad, div and curl 

We now consider the action of two vector operators in succession on a scalar or 
vector field. We can immediately discard four of the nine obvious combinations of 
grad, div and curl, since they clearly do not make sense. If $\phi$ is a scalar field and
$\mathbf{a}$ is a vector field, these four combinations are grad(grad $\phi$), div(div $\mathbf{a}$), curl(div $\mathbf{a}$)
and grad(curl $\mathbf{a}$). In each case the second (outer) vector operator is acting on the
wrong type of field, i.e. scalar instead of vector or vice versa. In grad(grad $\phi$),
for example, grad acts on grad $\phi$, which is a vector field, but we know that grad
only acts on scalar fields (although in fact we will see in chapter 21 that we can
form the outer product of the del operator with a vector to form a tensor, but
that need not concern us here).

Of the five valid combinations of grad, div and curl, two are identically zero,
namely

\[
\operatorname{curl}\operatorname{grad}\phi = \nabla \times \nabla\phi = \mathbf{0},
\tag{10.37}
\]

\[
\operatorname{div}\operatorname{curl}\mathbf{a} = \nabla \cdot (\nabla \times \mathbf{a}) = 0.
\tag{10.38}
\]

From (10.37), we see that if $\mathbf{a}$ is derived from the gradient of some scalar function
such that $\mathbf{a} = \nabla\phi$ then it is necessarily irrotational ($\nabla \times \mathbf{a} = \mathbf{0}$). We also note
that if $\mathbf{a}$ is an irrotational vector field then another irrotational vector field is
$\mathbf{a} + \nabla\phi + \mathbf{c}$, where $\phi$ is any scalar field and $\mathbf{c}$ is a constant vector. This follows
since

\[
\nabla \times (\mathbf{a} + \nabla\phi + \mathbf{c}) = \nabla \times \mathbf{a} + \nabla \times \nabla\phi = \mathbf{0}.
\]

Similarly, from (10.38) we may infer that if $\mathbf{b}$ is the curl of some vector field $\mathbf{a}$
such that $\mathbf{b} = \nabla \times \mathbf{a}$ then $\mathbf{b}$ is solenoidal ($\nabla \cdot \mathbf{b} = 0$). Obviously, if $\mathbf{b}$ is solenoidal
and $\mathbf{c}$ is any constant vector then $\mathbf{b} + \mathbf{c}$ is also solenoidal.
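To see why the two identities (10.37) and (10.38) hold, it may help to write out the Cartesian components. The short derivation below is a sketch that assumes the fields have continuous second partial derivatives, so that mixed partial derivatives commute. For the $x$-component of curl grad,

```latex
\[
(\nabla \times \nabla\phi)_x
= \frac{\partial}{\partial y}\left(\frac{\partial\phi}{\partial z}\right)
- \frac{\partial}{\partial z}\left(\frac{\partial\phi}{\partial y}\right)
= \frac{\partial^2\phi}{\partial y\,\partial z} - \frac{\partial^2\phi}{\partial z\,\partial y} = 0,
\]
```

and the $y$- and $z$-components vanish in the same way. For div curl, expanding with the component form of the curl gives

```latex
\[
\nabla \cdot (\nabla \times \mathbf{a})
= \frac{\partial}{\partial x}\left(\frac{\partial a_z}{\partial y} - \frac{\partial a_y}{\partial z}\right)
+ \frac{\partial}{\partial y}\left(\frac{\partial a_x}{\partial z} - \frac{\partial a_z}{\partial x}\right)
+ \frac{\partial}{\partial z}\left(\frac{\partial a_y}{\partial x} - \frac{\partial a_x}{\partial y}\right) = 0,
\]
```

since the six second-derivative terms cancel in pairs.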

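The three remaining combinations of grad, div and curl are

\[
\operatorname{div}\operatorname{grad}\phi = \nabla \cdot \nabla\phi = \nabla^2\phi
= \frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial y^2} + \frac{\partial^2\phi}{\partial z^2},
\tag{10.39}
\]

\[
\begin{aligned}
\operatorname{grad}\operatorname{div}\mathbf{a} = \nabla(\nabla \cdot \mathbf{a})
&= \left(\frac{\partial^2 a_x}{\partial x^2} + \frac{\partial^2 a_y}{\partial x\,\partial y} + \frac{\partial^2 a_z}{\partial x\,\partial z}\right)\mathbf{i}
+ \left(\frac{\partial^2 a_x}{\partial y\,\partial x} + \frac{\partial^2 a_y}{\partial y^2} + \frac{\partial^2 a_z}{\partial y\,\partial z}\right)\mathbf{j} \\
&\quad + \left(\frac{\partial^2 a_x}{\partial z\,\partial x} + \frac{\partial^2 a_y}{\partial z\,\partial y} + \frac{\partial^2 a_z}{\partial z^2}\right)\mathbf{k},
\end{aligned}
\tag{10.40}
\]

\[
\operatorname{curl}\operatorname{curl}\mathbf{a} = \nabla \times (\nabla \times \mathbf{a}) = \nabla(\nabla \cdot \mathbf{a}) - \nabla^2\mathbf{a},
\tag{10.41}
\]

where (10.39) and (10.40) are expressed in Cartesian coordinates. In (10.41), the
term $\nabla^2\mathbf{a}$ has the linear differential operator $\nabla^2$ acting on a vector (as opposed to
a scalar as in (10.39)), which of course consists of a sum of unit vectors multiplied
by components. Two cases arise.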


(i) If the unit vectors are constants (i.e. they are independent of the values of
the coordinates) then the differential operator gives a non-zero contribution
only when acting upon the coordinates, the unit vectors being merely
multipliers.
(ii) If the unit vectors vary as the values of the coordinates change (i.e. are
not constant in direction throughout the whole space) then the derivatives
of these vectors appear as contributions to $\nabla^2\mathbf{a}$.

Cartesian coordinates are an example of the first case in which each component
satisfies $(\nabla^2\mathbf{a})_i = \nabla^2 a_i$. In this case (10.41) can be applied to each component
separately:

\[
[\nabla \times (\nabla \times \mathbf{a})]_i = [\nabla(\nabla \cdot \mathbf{a})]_i - \nabla^2 a_i.
\tag{10.42}
\]

However, cylindrical and spherical polar coordinates come in the second class.
For them (10.41) is still true, but the further step to (10.42) cannot be made.

More complicated vector operator relations may be proved using the relations 
given above. 


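► Show that

\[
\nabla \cdot (\nabla\phi \times \nabla\psi) = 0,
\]

where $\phi$ and $\psi$ are scalar fields.

From the previous section we have

\[
\nabla \cdot (\mathbf{a} \times \mathbf{b}) = \mathbf{b} \cdot (\nabla \times \mathbf{a}) - \mathbf{a} \cdot (\nabla \times \mathbf{b}).
\]

If we let $\mathbf{a} = \nabla\phi$ and $\mathbf{b} = \nabla\psi$ then we obtain

\[
\nabla \cdot (\nabla\phi \times \nabla\psi) = \nabla\psi \cdot (\nabla \times \nabla\phi) - \nabla\phi \cdot (\nabla \times \nabla\psi) = 0,
\tag{10.43}
\]

since $\nabla \times \nabla\phi = \mathbf{0} = \nabla \times \nabla\psi$, from (10.37). ◄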


10.9 Cylindrical and spherical polar coordinates 

The operators we have discussed in this chapter, i.e. grad, div, curl and $\nabla^2$,
have all been defined in terms of Cartesian coordinates, but for many physical 
situations other coordinate systems are more natural. For example, many systems, 
such as an isolated charge in space, have spherical symmetry and spherical polar 
coordinates would be the obvious choice. For axisymmetric systems, such as fluid 
flow in a pipe, cylindrical polar coordinates are the natural choice. The physical 
laws governing the behaviour of the systems are often expressed in terms of 
the vector operators we have been discussing, and so it is necessary to be able 
to express these operators in these other, non-Cartesian, coordinates. We first 
consider the two most common non-Cartesian coordinate systems, i.e. cylindrical 
and spherical polars, and go on to discuss general curvilinear coordinates in the 
next section. 


10.9.1 Cylindrical polar coordinates 

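As shown in figure 10.7, the position of a point in space $P$ having Cartesian
coordinates $x, y, z$ may be expressed in terms of cylindrical polar coordinates
$\rho, \phi, z$, where

\[
x = \rho\cos\phi, \qquad y = \rho\sin\phi, \qquad z = z,
\tag{10.44}
\]

and $\rho \geq 0$, $0 \leq \phi < 2\pi$ and $-\infty < z < \infty$. The position vector of $P$ may therefore
be written

\[
\mathbf{r} = \rho\cos\phi\,\mathbf{i} + \rho\sin\phi\,\mathbf{j} + z\,\mathbf{k}.
\tag{10.45}
\]

If we take the partial derivatives of $\mathbf{r}$ with respect to $\rho$, $\phi$ and $z$ respectively then
we obtain the three vectors

\[
\mathbf{e}_\rho = \frac{\partial\mathbf{r}}{\partial\rho} = \cos\phi\,\mathbf{i} + \sin\phi\,\mathbf{j},
\tag{10.46}
\]

\[
\mathbf{e}_\phi = \frac{\partial\mathbf{r}}{\partial\phi} = -\rho\sin\phi\,\mathbf{i} + \rho\cos\phi\,\mathbf{j},
\tag{10.47}
\]

\[
\mathbf{e}_z = \frac{\partial\mathbf{r}}{\partial z} = \mathbf{k}.
\tag{10.48}
\]

These vectors lie in the directions of increasing $\rho$, $\phi$ and $z$ respectively but are
not all of unit length. Although $\mathbf{e}_\rho$, $\mathbf{e}_\phi$ and $\mathbf{e}_z$ form a useful set of basis vectors
in their own right (we will see in section 10.10 that such a basis is sometimes the
most useful), it is usual to work with the corresponding unit vectors, which are
obtained by dividing each vector by its modulus to give

\[
\hat{\mathbf{e}}_\rho = \mathbf{e}_\rho = \cos\phi\,\mathbf{i} + \sin\phi\,\mathbf{j},
\tag{10.49}
\]

\[
\hat{\mathbf{e}}_\phi = \frac{1}{\rho}\,\mathbf{e}_\phi = -\sin\phi\,\mathbf{i} + \cos\phi\,\mathbf{j},
\tag{10.50}
\]

\[
\hat{\mathbf{e}}_z = \mathbf{e}_z = \mathbf{k}.
\tag{10.51}
\]

These three unit vectors, like the Cartesian unit vectors $\mathbf{i}$, $\mathbf{j}$ and $\mathbf{k}$, form an
orthonormal triad at each point in space, i.e. the basis vectors are mutually
orthogonal and of unit length (see figure 10.7). Unlike the fixed vectors $\mathbf{i}$, $\mathbf{j}$ and $\mathbf{k}$,
however, $\hat{\mathbf{e}}_\rho$ and $\hat{\mathbf{e}}_\phi$ change direction as $P$ moves.

The expression for a general infinitesimal vector displacement $d\mathbf{r}$ in the position
of $P$ is given, from (10.19), by

\[
\begin{aligned}
d\mathbf{r} &= \frac{\partial\mathbf{r}}{\partial\rho}\,d\rho + \frac{\partial\mathbf{r}}{\partial\phi}\,d\phi + \frac{\partial\mathbf{r}}{\partial z}\,dz \\
&= d\rho\,\mathbf{e}_\rho + d\phi\,\mathbf{e}_\phi + dz\,\mathbf{e}_z \\
&= d\rho\,\hat{\mathbf{e}}_\rho + \rho\,d\phi\,\hat{\mathbf{e}}_\phi + dz\,\hat{\mathbf{e}}_z.
\end{aligned}
\tag{10.52}
\]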


This expression illustrates an important difference between Cartesian and cylindrical
polar coordinates (or non-Cartesian coordinates in general). In Cartesian
coordinates, the distance moved in going from $x$ to $x + dx$, with $y$ and $z$ held
constant, is simply $ds = dx$. However, in cylindrical polars, if $\phi$ changes by $d\phi$,
with $\rho$ and $z$ held constant, then the distance moved is not $d\phi$, but $ds = \rho\,d\phi$.
Factors, such as the $\rho$ in $\rho\,d\phi$, that multiply the coordinate differentials (in the
orthonormal basis) to get distances are known as scale factors. From (10.52), the
scale factors for the $\rho$-, $\phi$- and $z$-coordinates are therefore 1, $\rho$ and 1 respectively.
The magnitude $ds$ of the displacement $d\mathbf{r}$ is given in cylindrical polar coordinates
by

\[
(ds)^2 = d\mathbf{r} \cdot d\mathbf{r} = (d\rho)^2 + \rho^2(d\phi)^2 + (dz)^2,
\]

where in the second equality we have used the fact that the basis vectors are
orthonormal. We can also find the volume element in a cylindrical polar system
(see figure 10.8) by calculating the volume of the infinitesimal parallelepiped
defined by the vectors $d\rho\,\hat{\mathbf{e}}_\rho$, $\rho\,d\phi\,\hat{\mathbf{e}}_\phi$ and $dz\,\hat{\mathbf{e}}_z$; this is given by

\[
dV = |d\rho\,\hat{\mathbf{e}}_\rho \cdot (\rho\,d\phi\,\hat{\mathbf{e}}_\phi \times dz\,\hat{\mathbf{e}}_z)| = \rho\,d\rho\,d\phi\,dz,
\]

which again uses the fact that the basis vectors are orthonormal. For a simple
coordinate system such as cylindrical polars the expressions for $(ds)^2$ and $dV$ are
obvious from the geometry.

Figure 10.7 Cylindrical polar coordinates $\rho, \phi, z$.

Figure 10.8 The element of volume in cylindrical polar coordinates is given
by $\rho\,d\rho\,d\phi\,dz$.

We will now express the vector operators discussed in this chapter in terms of
cylindrical polar coordinates. Let us consider a scalar field $\Phi(\rho, \phi, z)$, where we
use $\Phi$ for the scalar field to avoid confusion with the azimuthal angle $\phi$, and a
vector field $\mathbf{a}(\rho, \phi, z)$. We must first write the vector field in terms of the basis
vectors of the cylindrical polar coordinate system, i.e.

\[
\mathbf{a} = a_\rho\,\hat{\mathbf{e}}_\rho + a_\phi\,\hat{\mathbf{e}}_\phi + a_z\,\hat{\mathbf{e}}_z,
\]

where $a_\rho$, $a_\phi$ and $a_z$ are the components of $\mathbf{a}$ in the $\rho$-, $\phi$- and $z$-directions
respectively. The expressions for grad, div, curl and $\nabla^2$ can then be calculated
and are given in table 10.2. Since the derivations of these expressions are rather
complicated we leave them until our discussion of general curvilinear coordinates
in the next section; the reader could well postpone examination of these formal
proofs until some experience of using the expressions has been gained.

\[
\begin{aligned}
\nabla\Phi &= \frac{\partial\Phi}{\partial\rho}\,\hat{\mathbf{e}}_\rho + \frac{1}{\rho}\frac{\partial\Phi}{\partial\phi}\,\hat{\mathbf{e}}_\phi + \frac{\partial\Phi}{\partial z}\,\hat{\mathbf{e}}_z \\[4pt]
\nabla \cdot \mathbf{a} &= \frac{1}{\rho}\frac{\partial}{\partial\rho}(\rho a_\rho) + \frac{1}{\rho}\frac{\partial a_\phi}{\partial\phi} + \frac{\partial a_z}{\partial z} \\[4pt]
\nabla \times \mathbf{a} &= \frac{1}{\rho}
\begin{vmatrix}
\hat{\mathbf{e}}_\rho & \rho\,\hat{\mathbf{e}}_\phi & \hat{\mathbf{e}}_z \\[2pt]
\dfrac{\partial}{\partial\rho} & \dfrac{\partial}{\partial\phi} & \dfrac{\partial}{\partial z} \\[6pt]
a_\rho & \rho a_\phi & a_z
\end{vmatrix} \\[4pt]
\nabla^2\Phi &= \frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial\Phi}{\partial\rho}\right) + \frac{1}{\rho^2}\frac{\partial^2\Phi}{\partial\phi^2} + \frac{\partial^2\Phi}{\partial z^2}
\end{aligned}
\]

Table 10.2 Vector operators in cylindrical polar coordinates; $\Phi$ is a scalar
field and $\mathbf{a}$ is a vector field.

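► Express the vector field $\mathbf{a} = yz\,\mathbf{i} - y\,\mathbf{j} + xz^2\,\mathbf{k}$ in cylindrical polar coordinates, and hence
calculate its divergence. Show that the same result is obtained by evaluating the divergence
in Cartesian coordinates.

The basis vectors of the cylindrical polar coordinate system are given in (10.49)–(10.51).
Solving these equations simultaneously for $\mathbf{i}$, $\mathbf{j}$ and $\mathbf{k}$ we obtain

\[
\begin{aligned}
\mathbf{i} &= \cos\phi\,\hat{\mathbf{e}}_\rho - \sin\phi\,\hat{\mathbf{e}}_\phi, \\
\mathbf{j} &= \sin\phi\,\hat{\mathbf{e}}_\rho + \cos\phi\,\hat{\mathbf{e}}_\phi, \\
\mathbf{k} &= \hat{\mathbf{e}}_z.
\end{aligned}
\]

Substituting these relations and (10.44) into the expression for $\mathbf{a}$ we find

\[
\begin{aligned}
\mathbf{a} &= z\rho\sin\phi\,(\cos\phi\,\hat{\mathbf{e}}_\rho - \sin\phi\,\hat{\mathbf{e}}_\phi) - \rho\sin\phi\,(\sin\phi\,\hat{\mathbf{e}}_\rho + \cos\phi\,\hat{\mathbf{e}}_\phi) + z^2\rho\cos\phi\,\hat{\mathbf{e}}_z \\
&= (z\rho\sin\phi\cos\phi - \rho\sin^2\phi)\,\hat{\mathbf{e}}_\rho - (z\rho\sin^2\phi + \rho\sin\phi\cos\phi)\,\hat{\mathbf{e}}_\phi + z^2\rho\cos\phi\,\hat{\mathbf{e}}_z.
\end{aligned}
\]

Substituting into the expression for $\nabla \cdot \mathbf{a}$ given in table 10.2,

\[
\begin{aligned}
\nabla \cdot \mathbf{a} &= 2z\sin\phi\cos\phi - 2\sin^2\phi - 2z\sin\phi\cos\phi - \cos^2\phi + \sin^2\phi + 2z\rho\cos\phi \\
&= 2z\rho\cos\phi - 1.
\end{aligned}
\]

Alternatively, and much more quickly in this case, we can calculate the divergence
directly in Cartesian coordinates. We obtain

\[
\nabla \cdot \mathbf{a} = \frac{\partial a_x}{\partial x} + \frac{\partial a_y}{\partial y} + \frac{\partial a_z}{\partial z} = 2zx - 1,
\]

which on substituting $x = \rho\cos\phi$ yields the same result as the calculation in cylindrical
polars. ◄

Figure 10.9 Spherical polar coordinates $r, \theta, \phi$.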
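The agreement between the two routes to the divergence can also be confirmed numerically. The sketch below assumes the cylindrical components derived in the example and the divergence formula of table 10.2; the function names, step size and sample point are illustrative, and the derivatives are approximated by central differences.

```python
import math

def components(rho, phi, z):
    """(a_rho, a_phi, a_z) of a = yz i - y j + x z^2 k in cylindrical polars."""
    a_rho = z * rho * math.sin(phi) * math.cos(phi) - rho * math.sin(phi)**2
    a_phi = -(z * rho * math.sin(phi)**2 + rho * math.sin(phi) * math.cos(phi))
    a_z = z**2 * rho * math.cos(phi)
    return a_rho, a_phi, a_z

def divergence_cylindrical(rho, phi, z, h=1e-5):
    """(1/rho) d(rho a_rho)/d rho + (1/rho) d a_phi/d phi + d a_z/d z."""
    d_rho = ((rho + h) * components(rho + h, phi, z)[0]
             - (rho - h) * components(rho - h, phi, z)[0]) / (2 * h)
    d_phi = (components(rho, phi + h, z)[1]
             - components(rho, phi - h, z)[1]) / (2 * h)
    d_z = (components(rho, phi, z + h)[2]
           - components(rho, phi, z - h)[2]) / (2 * h)
    return d_rho / rho + d_phi / rho + d_z

rho, phi, z = 1.3, 0.8, -0.4   # arbitrary sample point with rho > 0
numeric = divergence_cylindrical(rho, phi, z)
analytic = 2 * z * rho * math.cos(phi) - 1   # equals 2zx - 1 with x = rho cos(phi)

print(numeric, analytic)
```

Note the factor $\rho$ inside the first derivative: omitting it is a common slip when applying the cylindrical formula, and it would make the two values disagree.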


Finally, we note that similar results can be obtained for (two-dimensional)
polar coordinates in a plane by omitting the $z$-dependence. For example, $(ds)^2 =
(d\rho)^2 + \rho^2(d\phi)^2$, while the element of volume is replaced by the element of area
$dA = \rho\,d\rho\,d\phi$.


10.9.2 Spherical polar coordinates 

As shown in figure 10.9, the position of a point in space P, with Cartesian 
coordinates x,y,z, may be expressed in terms of spherical polar coordinates 
r, 6, (f > , where 


x = r sin 6 cos y = r sin 0 sin cf>, z = rcos0, (10.53) 
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and r > 0, 0 < 8 < n and 0 < <p < 2n. The position vector of P may therefore be 
written as 

r = r sin 8 cos (/) i + r sin 8 sin cf> j + r cos 8 k. 

If, in a similar manner to that used in the previous section for cylindrical polars,
we find the partial derivatives of $\mathbf{r}$ with respect to $r$, $\theta$ and $\phi$ respectively and
divide each of the resulting vectors by its modulus then we obtain the unit basis
vectors

\[ \hat{\mathbf{e}}_r = \sin\theta\cos\phi\,\mathbf{i} + \sin\theta\sin\phi\,\mathbf{j} + \cos\theta\,\mathbf{k}, \]
\[ \hat{\mathbf{e}}_\theta = \cos\theta\cos\phi\,\mathbf{i} + \cos\theta\sin\phi\,\mathbf{j} - \sin\theta\,\mathbf{k}, \]
\[ \hat{\mathbf{e}}_\phi = -\sin\phi\,\mathbf{i} + \cos\phi\,\mathbf{j}. \]

These unit vectors are in the directions of increasing $r$, $\theta$ and $\phi$ respectively
and are the orthonormal basis set for spherical polar coordinates, as shown in
figure 10.9.

A general infinitesimal vector displacement in spherical polars is, from (10.19),

\[ d\mathbf{r} = dr\,\hat{\mathbf{e}}_r + r\,d\theta\,\hat{\mathbf{e}}_\theta + r\sin\theta\,d\phi\,\hat{\mathbf{e}}_\phi; \qquad (10.54) \]

thus the scale factors for the $r$-, $\theta$- and $\phi$-coordinates are $1$, $r$ and $r\sin\theta$
respectively. The magnitude $ds$ of the displacement $d\mathbf{r}$ is therefore given by

\[ (ds)^2 = d\mathbf{r}\cdot d\mathbf{r} = (dr)^2 + r^2 (d\theta)^2 + r^2\sin^2\theta\,(d\phi)^2, \]

since the basis vectors form an orthonormal set. The element of volume in
spherical polar coordinates (see figure 10.10) is the volume of the infinitesimal
parallelepiped defined by the vectors $dr\,\hat{\mathbf{e}}_r$, $r\,d\theta\,\hat{\mathbf{e}}_\theta$ and $r\sin\theta\,d\phi\,\hat{\mathbf{e}}_\phi$ and is given
by

\[ dV = |dr\,\hat{\mathbf{e}}_r \cdot (r\,d\theta\,\hat{\mathbf{e}}_\theta \times r\sin\theta\,d\phi\,\hat{\mathbf{e}}_\phi)| = r^2\sin\theta\,dr\,d\theta\,d\phi, \]

where again we use the fact that the basis vectors are orthonormal. The expressions
for $(ds)^2$ and $dV$ in spherical polars can be obtained from the geometry of
this coordinate system.
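As an informal check of these results, the sketch below estimates the tangent vectors $\partial\mathbf{r}/\partial r$, $\partial\mathbf{r}/\partial\theta$ and $\partial\mathbf{r}/\partial\phi$ by finite differences at one sample point and compares their lengths with the scale factors $1$, $r$ and $r\sin\theta$. The helper names and the sample point are illustrative assumptions, not part of the text.

```python
import math

def position(r, theta, phi):
    """Cartesian components of the position vector, from (10.53)."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def tangent(u, i, h=1e-6):
    """Central-difference estimate of the derivative of r with respect to u[i]."""
    up, um = list(u), list(u)
    up[i] += h
    um[i] -= h
    return tuple((a - b) / (2 * h) for a, b in zip(position(*up), position(*um)))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

u = (2.0, 0.7, 1.1)  # arbitrary sample point (r, theta, phi)
e = [tangent(u, i) for i in range(3)]

scale_factors = [math.sqrt(dot(v, v)) for v in e]
expected = [1.0, u[0], u[0] * math.sin(u[1])]

# |e_r . (e_theta x e_phi)| should equal r^2 sin(theta), the factor in dV.
volume_factor = abs(dot(e[0], cross(e[1], e[2])))

print(scale_factors, expected)
print(volume_factor, u[0] ** 2 * math.sin(u[1]))
```

The mutual scalar products of the three tangent vectors can be printed in the same way to confirm that they vanish.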

We will now express the standard vector operators in spherical polar coordinates,
using the same techniques as for cylindrical polar coordinates. We consider
a scalar field $\Phi(r,\theta,\phi)$ and a vector field $\mathbf{a}(r,\theta,\phi)$. The latter may be written in
terms of the basis vectors of the spherical polar coordinate system as

\[ \mathbf{a} = a_r\,\hat{\mathbf{e}}_r + a_\theta\,\hat{\mathbf{e}}_\theta + a_\phi\,\hat{\mathbf{e}}_\phi, \]

where $a_r$, $a_\theta$ and $a_\phi$ are the components of $\mathbf{a}$ in the $r$-, $\theta$- and $\phi$-directions
respectively. The expressions for grad, div, curl and $\nabla^2$ are given in table 10.3.
The derivations of these results are given in the next section.
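\[ \nabla\Phi = \frac{\partial\Phi}{\partial r}\,\hat{\mathbf{e}}_r + \frac{1}{r}\frac{\partial\Phi}{\partial\theta}\,\hat{\mathbf{e}}_\theta + \frac{1}{r\sin\theta}\frac{\partial\Phi}{\partial\phi}\,\hat{\mathbf{e}}_\phi \]

\[ \nabla\cdot\mathbf{a} = \frac{1}{r^2}\frac{\partial}{\partial r}(r^2 a_r) + \frac{1}{r\sin\theta}\frac{\partial}{\partial\theta}(\sin\theta\,a_\theta) + \frac{1}{r\sin\theta}\frac{\partial a_\phi}{\partial\phi} \]

\[ \nabla\times\mathbf{a} = \frac{1}{r^2\sin\theta}
\begin{vmatrix}
\hat{\mathbf{e}}_r & r\,\hat{\mathbf{e}}_\theta & r\sin\theta\,\hat{\mathbf{e}}_\phi \\
\dfrac{\partial}{\partial r} & \dfrac{\partial}{\partial\theta} & \dfrac{\partial}{\partial\phi} \\
a_r & r a_\theta & r\sin\theta\,a_\phi
\end{vmatrix} \]

\[ \nabla^2\Phi = \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial\Phi}{\partial r}\right) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial\Phi}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2\Phi}{\partial\phi^2} \]

Table 10.3 Vector operators in spherical polar coordinates. $\Phi$ is a scalar field
and $\mathbf{a}$ is a vector field.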



Figure 10.10 The element of volume in spherical polar coordinates is given
by $r^2\sin\theta\,dr\,d\theta\,d\phi$.

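As a final note we mention that in the expression for $\nabla^2\Phi$ given in table 10.3
we can rewrite the first term on the RHS as follows:

\[ \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial\Phi}{\partial r}\right) = \frac{1}{r}\frac{\partial^2}{\partial r^2}(r\Phi), \]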

which can often be useful in shortening calculations. 
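
The identity follows by expanding both sides; the short working below is added as a sketch of that check and is not part of the original text.

```latex
\begin{aligned}
\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial\Phi}{\partial r}\right)
  &= \frac{1}{r^2}\left(2r\frac{\partial\Phi}{\partial r} + r^2\frac{\partial^2\Phi}{\partial r^2}\right)
   = \frac{\partial^2\Phi}{\partial r^2} + \frac{2}{r}\frac{\partial\Phi}{\partial r}, \\
\frac{1}{r}\frac{\partial^2}{\partial r^2}(r\Phi)
  &= \frac{1}{r}\frac{\partial}{\partial r}\left(\Phi + r\frac{\partial\Phi}{\partial r}\right)
   = \frac{1}{r}\left(2\frac{\partial\Phi}{\partial r} + r\frac{\partial^2\Phi}{\partial r^2}\right)
   = \frac{\partial^2\Phi}{\partial r^2} + \frac{2}{r}\frac{\partial\Phi}{\partial r}.
\end{aligned}
```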
10.10 General curvilinear coordinates

As indicated earlier, the contents of this section are more formal and technically 
complicated than hitherto. The section could be omitted until the reader has had 
some experience of using its results. 

Cylindrical and spherical polars are just two examples of what are called
general curvilinear coordinates. In the general case, the position of a point $P$
having Cartesian coordinates $x, y, z$ may be expressed in terms of the three
curvilinear coordinates $u_1, u_2, u_3$, where

\[ x = x(u_1, u_2, u_3), \qquad y = y(u_1, u_2, u_3), \qquad z = z(u_1, u_2, u_3), \]

and similarly

\[ u_1 = u_1(x, y, z), \qquad u_2 = u_2(x, y, z), \qquad u_3 = u_3(x, y, z). \]

We assume that all these functions are continuous, differentiable and have a
single-valued inverse, except perhaps at or on certain isolated points or lines,
so that there is a one-to-one correspondence between the $x, y, z$ and $u_1, u_2, u_3$
systems. The $u_1$-, $u_2$- and $u_3$-coordinate curves of a general curvilinear system
are analogous to the $x$-, $y$- and $z$-axes of Cartesian coordinates. The surfaces
$u_1 = c_1$, $u_2 = c_2$ and $u_3 = c_3$, where $c_1, c_2, c_3$ are constants, are called the
coordinate surfaces and each pair of these surfaces has its intersection in a curve
called a coordinate curve or line (see figure 10.11). If at each point in space the
three coordinate surfaces passing through the point meet at right angles then
the curvilinear coordinate system is called orthogonal. For example, in spherical
polars $u_1 = r$, $u_2 = \theta$, $u_3 = \phi$ and the three coordinate surfaces passing through
the point $(R, \Theta, \Phi)$ are the sphere $r = R$, the circular cone $\theta = \Theta$ and the plane
$\phi = \Phi$, which intersect at right angles at that point. Therefore spherical polars
(and cylindrical polars) form an orthogonal coordinate system.

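If $\mathbf{r}(u_1, u_2, u_3)$ is the position vector of the point $P$ then $\mathbf{e}_1 = \partial\mathbf{r}/\partial u_1$ is a vector
tangent to the $u_1$-curve at $P$ (for which $u_2$ and $u_3$ are constants) in the direction
of increasing $u_1$. Similarly, $\mathbf{e}_2 = \partial\mathbf{r}/\partial u_2$ and $\mathbf{e}_3 = \partial\mathbf{r}/\partial u_3$ are vectors tangent to
the $u_2$- and $u_3$-curves at $P$ in the direction of increasing $u_2$ and $u_3$ respectively.
Denoting the lengths of these vectors by $h_1$, $h_2$ and $h_3$, the unit vectors in each of
these directions are given by

\[ \hat{\mathbf{e}}_1 = \frac{1}{h_1}\frac{\partial\mathbf{r}}{\partial u_1}, \qquad
   \hat{\mathbf{e}}_2 = \frac{1}{h_2}\frac{\partial\mathbf{r}}{\partial u_2}, \qquad
   \hat{\mathbf{e}}_3 = \frac{1}{h_3}\frac{\partial\mathbf{r}}{\partial u_3}, \]

where $h_1 = |\partial\mathbf{r}/\partial u_1|$, $h_2 = |\partial\mathbf{r}/\partial u_2|$ and $h_3 = |\partial\mathbf{r}/\partial u_3|$.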

The quantities $h_1, h_2, h_3$ are called the scale factors of the curvilinear coordinate
system. The element of distance associated with an infinitesimal change $du_i$ in
one of the coordinates is $h_i\,du_i$. In the previous section we found that the scale
factors for cylindrical and spherical polar coordinates were

for cylindrical polars $h_\rho = 1$, $h_\phi = \rho$, $h_z = 1$,

for spherical polars $h_r = 1$, $h_\theta = r$, $h_\phi = r\sin\theta$.

Figure 10.11 General curvilinear coordinates.

Although the vectors $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ form a perfectly good basis for the curvilinear
coordinate system, it is usual to work with the corresponding unit vectors $\hat{\mathbf{e}}_1, \hat{\mathbf{e}}_2,
\hat{\mathbf{e}}_3$. For an orthogonal curvilinear coordinate system these unit vectors form an
orthonormal basis.

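An infinitesimal vector displacement in general curvilinear coordinates is given
by, from (10.19),

\[ d\mathbf{r} = \frac{\partial\mathbf{r}}{\partial u_1}\,du_1 + \frac{\partial\mathbf{r}}{\partial u_2}\,du_2 + \frac{\partial\mathbf{r}}{\partial u_3}\,du_3 \qquad (10.55) \]
\[ \phantom{d\mathbf{r}} = du_1\,\mathbf{e}_1 + du_2\,\mathbf{e}_2 + du_3\,\mathbf{e}_3 \qquad (10.56) \]
\[ \phantom{d\mathbf{r}} = h_1\,du_1\,\hat{\mathbf{e}}_1 + h_2\,du_2\,\hat{\mathbf{e}}_2 + h_3\,du_3\,\hat{\mathbf{e}}_3. \qquad (10.57) \]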

In the case of orthogonal curvilinear coordinates, where the $\hat{\mathbf{e}}_i$ are mutually
perpendicular, the element of arc length is given by

\[ (ds)^2 = d\mathbf{r}\cdot d\mathbf{r} = h_1^2 (du_1)^2 + h_2^2 (du_2)^2 + h_3^2 (du_3)^2. \qquad (10.58) \]

The volume element for the coordinate system is the volume of the infinitesimal
parallelepiped defined by the vectors $(\partial\mathbf{r}/\partial u_i)\,du_i = du_i\,\mathbf{e}_i = h_i\,du_i\,\hat{\mathbf{e}}_i$, for $i = 1, 2, 3$.
For orthogonal coordinates this is given by

\[
\begin{aligned}
dV &= |du_1\,\mathbf{e}_1 \cdot (du_2\,\mathbf{e}_2 \times du_3\,\mathbf{e}_3)| \\
   &= |h_1\hat{\mathbf{e}}_1 \cdot (h_2\hat{\mathbf{e}}_2 \times h_3\hat{\mathbf{e}}_3)|\,du_1\,du_2\,du_3 \\
   &= h_1 h_2 h_3\,du_1\,du_2\,du_3.
\end{aligned}
\]
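
The recipe above (differentiate $\mathbf{r}$ with respect to each coordinate, take the moduli to obtain the scale factors, and multiply them to obtain the volume factor) can be sketched numerically for any coordinate system. The example below applies it to the paraboloidal coordinates $x = uv\cos\phi$, $y = uv\sin\phi$, $z = \tfrac{1}{2}(u^2 - v^2)$ that appear in exercise 10.21; the function names, step size and sample point are assumptions made for illustration.

```python
import math

def paraboloidal(u, v, phi):
    """Cartesian position for paraboloidal coordinates (see exercise 10.21)."""
    return (u * v * math.cos(phi), u * v * math.sin(phi), 0.5 * (u * u - v * v))

def tangent_vectors(position, coords, h=1e-6):
    """Finite-difference estimates of e_i = dr/du_i for i = 1, 2, 3."""
    vectors = []
    for i in range(3):
        up, um = list(coords), list(coords)
        up[i] += h
        um[i] -= h
        vectors.append(tuple((a - b) / (2 * h)
                             for a, b in zip(position(*up), position(*um))))
    return vectors

def norm(a):
    return math.sqrt(sum(x * x for x in a))

def triple_product(a, b, c):
    """Scalar triple product a . (b x c)."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            + a[1] * (b[2] * c[0] - b[0] * c[2])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

u, v, phi = 1.3, 0.8, 0.4  # arbitrary sample point
e = tangent_vectors(paraboloidal, (u, v, phi))

h_u, h_v, h_phi = (norm(vec) for vec in e)
volume_factor = abs(triple_product(*e))  # dV = volume_factor * du dv dphi

print(h_u, h_v, math.sqrt(u * u + v * v))  # h_u = h_v = (u^2 + v^2)^(1/2)
print(h_phi, u * v)                        # h_phi = uv
print(volume_factor, h_u * h_v * h_phi)    # equal for an orthogonal system
```

Because the triple product is computed directly from the tangent vectors, the last comparison would fail for a non-orthogonal system, where $dV$ is no longer simply $h_1 h_2 h_3\,du_1\,du_2\,du_3$.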

Now, in addition to the set $\{\hat{\mathbf{e}}_i\}$, $i = 1, 2, 3$, there exists another useful set of
three unit basis vectors at $P$. Since $\nabla u_1$ is a vector normal to the surface $u_1 = c_1$,
a unit vector in this direction is $\hat{\boldsymbol{\epsilon}}_1 = \nabla u_1/|\nabla u_1|$. Similarly, $\hat{\boldsymbol{\epsilon}}_2 = \nabla u_2/|\nabla u_2|$ and
$\hat{\boldsymbol{\epsilon}}_3 = \nabla u_3/|\nabla u_3|$ are unit vectors normal to the surfaces $u_2 = c_2$ and $u_3 = c_3$
respectively.

Therefore at each point $P$ in a curvilinear coordinate system, there exist, in
general, two sets of unit vectors: $\{\hat{\mathbf{e}}_i\}$, tangent to the coordinate curves, and $\{\hat{\boldsymbol{\epsilon}}_i\}$,
normal to the coordinate surfaces. A vector $\mathbf{a}$ can be written in terms of either
set of unit vectors:

\[ \mathbf{a} = a_1\hat{\mathbf{e}}_1 + a_2\hat{\mathbf{e}}_2 + a_3\hat{\mathbf{e}}_3 = A_1\hat{\boldsymbol{\epsilon}}_1 + A_2\hat{\boldsymbol{\epsilon}}_2 + A_3\hat{\boldsymbol{\epsilon}}_3, \]

where $a_1, a_2, a_3$ and $A_1, A_2, A_3$ are the components of $\mathbf{a}$ in the two systems. It
may be shown that the two bases become identical if the coordinate system is
orthogonal.

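Instead of the unit vectors discussed above, we could instead work directly with
the two sets of vectors $\{\mathbf{e}_i = \partial\mathbf{r}/\partial u_i\}$ and $\{\boldsymbol{\epsilon}_i = \nabla u_i\}$, which are not, in general, of
unit length. We can then write a vector $\mathbf{a}$ as

\[ \mathbf{a} = \alpha_1\mathbf{e}_1 + \alpha_2\mathbf{e}_2 + \alpha_3\mathbf{e}_3 = \beta_1\boldsymbol{\epsilon}_1 + \beta_2\boldsymbol{\epsilon}_2 + \beta_3\boldsymbol{\epsilon}_3, \]

or more explicitly as

\[ \mathbf{a} = \alpha_1\frac{\partial\mathbf{r}}{\partial u_1} + \alpha_2\frac{\partial\mathbf{r}}{\partial u_2} + \alpha_3\frac{\partial\mathbf{r}}{\partial u_3} = \beta_1\nabla u_1 + \beta_2\nabla u_2 + \beta_3\nabla u_3, \]

where $\alpha_1, \alpha_2, \alpha_3$ and $\beta_1, \beta_2, \beta_3$ are called the contravariant and covariant
components of $\mathbf{a}$ respectively. A more detailed discussion of these components, in
the context of tensor analysis, is given in chapter 21. The (in general) non-unit
bases $\{\mathbf{e}_i\}$ and $\{\boldsymbol{\epsilon}_i\}$ are often the most natural bases in which to express vector
quantities.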


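►Show that $\{\mathbf{e}_i\}$ and $\{\boldsymbol{\epsilon}_i\}$ are reciprocal systems of vectors.

Let us consider the scalar product $\mathbf{e}_i \cdot \boldsymbol{\epsilon}_j$; using the Cartesian expressions for $\mathbf{r}$ and $\nabla$, we
obtain

\[
\begin{aligned}
\mathbf{e}_i \cdot \boldsymbol{\epsilon}_j
 &= \left(\frac{\partial x}{\partial u_i}\,\mathbf{i} + \frac{\partial y}{\partial u_i}\,\mathbf{j} + \frac{\partial z}{\partial u_i}\,\mathbf{k}\right) \cdot
    \left(\frac{\partial u_j}{\partial x}\,\mathbf{i} + \frac{\partial u_j}{\partial y}\,\mathbf{j} + \frac{\partial u_j}{\partial z}\,\mathbf{k}\right) \\
 &= \frac{\partial x}{\partial u_i}\frac{\partial u_j}{\partial x} + \frac{\partial y}{\partial u_i}\frac{\partial u_j}{\partial y} + \frac{\partial z}{\partial u_i}\frac{\partial u_j}{\partial z}
  = \frac{\partial u_j}{\partial u_i};
\end{aligned}
\]

in the last step we have used the chain rule for partial differentiation. Therefore $\mathbf{e}_i \cdot \boldsymbol{\epsilon}_j = 1$
if $i = j$, and $\mathbf{e}_i \cdot \boldsymbol{\epsilon}_j = 0$ otherwise. Hence $\{\mathbf{e}_i\}$ and $\{\boldsymbol{\epsilon}_j\}$ are reciprocal systems of vectors. ◄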
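
The reciprocity can be illustrated numerically for a system that is not orthogonal. The coordinate system in the sketch below, $x = u_1 + u_2^2$, $y = u_2$, $z = u_3 + u_1 u_2$, is a hypothetical example invented for this check (it is not taken from the text); it was chosen because its inverse is available in closed form, so that both $\partial\mathbf{r}/\partial u_i$ and $\nabla u_j$ can be estimated independently by finite differences.

```python
def cartesian(u1, u2, u3):
    """Hypothetical non-orthogonal system: (x, y, z) in terms of (u1, u2, u3)."""
    return (u1 + u2 ** 2, u2, u3 + u1 * u2)

def curvilinear(x, y, z):
    """Closed-form inverse of the mapping above."""
    u1 = x - y ** 2
    return (u1, y, z - u1 * y)

def partial(f, point, k, h=1e-5):
    """Central difference of the vector-valued f with respect to argument k."""
    up, um = list(point), list(point)
    up[k] += h
    um[k] -= h
    return tuple((a - b) / (2 * h) for a, b in zip(f(*up), f(*um)))

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

u = (0.4, -1.2, 0.9)   # arbitrary sample point
p = cartesian(*u)      # the same point in Cartesian coordinates

# e[i] = dr/du_i, tangent to the u_i-curve.
e = [partial(cartesian, u, i) for i in range(3)]

# du_dx[k][j] = d(u_j)/d(x_k); regrouping gives eps[j] = grad u_j.
du_dx = [partial(curvilinear, p, k) for k in range(3)]
eps = [tuple(du_dx[k][j] for k in range(3)) for j in range(3)]

gram = [[dot(e[i], eps[j]) for j in range(3)] for i in range(3)]
for row in gram:
    print([round(entry, 6) for entry in row])

print(dot(e[0], e[1]))  # non-zero, so the system is not orthogonal
```

The printed matrix of scalar products $\mathbf{e}_i \cdot \boldsymbol{\epsilon}_j$ should be the unit matrix to within rounding, even though $\mathbf{e}_1 \cdot \mathbf{e}_2 \neq 0$.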

We now derive expressions for the standard vector operators in orthogonal
curvilinear coordinates. Despite the useful properties of the non-unit bases
discussed above, the remainder of our discussion in this section will be in terms of
the unit basis vectors $\{\hat{\mathbf{e}}_i\}$. The expressions for the vector operators in cylindrical
and spherical polar coordinates given in tables 10.2 and 10.3 respectively can be
found from those derived below by inserting the appropriate scale factors.


Gradient 

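The change $d\Phi$ in a scalar field $\Phi$ resulting from changes $du_1, du_2, du_3$ in the
coordinates $u_1, u_2, u_3$ is given by, from (5.5),

\[ d\Phi = \frac{\partial\Phi}{\partial u_1}\,du_1 + \frac{\partial\Phi}{\partial u_2}\,du_2 + \frac{\partial\Phi}{\partial u_3}\,du_3. \]

For orthogonal curvilinear coordinates $u_1, u_2, u_3$ we find from (10.57), and
comparison with (10.27), that we can write this as

\[ d\Phi = \nabla\Phi \cdot d\mathbf{r}, \qquad (10.59) \]

where $\nabla\Phi$ is given by

\[ \nabla\Phi = \frac{1}{h_1}\frac{\partial\Phi}{\partial u_1}\,\hat{\mathbf{e}}_1 + \frac{1}{h_2}\frac{\partial\Phi}{\partial u_2}\,\hat{\mathbf{e}}_2 + \frac{1}{h_3}\frac{\partial\Phi}{\partial u_3}\,\hat{\mathbf{e}}_3. \]

This implies that the del operator can be written

\[ \nabla = \frac{\hat{\mathbf{e}}_1}{h_1}\frac{\partial}{\partial u_1} + \frac{\hat{\mathbf{e}}_2}{h_2}\frac{\partial}{\partial u_2} + \frac{\hat{\mathbf{e}}_3}{h_3}\frac{\partial}{\partial u_3}. \qquad (10.60) \]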


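►Show that for orthogonal curvilinear coordinates $\nabla u_i = \hat{\mathbf{e}}_i/h_i$. Hence show that the two
sets of vectors $\{\hat{\mathbf{e}}_i\}$ and $\{\hat{\boldsymbol{\epsilon}}_i\}$ are identical in this case.

Letting $\Phi = u_i$ in (10.60) we find immediately that $\nabla u_i = \hat{\mathbf{e}}_i/h_i$. Therefore $|\nabla u_i| = 1/h_i$, and
so $\hat{\boldsymbol{\epsilon}}_i = \nabla u_i/|\nabla u_i| = h_i \nabla u_i = \hat{\mathbf{e}}_i$. ◄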


Divergence 

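In order to derive the expression for the divergence of a vector field in orthogonal
curvilinear coordinates, we must first write the vector field in terms of the basis
vectors of the coordinate system:

\[ \mathbf{a} = a_1\hat{\mathbf{e}}_1 + a_2\hat{\mathbf{e}}_2 + a_3\hat{\mathbf{e}}_3. \]

The divergence is then given by

\[ \nabla\cdot\mathbf{a} = \frac{1}{h_1 h_2 h_3}\left[\frac{\partial}{\partial u_1}(h_2 h_3 a_1) + \frac{\partial}{\partial u_2}(h_3 h_1 a_2) + \frac{\partial}{\partial u_3}(h_1 h_2 a_3)\right]. \qquad (10.61) \]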
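►Prove the expression for $\nabla\cdot\mathbf{a}$ in orthogonal curvilinear coordinates.

Let us consider the sub-expression $\nabla\cdot(a_1\hat{\mathbf{e}}_1)$. Now $\hat{\mathbf{e}}_1 = \hat{\mathbf{e}}_2 \times \hat{\mathbf{e}}_3 = h_2\nabla u_2 \times h_3\nabla u_3$. Therefore

\[
\begin{aligned}
\nabla\cdot(a_1\hat{\mathbf{e}}_1) &= \nabla\cdot(a_1 h_2 h_3\,\nabla u_2 \times \nabla u_3) \\
 &= \nabla(a_1 h_2 h_3)\cdot(\nabla u_2 \times \nabla u_3) + a_1 h_2 h_3\,\nabla\cdot(\nabla u_2 \times \nabla u_3).
\end{aligned}
\]

However, $\nabla\cdot(\nabla u_2 \times \nabla u_3) = 0$, from (10.43), so we obtain

\[ \nabla\cdot(a_1\hat{\mathbf{e}}_1) = \nabla(a_1 h_2 h_3)\cdot\left(\frac{\hat{\mathbf{e}}_2}{h_2} \times \frac{\hat{\mathbf{e}}_3}{h_3}\right) = \nabla(a_1 h_2 h_3)\cdot\frac{\hat{\mathbf{e}}_1}{h_2 h_3}; \]

letting $\Phi = a_1 h_2 h_3$ in (10.60) and substituting into the above equation, we find

\[ \nabla\cdot(a_1\hat{\mathbf{e}}_1) = \frac{1}{h_1 h_2 h_3}\frac{\partial}{\partial u_1}(a_1 h_2 h_3). \]

Repeating the analysis for $\nabla\cdot(a_2\hat{\mathbf{e}}_2)$ and $\nabla\cdot(a_3\hat{\mathbf{e}}_3)$, and adding the results we obtain (10.61),
as required. ◄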


Laplacian 

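In the expression for the divergence (10.61), let

\[ \mathbf{a} = \nabla\Phi = \frac{1}{h_1}\frac{\partial\Phi}{\partial u_1}\,\hat{\mathbf{e}}_1 + \frac{1}{h_2}\frac{\partial\Phi}{\partial u_2}\,\hat{\mathbf{e}}_2 + \frac{1}{h_3}\frac{\partial\Phi}{\partial u_3}\,\hat{\mathbf{e}}_3, \]

where we have used (10.60). We then obtain

\[ \nabla^2\Phi = \frac{1}{h_1 h_2 h_3}\left[\frac{\partial}{\partial u_1}\left(\frac{h_2 h_3}{h_1}\frac{\partial\Phi}{\partial u_1}\right) + \frac{\partial}{\partial u_2}\left(\frac{h_3 h_1}{h_2}\frac{\partial\Phi}{\partial u_2}\right) + \frac{\partial}{\partial u_3}\left(\frac{h_1 h_2}{h_3}\frac{\partial\Phi}{\partial u_3}\right)\right], \]

which is the expression for the Laplacian in orthogonal curvilinear coordinates.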
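
As a consistency check (a short sketch added here, not part of the original derivation), inserting the spherical polar scale factors $h_r = 1$, $h_\theta = r$, $h_\phi = r\sin\theta$ quoted earlier into this general expression reproduces the Laplacian listed in table 10.3:

```latex
\begin{aligned}
\nabla^2\Phi
 &= \frac{1}{r^2\sin\theta}\left[
      \frac{\partial}{\partial r}\left(r^2\sin\theta\,\frac{\partial\Phi}{\partial r}\right)
    + \frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial\Phi}{\partial\theta}\right)
    + \frac{\partial}{\partial\phi}\left(\frac{1}{\sin\theta}\,\frac{\partial\Phi}{\partial\phi}\right)\right] \\
 &= \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial\Phi}{\partial r}\right)
    + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial\Phi}{\partial\theta}\right)
    + \frac{1}{r^2\sin^2\theta}\frac{\partial^2\Phi}{\partial\phi^2}.
\end{aligned}
```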


Curl 

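The curl of a vector field $\mathbf{a} = a_1\hat{\mathbf{e}}_1 + a_2\hat{\mathbf{e}}_2 + a_3\hat{\mathbf{e}}_3$ in orthogonal curvilinear
coordinates is given by

\[ \nabla\times\mathbf{a} = \frac{1}{h_1 h_2 h_3}
\begin{vmatrix}
h_1\hat{\mathbf{e}}_1 & h_2\hat{\mathbf{e}}_2 & h_3\hat{\mathbf{e}}_3 \\
\dfrac{\partial}{\partial u_1} & \dfrac{\partial}{\partial u_2} & \dfrac{\partial}{\partial u_3} \\
h_1 a_1 & h_2 a_2 & h_3 a_3
\end{vmatrix}. \qquad (10.62) \]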


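►Prove the expression for $\nabla\times\mathbf{a}$ in orthogonal curvilinear coordinates.

Let us consider the sub-expression $\nabla\times(a_1\hat{\mathbf{e}}_1)$. Since $\hat{\mathbf{e}}_1 = h_1\nabla u_1$ we have

\[
\begin{aligned}
\nabla\times(a_1\hat{\mathbf{e}}_1) &= \nabla\times(a_1 h_1 \nabla u_1) \\
 &= \nabla(a_1 h_1)\times\nabla u_1 + a_1 h_1\,\nabla\times\nabla u_1.
\end{aligned}
\]

But $\nabla\times\nabla u_1 = 0$, so we obtain

\[ \nabla\times(a_1\hat{\mathbf{e}}_1) = \nabla(a_1 h_1)\times\frac{\hat{\mathbf{e}}_1}{h_1}. \]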
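\[ \nabla\Phi = \frac{1}{h_1}\frac{\partial\Phi}{\partial u_1}\,\hat{\mathbf{e}}_1 + \frac{1}{h_2}\frac{\partial\Phi}{\partial u_2}\,\hat{\mathbf{e}}_2 + \frac{1}{h_3}\frac{\partial\Phi}{\partial u_3}\,\hat{\mathbf{e}}_3 \]

\[ \nabla\cdot\mathbf{a} = \frac{1}{h_1 h_2 h_3}\left[\frac{\partial}{\partial u_1}(h_2 h_3 a_1) + \frac{\partial}{\partial u_2}(h_3 h_1 a_2) + \frac{\partial}{\partial u_3}(h_1 h_2 a_3)\right] \]

\[ \nabla\times\mathbf{a} = \frac{1}{h_1 h_2 h_3}
\begin{vmatrix}
h_1\hat{\mathbf{e}}_1 & h_2\hat{\mathbf{e}}_2 & h_3\hat{\mathbf{e}}_3 \\
\dfrac{\partial}{\partial u_1} & \dfrac{\partial}{\partial u_2} & \dfrac{\partial}{\partial u_3} \\
h_1 a_1 & h_2 a_2 & h_3 a_3
\end{vmatrix} \]

\[ \nabla^2\Phi = \frac{1}{h_1 h_2 h_3}\left[\frac{\partial}{\partial u_1}\left(\frac{h_2 h_3}{h_1}\frac{\partial\Phi}{\partial u_1}\right) + \frac{\partial}{\partial u_2}\left(\frac{h_3 h_1}{h_2}\frac{\partial\Phi}{\partial u_2}\right) + \frac{\partial}{\partial u_3}\left(\frac{h_1 h_2}{h_3}\frac{\partial\Phi}{\partial u_3}\right)\right] \]

Table 10.4 Vector operators in orthogonal curvilinear coordinates $u_1, u_2, u_3$.
$\Phi$ is a scalar field and $\mathbf{a}$ is a vector field.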


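Letting $\Phi = a_1 h_1$ in (10.60) and substituting into the above equation, we find

\[ \nabla\times(a_1\hat{\mathbf{e}}_1) = \frac{\hat{\mathbf{e}}_2}{h_3 h_1}\frac{\partial}{\partial u_3}(a_1 h_1) - \frac{\hat{\mathbf{e}}_3}{h_1 h_2}\frac{\partial}{\partial u_2}(a_1 h_1). \]

The corresponding analysis of $\nabla\times(a_2\hat{\mathbf{e}}_2)$ produces terms in $\hat{\mathbf{e}}_3$ and $\hat{\mathbf{e}}_1$, whilst that of
$\nabla\times(a_3\hat{\mathbf{e}}_3)$ produces terms in $\hat{\mathbf{e}}_1$ and $\hat{\mathbf{e}}_2$. When the three results are added together, the
coefficients multiplying $\hat{\mathbf{e}}_1$, $\hat{\mathbf{e}}_2$ and $\hat{\mathbf{e}}_3$ are the same as those obtained by writing out (10.62)
explicitly, thus proving the stated result. ◄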


The general expressions for the vector operators in orthogonal curvilinear 
coordinates are shown for reference in table 10.4. The explicit results for cylindrical 
and spherical polar coordinates, given in tables 10.2 and 10.3 respectively, are 
obtained by substituting the appropriate set of scale factors in each case. 

A discussion of the expressions for vector operators in tensor form, which 
are valid even for non-orthogonal curvilinear coordinate systems, is given in 
chapter 21. 


10.11 Exercises 


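10.1 Evaluate the integral

\[ \int \left[\mathbf{a}(\dot{\mathbf{b}}\cdot\mathbf{a} + \mathbf{b}\cdot\dot{\mathbf{a}}) + \dot{\mathbf{a}}(\mathbf{b}\cdot\mathbf{a}) - 2(\dot{\mathbf{a}}\cdot\mathbf{a})\mathbf{b} - \dot{\mathbf{b}}|\mathbf{a}|^2\right] dt \]

in which $\dot{\mathbf{a}}$, $\dot{\mathbf{b}}$ are the derivatives of $\mathbf{a}$, $\mathbf{b}$ with respect to $t$.

10.2 At time $t = 0$, the vectors $\mathbf{E}$ and $\mathbf{B}$ are given by $\mathbf{E} = \mathbf{E}_0$ and $\mathbf{B} = \mathbf{B}_0$, where the
fixed unit vectors $\mathbf{E}_0$ and $\mathbf{B}_0$ are orthogonal. The equations of motion are

\[ \frac{d\mathbf{E}}{dt} = \mathbf{E}_0 + \mathbf{B}\times\mathbf{E}_0, \]
\[ \frac{d\mathbf{B}}{dt} = \mathbf{B}_0 + \mathbf{E}\times\mathbf{B}_0. \]

Find $\mathbf{E}$ and $\mathbf{B}$ at a general time $t$, showing that after a long time the directions
of $\mathbf{E}$ and $\mathbf{B}$ have almost interchanged.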
10.3 The general equation of motion of a (non-relativistic) particle of mass $m$ and
charge $q$ when it is placed in a region where there is a magnetic field $\mathbf{B}$ and an
electric field $\mathbf{E}$ is

\[ m\ddot{\mathbf{r}} = q(\mathbf{E} + \dot{\mathbf{r}}\times\mathbf{B}); \]

here $\mathbf{r}$ is the position of the particle at time $t$ and $\dot{\mathbf{r}} = d\mathbf{r}/dt$ etc. Write this as
three separate equations in terms of the Cartesian components of the vectors
involved.

For the simple case of crossed uniform fields $\mathbf{E} = E\mathbf{i}$, $\mathbf{B} = B\mathbf{j}$ in which the
particle starts from the origin at $t = 0$ with $\dot{\mathbf{r}} = v_0\mathbf{k}$, find the equations of motion
and show the following:

(a) if $v_0 = E/B$ then the particle continues its initial motion;

(b) if $v_0 = 0$ then the particle follows the space curve given in terms of the
parameter $\xi$ by

\[ x = \frac{mE}{B^2 q}(1 - \cos\xi), \qquad y = 0, \qquad z = \frac{mE}{B^2 q}(\xi - \sin\xi). \]

Interpret this curve geometrically and relate $\xi$ to $t$. Show that the total
distance travelled by the particle after time $t$ is

\[ \frac{2E}{B}\int_0^t \left|\sin\frac{Bqt'}{2m}\right| dt'. \]

10.4 Use vector methods to find the maximum angle to the horizontal at which a stone
may be thrown so as to ensure that it is always moving away from the thrower.

10.5 If two systems of coordinates with a common origin $O$ are rotating with respect
to each other, the measured accelerations differ in the two systems. Denoting
by $\mathbf{r}$ and $\mathbf{r}'$ position vectors in frames $OXYZ$ and $OX'Y'Z'$ respectively, the
connection between the two is

\[ \ddot{\mathbf{r}}' = \ddot{\mathbf{r}} + \dot{\boldsymbol{\omega}}\times\mathbf{r} + 2\boldsymbol{\omega}\times\dot{\mathbf{r}} + \boldsymbol{\omega}\times(\boldsymbol{\omega}\times\mathbf{r}), \]

where $\boldsymbol{\omega}$ is the angular velocity vector of the rotation of $OXYZ$ with respect to
$OX'Y'Z'$ (taken as fixed). The third term on the RHS is known as the Coriolis
acceleration, whilst the final term gives rise to a centrifugal force.

Consider the application of this result to the firing of a shell of mass $m$ from
a stationary ship on the steadily rotating earth, working to the first order in
$\omega$ ($= 7.3 \times 10^{-5}$ rad s$^{-1}$). If the shell is fired with velocity $\mathbf{v}$ at time $t = 0$ and only
reaches a height that is small compared to the radius of the earth, show that its
acceleration, as recorded on the ship, is given approximately by

\[ \ddot{\mathbf{r}} = \mathbf{g} - 2\boldsymbol{\omega}\times(\mathbf{v} + \mathbf{g}t), \]

where $m\mathbf{g}$ is the weight of the shell measured on the ship's deck.

The shell is fired at another stationary ship (a distance $\mathbf{s}$ away) and $\mathbf{v}$ is such
that the shell would have hit its target had there been no Coriolis effect.

(a) Show that without the Coriolis effect the time of flight of the shell would
have been $\tau = -2\mathbf{g}\cdot\mathbf{v}/g^2$.

(b) Show further that when the shell actually hits the sea it is off target by
approximately

\[ \frac{2\tau}{g^2}\left[(\mathbf{g}\times\boldsymbol{\omega})\cdot\mathbf{v}\right](\mathbf{g}\tau + \mathbf{v}) - (\boldsymbol{\omega}\times\mathbf{v})\tau^2 - \frac{1}{3}(\boldsymbol{\omega}\times\mathbf{g})\tau^3. \]

(c) Estimate the order of magnitude $\Delta$ of this miss for a shell for which $v = 300$
m s$^{-1}$, firing close to its maximum range ($\mathbf{v}$ makes an angle of $\pi/4$ with the
vertical) in a northerly direction, whilst the ship is stationed at latitude 45°
North.
10.6 Prove that for a space curve $\mathbf{r} = \mathbf{r}(s)$, where $s$ is the arc length measured along
the curve from a fixed point, the triple scalar product

\[ \left(\frac{d\mathbf{r}}{ds}\times\frac{d^2\mathbf{r}}{ds^2}\right)\cdot\frac{d^3\mathbf{r}}{ds^3} \]

at any point on the curve has the value $\kappa^2\tau$, where $\kappa$ is the curvature and $\tau$ the
torsion at that point.

10.7 For the twisted space curve $y^3 + 27axz - 81a^2 y = 0$, given parametrically by

\[ x = au(3 - u^2), \qquad y = 3au^2, \qquad z = au(3 + u^2), \]

show the following:

(a) that $ds/du = 3\sqrt{2}\,a(1 + u^2)$, where $s$ is the distance along the curve measured
from the origin;

(b) that the length of the curve from the origin to the Cartesian point $(2a, 3a, 4a)$
is $4\sqrt{2}\,a$;

(c) that the radius of curvature at the point with parameter $u$ is $3a(1 + u^2)^2$;

(d) that the torsion $\tau$ and curvature $\kappa$ at a general point are equal;

(e) that any of the Frenet-Serret formulae that you have not already used
directly are satisfied.

10.8 The shape of the curving slip road joining two motorways that cross at right
angles and are at vertical heights $z = 0$ and $z = h$ can be approximated by the
space curve

\[ \mathbf{r} = \frac{\sqrt{2}\,h}{\pi}\ln\cos\left(\frac{z\pi}{2h}\right)\mathbf{i} + \frac{\sqrt{2}\,h}{\pi}\ln\sin\left(\frac{z\pi}{2h}\right)\mathbf{j} + z\,\mathbf{k}. \]

Show that the radius of curvature $\rho$ of the curve is $(2h/\pi)\,\mathrm{cosec}\,(z\pi/h)$ at height
$z$ and that the torsion $\tau = -1/\rho$. (To shorten the algebra, set $z = 2h\theta/\pi$ and use
$\theta$ as the parameter.)

10.9 In a magnetic field, field lines are curves to which the magnetic induction $\mathbf{B}$ is
everywhere tangential. By evaluating $d\mathbf{B}/ds$, where $s$ is the distance measured
along a field line, prove that the radius of curvature at any point on a line is
given by

\[ \rho = \frac{B^3}{|\mathbf{B}\times(\mathbf{B}\cdot\nabla)\mathbf{B}|}. \]

10.10 (a) Using the parameterization $x = u\cos\phi$, $y = u\sin\phi$, $z = u\cot\Omega$, find the
sloping surface area of a right circular cone of semi-angle $\Omega$ whose base has
radius $a$. Verify that it is equal to $\tfrac{1}{2}\,\times$ perimeter of the base $\times$ slope height.

(b) Using the same parameterization as in (a) for $x$ and $y$, and an appropriate
choice for $z$, find the surface area between the planes $z = 0$ and $z = Z$ of
the paraboloid of revolution $z = \alpha(x^2 + y^2)$.

10.11 (a) Parameterising the hyperboloid

\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} - \frac{z^2}{c^2} = 1 \]

by $x = a\cos\theta\sec\phi$, $y = b\sin\theta\sec\phi$, $z = c\tan\phi$, show that an area element
on its surface is

\[ dS = \sec^2\phi\left[c^2\sec^2\phi\,(b^2\cos^2\theta + a^2\sin^2\theta) + a^2 b^2\tan^2\phi\right]^{1/2} d\theta\,d\phi. \]

(b) Use this formula to show that the area of the curved surface $x^2 + y^2 - z^2 = a^2$
between the planes $z = 0$ and $z = 2a$ is

\[ \pi a^2\left(6 + \frac{1}{\sqrt{2}}\sinh^{-1} 2\sqrt{2}\right). \]
10.12 For the function

\[ z(x, y) = (x^2 - y^2)\,e^{-x^2 - y^2}, \]

find the location(s) at which the steepest gradient occurs. What are the magnitude
and direction of that gradient? (The algebra involved is easier if plane polar
coordinates are used.)

10.13 Verify by direct calculation that

\[ \nabla\cdot(\mathbf{a}\times\mathbf{b}) = \mathbf{b}\cdot(\nabla\times\mathbf{a}) - \mathbf{a}\cdot(\nabla\times\mathbf{b}). \]

10.14 (a) Simplify

\[ \nabla\times\mathbf{a}(\nabla\cdot\mathbf{a}) + \mathbf{a}\times[\nabla\times(\nabla\times\mathbf{a})] + \mathbf{a}\times\nabla^2\mathbf{a}. \]

(b) By explicitly writing out the terms in Cartesian coordinates prove that

\[ [\mathbf{c}\cdot(\mathbf{b}\cdot\nabla) - \mathbf{b}\cdot(\mathbf{c}\cdot\nabla)]\,\mathbf{a} = (\nabla\times\mathbf{a})\cdot(\mathbf{b}\times\mathbf{c}). \]

(c) Prove that $\mathbf{a}\times(\nabla\times\mathbf{a}) = \nabla(\tfrac{1}{2}a^2) - (\mathbf{a}\cdot\nabla)\mathbf{a}$.

10.15 Evaluate the Laplacian of the function

\[ \psi(x, y, z) = \frac{zx^2}{x^2 + y^2 + z^2} \]

(a) directly in Cartesian coordinates, and (b) after changing to a spherical polar
coordinate system. Verify that, as they must, the two methods give the same
result.

10.16 Verify that (10.42) is valid for each component separately when $\mathbf{a}$ is the Cartesian
vector $x^2 y\,\mathbf{i} + xyz\,\mathbf{j} + z^2 y\,\mathbf{k}$, by showing that each side of the equation is equal to
$z\,\mathbf{i} + (2x + 2z)\,\mathbf{j} + x\,\mathbf{k}$.

10.17 The (Maxwell) relationship between a time-independent magnetic field $\mathbf{B}$ and the
current density $\mathbf{J}$ (measured in SI units in A m$^{-2}$) producing it,

\[ \nabla\times\mathbf{B} = \mu_0\mathbf{J}, \]

can be applied to a long cylinder of conducting ionised gas which, in cylindrical
polar coordinates, occupies the region $\rho < a$.

(a) Show that a uniform current density $(0, C, 0)$ and a magnetic field $(0, 0, B)$,
with $B$ constant ($= B_0$) for $\rho > a$ and $B = B(\rho)$ for $\rho < a$, are consistent
with this equation. Obtain expressions for $C$ and $B(\rho)$ in terms of $B_0$ and $a$,
given that $B$ is continuous at $\rho = a$.

(b) The magnetic field can be expressed as $\mathbf{B} = \nabla\times\mathbf{A}$, where $\mathbf{A}$ is known as the
vector potential. Show that a suitable $\mathbf{A}$ can be found which has only one
non-vanishing component, $A_\phi(\rho)$, and obtain explicit expressions for $A_\phi(\rho)$
for both $\rho < a$ and $\rho > a$. Like $\mathbf{B}$, the vector potential is continuous at $\rho = a$.

(c) The gas pressure $p(\rho)$ satisfies the hydrostatic equation $\nabla p = \mathbf{J}\times\mathbf{B}$ and
vanishes at the outer wall of the cylinder. Find a general expression for $p$.

10.18 (a) For cylindrical polar coordinates $\rho, \phi, z$ evaluate the derivatives of the three
unit vectors with respect to each of the coordinates, showing that only $\partial\hat{\mathbf{e}}_\rho/\partial\phi$
and $\partial\hat{\mathbf{e}}_\phi/\partial\phi$ are non-zero.

(i) Hence evaluate $\nabla^2\mathbf{a}$ when $\mathbf{a}$ is the vector $\hat{\mathbf{e}}_\rho$, i.e. a vector of unit magnitude
everywhere directed radially outwards from the $z$-axis.

(ii) Note that it is trivially obvious that $\nabla\times\mathbf{a} = \mathbf{0}$ and hence that equation
(10.41) requires that $\nabla(\nabla\cdot\mathbf{a}) = \nabla^2\mathbf{a}$.

(iii) Evaluate $\nabla(\nabla\cdot\mathbf{a})$ and show that the latter equation holds, but that

\[ [\nabla(\nabla\cdot\mathbf{a})]_\rho \neq \nabla^2 a_\rho. \]
(b) Rework the same problem in Cartesian coordinates (where, as it happens,
the algebra is more complicated).

10.19 Maxwell's equations for electromagnetism in free space (i.e. in the absence of
charges, currents and dielectric or magnetic media) can be written

\[ \text{(i)}\ \nabla\cdot\mathbf{B} = 0, \qquad \text{(ii)}\ \nabla\cdot\mathbf{E} = 0, \]
\[ \text{(iii)}\ \nabla\times\mathbf{E} + \frac{\partial\mathbf{B}}{\partial t} = \mathbf{0}, \qquad \text{(iv)}\ \nabla\times\mathbf{B} - \frac{1}{c^2}\frac{\partial\mathbf{E}}{\partial t} = \mathbf{0}. \]

A vector $\mathbf{A}$ is defined by $\mathbf{B} = \nabla\times\mathbf{A}$, and a scalar $\phi$ by $\mathbf{E} = -\nabla\phi - \partial\mathbf{A}/\partial t$. Show
that if the condition

\[ \text{(v)}\ \nabla\cdot\mathbf{A} + \frac{1}{c^2}\frac{\partial\phi}{\partial t} = 0 \]

is imposed (this is known as choosing the Lorenz gauge), then both $\mathbf{A}$ and $\phi$
satisfy the wave equations

\[ \text{(vi)}\ \nabla^2\phi - \frac{1}{c^2}\frac{\partial^2\phi}{\partial t^2} = 0, \]
\[ \text{(vii)}\ \nabla^2\mathbf{A} - \frac{1}{c^2}\frac{\partial^2\mathbf{A}}{\partial t^2} = \mathbf{0}. \]

The reader is invited to proceed as follows.

(a) Verify that the expressions for $\mathbf{B}$ and $\mathbf{E}$ in terms of $\mathbf{A}$ and $\phi$ are consistent
with (i) and (iii).

(b) Substitute for $\mathbf{E}$ in (ii) and use the derivative with respect to time of (v) to
eliminate $\mathbf{A}$ from the resulting expression. Hence obtain (vi).

(c) Substitute for $\mathbf{B}$ and $\mathbf{E}$ in (iv) in terms of $\mathbf{A}$ and $\phi$. Then use the gradient
of (v) to simplify the resulting equation and so obtain (vii).

10.20 For a description in spherical polar coordinates with axial symmetry of the flow
of a very viscous fluid, the components of the velocity field $\mathbf{u}$ are given in terms
of the stream function $\psi$ by

\[ u_r = \frac{1}{r^2\sin\theta}\frac{\partial\psi}{\partial\theta}, \qquad u_\theta = \frac{-1}{r\sin\theta}\frac{\partial\psi}{\partial r}. \]

Find an explicit expression for the differential operator $E$ defined by

\[ E\psi = -(r\sin\theta)(\nabla\times\mathbf{u})_\phi. \]

The stream function satisfies the equation of motion $E^2\psi = 0$ and, for the flow of
a fluid past a sphere, takes the form $\psi(r, \theta) = f(r)\sin^2\theta$. Show that $f(r)$ satisfies
the (ordinary) differential equation

\[ r^4 f^{(4)} - 4r^2 f'' + 8r f' - 8f = 0. \]

10.21 Paraboloidal coordinates $u, v, \phi$ are defined in terms of Cartesian coordinates by

\[ x = uv\cos\phi, \qquad y = uv\sin\phi, \qquad z = \tfrac{1}{2}(u^2 - v^2). \]

Identify the coordinate surfaces in the $u, v, \phi$ system. Verify that each coordinate
surface ($u$ = constant, say) intersects every coordinate surface on which one of
the other two coordinates ($v$, say) is constant. Show further that the system of
coordinates is an orthogonal one and determine its scale factors. Prove that the
$u$-component of $\nabla\times\mathbf{a}$ is given by

\[ \frac{1}{(u^2 + v^2)^{1/2}}\left(\frac{a_\phi}{v} + \frac{\partial a_\phi}{\partial v}\right) - \frac{1}{uv}\frac{\partial a_v}{\partial\phi}. \]
10.22 Non-orthogonal curvilinear coordinates are difficult to work with and should be
avoided if at all possible, but the following example is provided to illustrate the
content of section 10.10.

In a new coordinate system for the region of space in which the Cartesian
coordinate $z$ satisfies $z > 0$, the position of a point $\mathbf{r}$ is given by $(\alpha_1, \alpha_2, R)$, where
$\alpha_1$ and $\alpha_2$ are respectively the cosines of the angles made by $\mathbf{r}$ with the $x$- and $y$-
coordinate axes of a Cartesian system and $R = |\mathbf{r}|$. The ranges are $-1 < \alpha_i < 1$,
$0 < R < \infty$.

(a) Express $\mathbf{r}$ in terms of $\alpha_1$, $\alpha_2$, $R$ and the unit Cartesian vectors $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$.

(b) Obtain expressions for the vectors $\mathbf{e}_i$ ($= \partial\mathbf{r}/\partial\alpha_1, \ldots$) and hence show that the
scale factors $h_i$ are given by

\[ h_1 = \frac{R(1 - \alpha_2^2)^{1/2}}{(1 - \alpha_1^2 - \alpha_2^2)^{1/2}}, \qquad
   h_2 = \frac{R(1 - \alpha_1^2)^{1/2}}{(1 - \alpha_1^2 - \alpha_2^2)^{1/2}}, \qquad
   h_3 = 1. \]

(c) Verify formally that the system is not an orthogonal one.

(d) Show that the volume element of the coordinate system is

\[ dV = \frac{R^2\,d\alpha_1\,d\alpha_2\,dR}{(1 - \alpha_1^2 - \alpha_2^2)^{1/2}}, \]

and demonstrate that this is always less than or equal to the corresponding
expression for an orthogonal curvilinear system.

(e) Calculate the expression for $(ds)^2$ for the system, and show that it differs
from that for the corresponding orthogonal system by

\[ \frac{2\alpha_1\alpha_2 R^2}{1 - \alpha_1^2 - \alpha_2^2}\,d\alpha_1\,d\alpha_2. \]

10.23 Hyperbolic coordinates $u, v, \phi$ are defined in terms of Cartesian coordinates by

\[ x = \cosh u\cos v\cos\phi, \qquad y = \cosh u\cos v\sin\phi, \qquad z = \sinh u\sin v. \]

Sketch the coordinate curves in the $\phi = 0$ plane, showing that far from the origin
they become concentric circles and radial lines. In particular, identify the curves
$u = 0$, $v = 0$, $v = \pi/2$ and $v = \pi$. Calculate the tangent vectors at a general
point, show that they are mutually orthogonal and deduce that the appropriate
scale factors are

\[ h_u = h_v = (\cosh^2 u - \cos^2 v)^{1/2}, \qquad h_\phi = \cosh u\cos v. \]

Find the most general function $\psi(u)$ of $u$ only that satisfies Laplace's equation
$\nabla^2\psi = 0$.

10.24 In a Cartesian system, $A$ and $B$ are the points $(0, 0, -1)$ and $(0, 0, 1)$ respectively.
In a new coordinate system a general point $P$ is given by $(u_1, u_2, u_3)$ with
$u_1 = \tfrac{1}{2}(r_1 + r_2)$, $u_2 = \tfrac{1}{2}(r_1 - r_2)$, $u_3 = \phi$; here $r_1$ and $r_2$ are the distances $AP$ and
$BP$ and $\phi$ is the angle between the plane $ABP$ and $y = 0$.

(a) Express $z$ and the perpendicular distance $\rho$ from $P$ to the $z$-axis in terms of
$u_1, u_2, u_3$.

(b) Evaluate $\partial x/\partial u_i$, $\partial y/\partial u_i$, $\partial z/\partial u_i$, for $i = 1, 2, 3$.

(c) Find the Cartesian components of $\hat{\mathbf{u}}_i$ and hence show that the new coordinates
are mutually orthogonal. Evaluate the scale factors and the infinitesimal
volume element in the new coordinate system.

(d) Determine and sketch the forms of the surfaces $u_i$ = constant.

(e) Find the most general function $f$ of $u_1$ only that satisfies $\nabla^2 f = 0$.
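10.12 Hints and answers

10.1 $\mathbf{a}\times(\mathbf{a}\times\mathbf{b}) + \mathbf{h}$.

10.2 Taking $\mathbf{E}_0 = \mathbf{i}$ and $\mathbf{B}_0 = \mathbf{j}$, $\mathbf{E} = (1 + t)\,\mathbf{i} + (t^2/2 + t^3/6)\,\mathbf{j} - (t + t^2/2)\,\mathbf{k}$,
$\mathbf{B} = (t^2/2 + t^3/6)\,\mathbf{i} + (1 + t)\,\mathbf{j} + (t + t^2/2)\,\mathbf{k}$.

10.3 For crossed uniform fields $\ddot{x} + (Bq/m)^2 x = q(E - Bv_0)/m$, $\ddot{y} = 0$, $m\dot{z} = qBx + mv_0$;
(b) $\xi = Bqt/m$; the path is a cycloid in the plane $y = 0$; $ds = [(dx/dt)^2 +
(dz/dt)^2]^{1/2}\,dt$.

10.4 Prove that the vector equation of the stone is $\mathbf{r} = \mathbf{v}_0 t + \mathbf{g}t^2/2$. Impose the condition
$\mathbf{r}\cdot\dot{\mathbf{r}} > 0$ for all $t$, i.e. $\mathbf{r}\cdot\dot{\mathbf{r}} = 0$ has no real roots for $t$; $8v_0^2 g^2 > 9(\mathbf{v}_0\cdot\mathbf{g})^2$. Maximum
angle is 70.5°.

10.5 $\mathbf{g} = \ddot{\mathbf{r}}' - \boldsymbol{\omega}\times(\boldsymbol{\omega}\times\mathbf{r})$, where $\ddot{\mathbf{r}}'$ is the shell's acceleration measured by an observer
fixed in space. To first order in $\omega$, the direction of $\mathbf{g}$ is radial, i.e. parallel to $\ddot{\mathbf{r}}'$.

(a) Note that $\mathbf{s}$ is orthogonal to $\mathbf{g}$.

(b) If the actual time of flight is $T$, use $(\mathbf{s} + \boldsymbol{\Delta})\cdot\mathbf{g} = 0$ to show that

\[ T \approx \tau\left(1 + 2g^{-2}(\mathbf{g}\times\boldsymbol{\omega})\cdot\mathbf{v} + \cdots\right). \]

In the Coriolis terms it is sufficient to put $T \approx \tau$.

(c) For this situation $(\mathbf{g}\times\boldsymbol{\omega})\cdot\mathbf{v} = 0$ and $\boldsymbol{\omega}\times\mathbf{v} = \mathbf{0}$; $\tau \approx 43$ s and $\Delta$ = 10-15 m
to the East.

10.6 Differentiate $\hat{\mathbf{b}} = \hat{\mathbf{t}}\times\hat{\mathbf{n}}$ with respect to $s$; express the result in terms of the
derivatives of $\mathbf{r}$; take the scalar product with $d^2\mathbf{r}/ds^2$.

10.7 (a) Evaluate $(d\mathbf{r}/du)\cdot(d\mathbf{r}/du)$.

(b) Integrate the previous result between $u = 0$ and $u = 1$.

(c) $\hat{\mathbf{t}} = [\sqrt{2}(1 + u^2)]^{-1}[(1 - u^2)\,\mathbf{i} + 2u\,\mathbf{j} + (1 + u^2)\,\mathbf{k}]$. Use $d\hat{\mathbf{t}}/ds = (d\hat{\mathbf{t}}/du)/(ds/du)$;
$\rho^{-1} = |d\hat{\mathbf{t}}/ds|$.

(d) $\hat{\mathbf{n}} = (1 + u^2)^{-1}[-2u\,\mathbf{i} + (1 - u^2)\,\mathbf{j}]$. $\hat{\mathbf{b}} = [\sqrt{2}(1 + u^2)]^{-1}[(u^2 - 1)\,\mathbf{i} - 2u\,\mathbf{j} + (1 + u^2)\,\mathbf{k}]$.
Use $d\hat{\mathbf{b}}/ds = (d\hat{\mathbf{b}}/du)/(ds/du)$ and show that this equals $-[3a(1 + u^2)^2]^{-1}\hat{\mathbf{n}}$.

(e) Show that $d\hat{\mathbf{n}}/ds = \tau(\hat{\mathbf{b}} - \hat{\mathbf{t}}) = -2[3\sqrt{2}\,a(1 + u^2)^3]^{-1}[(1 - u^2)\,\mathbf{i} + 2u\,\mathbf{j}]$.

10.8 $ds/d\theta = \sqrt{2}\,h/(\pi\sin\theta\cos\theta)$; $\hat{\mathbf{t}} = -\sin^2\theta\,\mathbf{i} + \cos^2\theta\,\mathbf{j} + \sqrt{2}\sin\theta\cos\theta\,\mathbf{k}$;
$\hat{\mathbf{b}} = \cos^2\theta\,\mathbf{i} - \sin^2\theta\,\mathbf{j} + \sqrt{2}\sin\theta\cos\theta\,\mathbf{k}$.

10.9 Note that $d\mathbf{B} = (d\mathbf{r}\cdot\nabla)\mathbf{B}$ and that $\mathbf{B} = B\hat{\mathbf{t}}$, with $\hat{\mathbf{t}} = d\mathbf{r}/ds$. Obtain $(\mathbf{B}\cdot\nabla)\mathbf{B}/B =
\hat{\mathbf{t}}\,(dB/ds) + \hat{\mathbf{n}}\,(B/\rho)$ and then take the vector product of $\hat{\mathbf{t}}$ with this equation.

10.10 (a) $dS = |(-u\cos\phi\cot\Omega, -u\sin\phi\cot\Omega, u)|\,d\phi\,du$, $S = \pi a^2\,\mathrm{cosec}\,\Omega$.

(b) $z = \alpha u^2$; $dS = u(1 + 4\alpha^2 u^2)^{1/2}\,d\phi\,du$; $S = (\pi/6\alpha^2)[(1 + 4\alpha Z)^{3/2} - 1]$.

10.11 (b) Put $\tan\phi = 2^{-1/2}\sinh\psi$.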

10.12 |Vz| 2 = 4p 2 e _2p2 [(l — p 2 ) 2 cos 2 2 <j> + p 2 sin 2 2(f)], which is extremal when <f> = nn/A 
and 1 — p 2 = 0. Maximum slope = 2e _1 at x = +1 ,y = +1, along azimuthal 
directions x + y = +2 and x + y = +2. 

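10.14 (a) $(\nabla\cdot\mathbf{a})(\nabla\times\mathbf{a})$; (b) terms of the form $b_x c_x(\partial a_x/\partial x)$ cancel; (c) for the $x$-
component, add and subtract $a_x(\partial a_x/\partial x)$ and regroup.

10.15 (a) $2z(x^2 + y^2 + z^2)^{-3}[(y^2 + z^2)(y^2 + z^2 - 3x^2) - 4x^4]$. (b) $2r^{-1}\cos\theta\,(1 - 5\sin^2\theta\cos^2\phi)$;
both are equal to $2zr^{-4}(r^2 - 5x^2)$.

10.17 Use the formulae given in table 10.2.

(a) $C = -B_0/(\mu_0 a)$; $B(\rho) = B_0\rho/a$.

(b) $B_0\rho^2/(3a)$ for $\rho < a$, and $B_0[\rho/2 - a^2/(6\rho)]$ for $\rho > a$.

(c) $[B_0^2/(2\mu_0)][1 - (\rho/a)^2]$.

10.18 (a) $\partial\hat{\mathbf{e}}_\rho/\partial\phi = \hat{\mathbf{e}}_\phi$, $\partial\hat{\mathbf{e}}_\phi/\partial\phi = -\hat{\mathbf{e}}_\rho$; (i) $-\rho^{-2}\hat{\mathbf{e}}_\rho$. (b) $\nabla^2\mathbf{a} = -(x^2 + y^2)^{-3/2}(x\,\mathbf{i} + y\,\mathbf{j})$.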
10.20 \[ \frac{\partial^2}{\partial r^2} + \frac{\sin\theta}{r^2}\frac{\partial}{\partial\theta}\left(\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\right). \]

10.21 Two sets of paraboloids of revolution about the $z$-axis and the sheaf of planes
containing the $z$-axis. For constant $u$, $-\infty < z < u^2/2$; for constant $v$, $-v^2/2 <
z < \infty$. The scale factors are $h_u = h_v = (u^2 + v^2)^{1/2}$, $h_\phi = uv$.

10.22 (c) $\mathbf{e}_1\cdot\mathbf{e}_2 = R^2\alpha_1\alpha_2/(1 - \alpha_1^2 - \alpha_2^2) \neq 0$.

10.23 The tangent vectors are as follows: for $u = 0$, the line joining $(1, 0, 0)$ and
$(-1, 0, 0)$; for $v = 0$, the line joining $(1, 0, 0)$ and $(\infty, 0, 0)$; for $v = \pi/2$, the line
$(0, 0, z)$; for $v = \pi$, the line joining $(-1, 0, 0)$ and $(-\infty, 0, 0)$. $\psi(u) = 2\tan^{-1} e^u + c$,
derived from $\partial[\cosh u\,(\partial\psi/\partial u)]/\partial u = 0$.

10.24 (a) $z = u_1 u_2$, $\rho = (u_1^2 + u_2^2 - u_1^2 u_2^2 - 1)^{1/2}$.

(b) $u_1(1 - u_2^2)\cos u_3/\rho$, $u_1(1 - u_2^2)\sin u_3/\rho$, $u_2$; $u_2(1 - u_1^2)\cos u_3/\rho$, $u_2(1 - u_1^2)\sin u_3/\rho$,
$u_1$; $-\rho\sin u_3$, $\rho\cos u_3$, $0$.

(c) $[(u_1^2 - u_2^2)/(u_1^2 - 1)]^{1/2}$, $[(u_1^2 - u_2^2)/(1 - u_2^2)]^{1/2}$, $\rho$; $|u_1^2 - u_2^2|\,du_1\,du_2\,du_3$.

(d) Confocal ellipsoids, hyperboloids, half-planes containing the $z$-axis.

(e) $B\ln[(u_1 - 1)/(u_1 + 1)]$.
Line, surface and volume integrals


In the previous chapter we encountered continuously varying scalar and vector 
fields and discussed the action of various differential operators on them. In 
addition to these differential operations, the need often arises to consider the 
integration of field quantities along lines, over surfaces and throughout volumes. 
In general the integrand may be scalar or vector in nature, but the evaluation 
of such integrals involves their reduction to one or more scalar integrals, which 
are then evaluated. In the case of surface and volume integrals this requires the 
evaluation of double and triple integrals (see chapter 6). 

11.1 Line integrals 

In this section we discuss line or path integrals, in which some quantity related 
to the field is integrated between two given points in space, A and B , along a 
prescribed curve C that joins them. In general, we may encounter line integrals 
of the forms 

$$\int_C \phi\,d\mathbf{r}, \qquad \int_C \mathbf{a}\cdot d\mathbf{r}, \qquad \int_C \mathbf{a}\times d\mathbf{r}, \qquad (11.1)$$

where $\phi$ is a scalar field and a is a vector field. The three integrals themselves are
respectively vector, scalar and vector in nature. As we will see below, in physical 
applications line integrals of the second type are by far the most common. 

The formal definition of a line integral closely follows that of ordinary integrals
and can be considered as the limit of a sum. We may divide the path C joining
the points A and B into N small line elements $\Delta\mathbf{r}_p$, $p = 1, \ldots, N$. If $(x_p, y_p, z_p)$ is
any point on the line element $\Delta\mathbf{r}_p$ then the second type of line integral in (11.1),
for example, is defined as

$$\int_C \mathbf{a}\cdot d\mathbf{r} = \lim_{N\to\infty} \sum_{p=1}^{N} \mathbf{a}(x_p, y_p, z_p)\cdot\Delta\mathbf{r}_p,$$

where it is assumed that all $|\Delta\mathbf{r}_p| \to 0$ as $N \to \infty$.
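
The limiting sum in this definition can be imitated numerically. The following Python sketch is an illustration only and is not taken from the text: the function name `line_integral`, the test field and the quarter-circle path are all assumptions chosen for the example. It forms the finite sum for increasing N and shows it approaching the exact value $\pi/2$.

```python
import math

def line_integral(a, r, n=1000):
    """Approximate the integral of a . dr along the curve r(u), 0 <= u <= 1,
    by a finite sum over n line elements, as in the definition above."""
    total = 0.0
    prev = r(0.0)
    for p in range(1, n + 1):
        cur = r(p / n)
        # Delta r_p: the chord joining successive points on the curve
        delta = [c - q for c, q in zip(cur, prev)]
        # sample the field at the chord midpoint, which tends to a point
        # on the line element as n grows
        mid = [0.5 * (c + q) for c, q in zip(cur, prev)]
        total += sum(ai * di for ai, di in zip(a(*mid), delta))
        prev = cur
    return total

# Illustrative (assumed) field a = -y i + x j along a quarter of the unit
# circle; here a . dr equals the change in polar angle, so the exact value is pi/2.
field = lambda x, y: (-y, x)
quarter_circle = lambda u: (math.cos(0.5 * math.pi * u), math.sin(0.5 * math.pi * u))

for n in (10, 100, 1000):
    print(n, line_integral(field, quarter_circle, n))
```

The error shrinks as the elements are made smaller, which is the content of the requirement that all $|\Delta\mathbf{r}_p| \to 0$.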

Each of the line integrals in (11.1) is evaluated over some curve C that may be
either open (A and B being distinct points) or closed (the curve C forms a loop,
so that A and B are coincident). In the case where C is closed, the line integral
is written $\oint_C$ to indicate this. The curve may be given either parametrically by
r(u) = x(u)i + y(u)j + z(u)k or by means of simultaneous equations relating x, y, z 
for the given path (in Cartesian coordinates). A full discussion of the different 
representations of space curves was given in section 10.3. 

In general, the value of the line integral depends not only on the end-points 
A and B but also on the path C joining them. For a closed curve we must also 
specify the direction around the loop in which the integral is taken. It is usually 
taken to be such that a person walking around the loop C in this direction 
always has the region R on his/her left; this is equivalent to traversing C in the 
anticlockwise direction (as viewed from above). 


11.1.1 Evaluating line integrals 


The method of evaluating a line integral is to reduce it to a set of scalar integrals. 
It is usual to work in Cartesian coordinates, in which case dr = dx i + dy j + dz k. 
The first type of line integral in (11.1) then becomes simply 


$$\int_C \phi\,d\mathbf{r} = \mathbf{i}\int_C \phi(x,y,z)\,dx + \mathbf{j}\int_C \phi(x,y,z)\,dy + \mathbf{k}\int_C \phi(x,y,z)\,dz.$$

The three integrals on the RHS are ordinary scalar integrals that can be evaluated 
in the usual way once the path of integration C has been specified. Note that in 
the above we have used relations of the form 


$$\int_C \phi\,\mathbf{i}\,dx = \mathbf{i}\int_C \phi\,dx,$$



which is allowable since the Cartesian unit vectors are of constant magnitude 
and direction and hence may be taken out of the integral. If we had been using 
a different coordinate system, such as spherical polars, then, as we saw in the 
last chapter, the unit basis vectors would not be constant. In that case the basis 
vectors could not be factorised out of the integral. 

The second and third line integrals in (11.1) can also be reduced to a set of 
scalar integrals by writing the vector field a in terms of its Cartesian components 
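as $\mathbf{a} = a_x\mathbf{i} + a_y\mathbf{j} + a_z\mathbf{k}$, where $a_x$, $a_y$, $a_z$ are each (in general) functions of x, y, z.
The second line integral in (11.1), for example, can then be written as

$$\begin{aligned}
\int_C \mathbf{a}\cdot d\mathbf{r} &= \int_C (a_x\mathbf{i} + a_y\mathbf{j} + a_z\mathbf{k})\cdot(dx\,\mathbf{i} + dy\,\mathbf{j} + dz\,\mathbf{k}) \\
&= \int_C (a_x\,dx + a_y\,dy + a_z\,dz) \\
&= \int_C a_x\,dx + \int_C a_y\,dy + \int_C a_z\,dz. \qquad (11.2)
\end{aligned}$$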

A similar procedure may be followed for the third type of line integral in (11.1).

Line integrals have properties that are analogous to those of ordinary integrals. 
In particular, the following are useful properties (which we illustrate using the 
second form of line integral in (11.1) but which are valid for all three types). 


(i) Reversing the path of integration changes the sign of the integral. If the 
path C along which the line integrals are evaluated has A and B as its 
end-points then 

$$\int_A^B \mathbf{a}\cdot d\mathbf{r} = -\int_B^A \mathbf{a}\cdot d\mathbf{r}.$$

This implies that if the path C is a loop then integrating around the loop 
in the opposite direction changes the sign of the integral. 

(ii) If the path of integration is subdivided into smaller segments then the sum 
of the separate line integrals along each segment is equal to the line integral 
along the whole path. So, if P is any point on the path of integration that 
lies between the path’s end-points A and B then 



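$$\int_A^B \mathbf{a}\cdot d\mathbf{r} = \int_A^P \mathbf{a}\cdot d\mathbf{r} + \int_P^B \mathbf{a}\cdot d\mathbf{r}.$$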


► Evaluate the line integral $I = \int_C \mathbf{a}\cdot d\mathbf{r}$, where $\mathbf{a} = (x + y)\mathbf{i} + (y - x)\mathbf{j}$, along each of the
paths in the xy-plane shown in figure 11.1, namely

(i) the parabola $y^2 = x$ from (1,1) to (4,2),

(ii) the curve $x = 2u^2 + u + 1$, $y = 1 + u^2$ from (1,1) to (4,2),

(iii) the line y = 1 from (1,1) to (4,1), followed by the line x = 4 from (4,1)
to (4,2).


Since each of the paths lies entirely in the xy-plane, we have $d\mathbf{r} = dx\,\mathbf{i} + dy\,\mathbf{j}$. We can
therefore write the line integral as

$$I = \int_C \mathbf{a}\cdot d\mathbf{r} = \int_C [(x + y)\,dx + (y - x)\,dy]. \qquad (11.3)$$

We must now evaluate this line integral along each of the prescribed paths. 

Case (i). Along the parabola $y^2 = x$ we have $2y\,dy = dx$. Substituting for x in (11.3)
and using just the limits on y, we obtain

$$I = \int_{(1,1)}^{(4,2)} [(x + y)\,dx + (y - x)\,dy] = \int_1^2 [(y^2 + y)2y + (y - y^2)]\,dy = 11\tfrac{1}{3}.$$

Note that we could just as easily have substituted for y and obtained an integral in x,
which would have given the same result.

Case (ii). The second path is given in terms of a parameter u. We could eliminate u
between the two equations to obtain a relationship between x and y directly and proceed
as above, but it is usually quicker to write the line integral in terms of the parameter u.
Along the curve $x = 2u^2 + u + 1$, $y = 1 + u^2$ we have $dx = (4u + 1)\,du$ and $dy = 2u\,du$.

Figure 11.1 Different possible paths between the points (1,1) and (4,2).


Substituting for x and y in (11.3) and writing the correct limits on u (u = 0 at (1,1) and
u = 1 at (4,2)), we obtain

$$I = \int_{(1,1)}^{(4,2)} [(x + y)\,dx + (y - x)\,dy] = \int_0^1 [(3u^2 + u + 2)(4u + 1) - (u^2 + u)2u]\,du = 10\tfrac{2}{3}.$$


Case (iii). For the third path the line integral must be evaluated along the two line
segments separately and the results added together. First, along the line y = 1 we have
dy = 0. Substituting this into (11.3) and using just the limits on x for this segment, we
obtain

$$\int_{(1,1)}^{(4,1)} [(x + y)\,dx + (y - x)\,dy] = \int_1^4 (x + 1)\,dx = 10\tfrac{1}{2}.$$

Next, along the line x = 4 we have dx = 0. Substituting this into (11.3) and using just the
limits on y for this segment, we obtain

$$\int_{(4,1)}^{(4,2)} [(x + y)\,dx + (y - x)\,dy] = \int_1^2 (y - 4)\,dy = -2\tfrac{1}{2}.$$

The value of the line integral along the whole path is just the sum of the values of the line
integrals along each segment, and is given by $I = 10\tfrac{1}{2} - 2\tfrac{1}{2} = 8$. ◄
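
The three values can be cross-checked numerically. The sketch below is an illustration, not part of the original example; the helper names `simpson` and `integrand` are assumptions. It parametrises each path, reduces the line integral to an ordinary integral as described above, and applies Simpson's rule, which is exact (up to rounding) for the cubic integrands that arise here.

```python
def simpson(f, lo, hi, n=200):
    """Composite Simpson rule on [lo, hi] using an even number n of intervals."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(lo + k * h)
    return s * h / 3

def integrand(x, y, dx, dy):
    # (x + y) dx + (y - x) dy, with dx and dy taken per unit of the parameter
    return (x + y) * dx + (y - x) * dy

# (i) parabola y^2 = x, using y as the parameter: x = y^2, dx/dy = 2y
I1 = simpson(lambda y: integrand(y * y, y, 2 * y, 1.0), 1.0, 2.0)

# (ii) x = 2u^2 + u + 1, y = 1 + u^2, with u running from 0 to 1
I2 = simpson(lambda u: integrand(2 * u * u + u + 1, 1 + u * u, 4 * u + 1, 2 * u), 0.0, 1.0)

# (iii) y = 1 for x from 1 to 4, followed by x = 4 for y from 1 to 2
I3 = (simpson(lambda x: integrand(x, 1.0, 1.0, 0.0), 1.0, 4.0)
      + simpson(lambda y: integrand(4.0, y, 0.0, 1.0), 1.0, 2.0))

print(I1, I2, I3)
```

The three results differ, illustrating that for this field the value of the line integral depends on the path and not only on the end-points.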

When calculating a line integral along some curve C, which is given in terms 
of x, y and z, we are sometimes faced with the problem that the curve C is such 
that x, y and z are not single-valued functions of one another over the entire 
length of the curve. This is a particular problem for closed loops in the xy-plane 
(and also for some open curves). In such cases the path may be subdivided into 
shorter line segments along which one coordinate is a single-valued function of 
the other two. The sum of the line integrals along these segments is then equal 
to the line integral along the entire curve C. A better solution, however, is to 
represent the curve in a parametric form r(u) that is valid for its entire length.

► Evaluate the line integral $I = \oint_C x\,dy$, where C is the circle in the xy-plane defined by
$x^2 + y^2 = a^2$, z = 0.


Adopting the usual convention mentioned above, the circle C is to be traversed in the
anticlockwise direction. Taking the circle as a whole means x is not a single-valued
function of y. We must therefore divide the path into two parts with $x = +\sqrt{a^2 - y^2}$ for
the semicircle lying to the right of x = 0, and $x = -\sqrt{a^2 - y^2}$ for the semicircle lying to
the left of x = 0. The required line integral is then the sum of the integrals along the two
semicircles. Substituting for x, it is given by

$$I = \oint_C x\,dy = \int_{-a}^{a} \sqrt{a^2 - y^2}\,dy + \int_{a}^{-a} \left(-\sqrt{a^2 - y^2}\right) dy = 4\int_0^a \sqrt{a^2 - y^2}\,dy = \pi a^2.$$

Alternatively, we can represent the entire circle parametrically, in terms of the azimuthal
angle $\phi$, so that $x = a\cos\phi$ and $y = a\sin\phi$ with $\phi$ running from 0 to $2\pi$. The integral can
therefore be evaluated over the whole circle at once. Noting that $dy = a\cos\phi\,d\phi$, we can
rewrite the line integral completely in terms of the parameter $\phi$ and obtain

$$I = \oint_C x\,dy = a^2\int_0^{2\pi} \cos^2\phi\,d\phi = \pi a^2.$$ ◄


11.1.2 Physical examples of line integrals 

There are many physical examples of line integrals, but perhaps the most common 
is the expression for the total work done by a force F when it moves its point 
of application from a point A to a point B along a given curve C. We allow the 
magnitude and direction of F to vary along the curve. Let the force act at a point 
r and consider a small displacement dr along the curve; then the small amount 
of work done is dW = F • dr, as discussed in subsection 7.6.1 (note that dW can 
be either positive or negative). Therefore, the total work done in traversing the 
path C is 

$$W_C = \int_C \mathbf{F}\cdot d\mathbf{r}.$$

Naturally, other physical quantities can be expressed in such a way. For example,
the electrostatic potential energy gained by moving a charge q along a path C in
an electric field E is $-q\int_C \mathbf{E}\cdot d\mathbf{r}$. We may also note that Ampère's law concerning
the magnetic field B associated with a current-carrying wire can be written as

$$\oint_C \mathbf{B}\cdot d\mathbf{r} = \mu_0 I,$$

where I is the current enclosed by a closed path C traversed in a right-handed
sense with respect to the current direction.

Magnetostatics also provides a physical example of the third type of line 
integral in (11.1). If a loop of wire C carrying a current I is placed in a magnetic
field B then the force dF on a small length dr of the wire is given by $d\mathbf{F} = I\,d\mathbf{r}\times\mathbf{B}$,
and so the total (vector) force on the loop is

$$\mathbf{F} = I\oint_C d\mathbf{r}\times\mathbf{B}.$$


11.1.3 Line integrals with respect to a scalar 


In addition to those listed in (11.1), we can form other types of line integral, 
which depend on a particular curve C but for which we integrate with respect 
to a scalar du, rather than the vector differential dr. This distinction is somewhat 
arbitrary, however, since we can always rewrite line integrals containing the vector 
differential dr as a line integral with respect to some scalar parameter. If the path 
C along which the integral is taken is described parametrically by r(u) then 

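$$d\mathbf{r} = \frac{d\mathbf{r}}{du}\,du,$$

and the second type of line integral in (11.1), for example, can be written as

$$\int_C \mathbf{a}\cdot d\mathbf{r} = \int_C \mathbf{a}\cdot\frac{d\mathbf{r}}{du}\,du.$$

A similar procedure can be followed for the other types of line integral in (11.1).
Commonly occurring special cases of line integrals with respect to a scalar are

$$\int_C \phi\,ds, \qquad \int_C \mathbf{a}\,ds,$$

where s is the arc length along the curve C. We can always represent C parametrically
by r(u), and from section 10.3 we have

$$ds = \sqrt{\frac{d\mathbf{r}}{du}\cdot\frac{d\mathbf{r}}{du}}\;du.$$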




The line integrals can therefore be expressed entirely in terms of the parameter u 
and thence evaluated. 


► Evaluate the line integral $I = \int_C (x - y)^2\,ds$, where C is the semicircle of radius a running
from A = (a, 0) to B = (−a, 0) and for which $y \ge 0$.


The semicircular path from A to B can be described in terms of the azimuthal angle $\phi$
(measured from the x-axis) by

$$\mathbf{r}(\phi) = a\cos\phi\,\mathbf{i} + a\sin\phi\,\mathbf{j},$$

where $\phi$ runs from 0 to $\pi$. Therefore the element of arc length is given, from section 10.3,
by

$$ds = \sqrt{\frac{d\mathbf{r}}{d\phi}\cdot\frac{d\mathbf{r}}{d\phi}}\;d\phi = a(\cos^2\phi + \sin^2\phi)^{1/2}\,d\phi = a\,d\phi.$$

Figure 11.2 (a) A simply connected region; (b) a doubly connected region;
(c) a triply connected region.



Since $(x - y)^2 = a^2(1 - \sin 2\phi)$, the line integral becomes

$$I = \int_C (x - y)^2\,ds = \int_0^{\pi} a^3(1 - \sin 2\phi)\,d\phi = \pi a^3.$$ ◄
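
A line integral with respect to arc length can be estimated in the same way once ds is written as $|d\mathbf{r}/du|\,du$. The Python sketch below is illustrative only; the function name `scalar_line_integral` and the radius value are assumptions, not part of the text. It applies a midpoint rule to the semicircle example and can be compared with $\pi a^3$.

```python
import math

def scalar_line_integral(f, r, drdu, lo, hi, n=2000):
    """Midpoint-rule estimate of the integral of f ds along r(u), lo <= u <= hi,
    using ds = |dr/du| du."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        u = lo + (k + 0.5) * h
        speed = math.sqrt(sum(c * c for c in drdu(u)))  # |dr/du|
        total += f(*r(u)) * speed * h
    return total

a = 1.5  # hypothetical radius chosen for the check

I = scalar_line_integral(
    lambda x, y: (x - y) ** 2,                          # integrand (x - y)^2
    lambda p: (a * math.cos(p), a * math.sin(p)),       # r(phi) on the semicircle
    lambda p: (-a * math.sin(p), a * math.cos(p)),      # dr/dphi
    0.0, math.pi)

print(I, math.pi * a ** 3)
```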



As discussed in the previous chapter, the expression (10.58) for the square of
the element of arc length in three-dimensional orthogonal curvilinear coordinates
$u_1, u_2, u_3$ is

$$(ds)^2 = h_1^2(du_1)^2 + h_2^2(du_2)^2 + h_3^2(du_3)^2,$$

where $h_1, h_2, h_3$ are the scale factors of the coordinate system. If a curve C in
three dimensions is given parametrically by the equations $u_i = u_i(\lambda)$ for i = 1, 2, 3
then the element of arc length along the curve is

$$ds = \sqrt{h_1^2\left(\frac{du_1}{d\lambda}\right)^2 + h_2^2\left(\frac{du_2}{d\lambda}\right)^2 + h_3^2\left(\frac{du_3}{d\lambda}\right)^2}\;d\lambda.$$


11.2 Connectivity of regions 

In physical systems it is usual to define a scalar or vector field in some region R. 
In the next and some later sections we will need the concept of the connectivity 
of such a region in both two and three dimensions. 

We begin by discussing planar regions. A plane region R is said to be simply 
connected if every simple closed curve within R can be continuously shrunk to 
a point without leaving the region (see figure 11.2(a)). If, however, the region 
R contains a hole then there exist simple closed curves that cannot be shrunk
to a point without leaving R (see figure 11.2(b)). Such a region is said to be
doubly connected, since its boundary has two distinct parts. Similarly, a region
with n − 1 holes is said to be n-fold connected, or multiply connected (the region
in figure 11.2(c) is triply connected). 

These ideas can be extended to regions that are not planar, such as general
three-dimensional surfaces and volumes. The same criteria concerning the shrinking
of closed curves to a point also apply when deciding the connectivity of such
regions. In these cases, however, the curves must lie in the surface or volume
in question. For example, the interior of a torus is not simply connected, since
there exist closed curves in the interior that cannot be shrunk to a point without
leaving the torus. On the other hand, the region between two concentric spheres
of different radii is simply connected.

Figure 11.3 A simply connected region R bounded by the curve C.


11.3 Green’s theorem in a plane 

In subsection 11.1.1 we considered (amongst other things) the evaluation of line 
integrals for which the path C is closed and lies entirely in the xy-plane. Since 
the path is closed it will enclose a region R of the plane. We now discuss how to 
express the line integral around the loop as a double integral over the enclosed 
region R. 

Suppose the functions P(x,y), Q(x,y) and their partial derivatives are single- 
valued, finite and continuous inside and on the boundary C of some simply 
connected region R in the xy-plane. Green’s theorem in a plane then states 

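$$\oint_C (P\,dx + Q\,dy) = \iint_R \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy, \qquad (11.4)$$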

and so relates the line integral around C to a double integral over the enclosed 
region R. This theorem may be proved straightforwardly in the following way. 
Consider the simply connected region R in figure 11.3, and let $y = y_1(x)$ and
$y = y_2(x)$ be the equations of the curves STU and SVU respectively. We then
write

$$\begin{aligned}
\iint_R \frac{\partial P}{\partial y}\,dx\,dy &= \int_a^b dx \int_{y_1(x)}^{y_2(x)} dy\,\frac{\partial P}{\partial y} = \int_a^b dx\,\Big[P(x,y)\Big]_{y=y_1(x)}^{y=y_2(x)} \\
&= \int_a^b \Big[P(x,y_2(x)) - P(x,y_1(x))\Big]\,dx \\
&= -\int_a^b P(x,y_1(x))\,dx - \int_b^a P(x,y_2(x))\,dx = -\oint_C P\,dx.
\end{aligned}$$

If we now let $x = x_1(y)$ and $x = x_2(y)$ be the equations of the curves TSV and
TUV respectively, we can similarly show that

$$\begin{aligned}
\iint_R \frac{\partial Q}{\partial x}\,dx\,dy &= \int_c^d dy \int_{x_1(y)}^{x_2(y)} dx\,\frac{\partial Q}{\partial x} = \int_c^d dy\,\Big[Q(x,y)\Big]_{x=x_1(y)}^{x=x_2(y)} \\
&= \int_c^d \Big[Q(x_2(y),y) - Q(x_1(y),y)\Big]\,dy \\
&= \int_d^c Q(x_1,y)\,dy + \int_c^d Q(x_2,y)\,dy = \oint_C Q\,dy.
\end{aligned}$$

Subtracting these two results gives Green's theorem in a plane.


► Show that the area of a region R enclosed by a simple closed curve C is given by
$A = \frac{1}{2}\oint_C (x\,dy - y\,dx) = \oint_C x\,dy = -\oint_C y\,dx$. Hence calculate the area of the ellipse $x = a\cos\phi$,
$y = b\sin\phi$.


In Green's theorem (11.4) put P = −y and Q = x; then

$$\oint_C (x\,dy - y\,dx) = \iint_R (1 + 1)\,dx\,dy = 2\iint_R dx\,dy = 2A.$$

Therefore the area of the region is $A = \frac{1}{2}\oint_C (x\,dy - y\,dx)$. Alternatively, we could put P = 0
and Q = x and obtain $A = \oint_C x\,dy$, or put P = −y and Q = 0, which gives $A = -\oint_C y\,dx$.
The area of the ellipse $x = a\cos\phi$, $y = b\sin\phi$ is given by

$$A = \frac{1}{2}\oint_C (x\,dy - y\,dx) = \frac{1}{2}\int_0^{2\pi} ab(\cos^2\phi + \sin^2\phi)\,d\phi = \frac{ab}{2}\int_0^{2\pi} d\phi = \pi ab.$$ ◄
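
The area formula lends itself to a simple numerical check. In the sketch below, which is illustrative and not part of the original example, the closed curve is replaced by a polygon through points on it; along each straight edge from $(x_0, y_0)$ to $(x_1, y_1)$ the integral of $x\,dy - y\,dx$ is exactly $x_0y_1 - x_1y_0$. The function name `enclosed_area` and the semi-axis values are assumptions.

```python
import math

def enclosed_area(points):
    """Half the integral of (x dy - y dx) around a closed polygon whose
    vertices are listed in anticlockwise order."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        # contribution of the straight edge from (x0, y0) to (x1, y1)
        total += x0 * y1 - x1 * y0
    return 0.5 * total

a, b, n = 3.0, 2.0, 2000  # hypothetical semi-axes and number of boundary points
ellipse = [(a * math.cos(2 * math.pi * k / n), b * math.sin(2 * math.pi * k / n))
           for k in range(n)]
area = enclosed_area(ellipse)

print(area, math.pi * a * b)
```

Traversing the vertices clockwise instead reverses the sign of the result, in line with property (i) of line integrals.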



It may further be shown that Green's theorem in a plane is also valid for
multiply connected regions. In this case, the line integral must be taken over
all the distinct boundaries of the region. Furthermore, each boundary must be
traversed in the positive direction, such that a person travelling along it in this
direction always has the region R on their left. In order to apply Green's theorem
to the region R shown in figure 11.4, the line integrals must be taken over
both boundaries, $C_1$ and $C_2$, in the directions indicated, and the results added
together.

Figure 11.4 A doubly connected region R bounded by the curves $C_1$ and $C_2$.

We may also use Green’s theorem in a plane to investigate the path indepen- 
dence (or not) of line integrals when the paths lie in the xy-plane. Let us consider 
the line integral 


$$I = \int_A^B (P\,dx + Q\,dy).$$

For the line integral from A to B to be independent of the path taken, it must
have the same value along any two arbitrary paths $C_1$ and $C_2$ joining the points.
Moreover, if we consider as the path the closed loop C formed by $C_1 - C_2$ then
the line integral around this loop must be zero. From Green's theorem in a plane,
(11.4), we see that a sufficient condition for I = 0 is that

$$\frac{\partial P}{\partial y} = \frac{\partial Q}{\partial x}, \qquad (11.5)$$

throughout some simply connected region R containing the loop, where we assume
that these partial derivatives are continuous in R.

It may be shown that (11.5) is also a necessary condition for I = 0 and is
equivalent to requiring P dx + Q dy to be an exact differential of some function
$\phi(x, y)$ such that $P\,dx + Q\,dy = d\phi$. It follows that $\int_A^B (P\,dx + Q\,dy) = \phi(B) - \phi(A)$
and that $\oint_C (P\,dx + Q\,dy)$ around any closed loop C in the region R is identically
zero. These results are special cases of the general results for paths in three
dimensions, which are discussed in the next section.

► Evaluate the line integral

$$I = \oint_C \left[(e^x y + \cos x \sin y)\,dx + (e^x + \sin x \cos y)\,dy\right],$$

around the ellipse $x^2/a^2 + y^2/b^2 = 1$.

Clearly, it is not straightforward to calculate this line integral directly. However, if we let

$$P = e^x y + \cos x \sin y \quad\text{and}\quad Q = e^x + \sin x \cos y,$$

then $\partial P/\partial y = e^x + \cos x \cos y = \partial Q/\partial x$, and so P dx + Q dy is an exact differential (it
is actually the differential of the function $f(x,y) = e^x y + \sin x \sin y$). From the above
discussion, we therefore immediately conclude that I = 0. ◄
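
As a sanity check on this conclusion, the loop integral can be estimated directly. The Python sketch below is illustrative only (the name `loop_integral` and the semi-axis values are assumptions): it parametrises the ellipse as $x = a\cos t$, $y = b\sin t$ and sums $P\,dx + Q\,dy$ with a midpoint rule. For contrast it also evaluates the non-exact form $-y\,dx + x\,dy$, whose loop integral is twice the enclosed area.

```python
import math

def loop_integral(P, Q, a, b, n=400):
    """Midpoint-rule estimate of the closed-loop integral of P dx + Q dy taken
    anticlockwise around the ellipse x = a cos t, y = b sin t."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        x, y = a * math.cos(t), b * math.sin(t)
        dx, dy = -a * math.sin(t) * h, b * math.cos(t) * h
        total += P(x, y) * dx + Q(x, y) * dy
    return total

# the exact differential from the example above
P_exact = lambda x, y: math.exp(x) * y + math.cos(x) * math.sin(y)
Q_exact = lambda x, y: math.exp(x) + math.sin(x) * math.cos(y)

a, b = 2.0, 1.0  # hypothetical semi-axes
I_exact = loop_integral(P_exact, Q_exact, a, b)
I_other = loop_integral(lambda x, y: -y, lambda x, y: x, a, b)

print(I_exact)                        # vanishes to within rounding error
print(I_other, 2 * math.pi * a * b)   # twice the area of the ellipse
```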


11.4 Conservative fields and potentials 

So far we have made the point that, in general, the value of a line integral 
between two points A and B depends on the path C taken from A to B. In the 
previous section, however, we saw that, for paths in the xy-plane, line integrals 
whose integrands have certain properties are independent of the path taken. We 
now extend that discussion to the full three-dimensional case. 

For line integrals of the form $\int_C \mathbf{a}\cdot d\mathbf{r}$, there exists a class of vector fields for
which the line integral between two points is independent of the path taken. Such
vector fields are called conservative. A vector field a that has continuous partial
derivatives in a simply connected region R is conservative if, and only if, any of
the following is true.

(i) The integral $\int_A^B \mathbf{a}\cdot d\mathbf{r}$, where A and B lie in the region R, is independent of
the path from A to B. Hence the integral $\oint_C \mathbf{a}\cdot d\mathbf{r}$ around any closed loop
in R is zero.

(ii) There exists a single-valued function $\phi$ of position such that $\mathbf{a} = \nabla\phi$.

(iii) $\nabla\times\mathbf{a} = \mathbf{0}$.

(iv) $\mathbf{a}\cdot d\mathbf{r}$ is an exact differential.


The validity or otherwise of any of these statements implies the same for the 
other three, which we will now show. 

First, let us assume that (i) above is true. If the line integral from A to B 
is independent of the path taken between the points then its value must be a 
function only of the positions of A and B. We may therefore write 

$$\int_A^B \mathbf{a}\cdot d\mathbf{r} = \phi(B) - \phi(A), \qquad (11.6)$$

which defines a single-valued scalar function of position $\phi$. If the points A and B
are separated by an infinitesimal displacement dr then (11.6) becomes

$$\mathbf{a}\cdot d\mathbf{r} = d\phi,$$

which shows that we require $\mathbf{a}\cdot d\mathbf{r}$ to be an exact differential: condition (iv). From
(10.27) we can write $d\phi = \nabla\phi\cdot d\mathbf{r}$, and so we have

$$(\mathbf{a} - \nabla\phi)\cdot d\mathbf{r} = 0.$$

Since dr is arbitrary, we find that $\mathbf{a} = \nabla\phi$; this immediately implies $\nabla\times\mathbf{a} = \mathbf{0}$,
condition (iii) (see (10.37)).

Alternatively, if we suppose that there exists a single-valued function of position
$\phi$ such that $\mathbf{a} = \nabla\phi$ then $\nabla\times\mathbf{a} = \mathbf{0}$ follows as before. The line integral around a
closed loop then becomes

$$\oint_C \mathbf{a}\cdot d\mathbf{r} = \oint_C \nabla\phi\cdot d\mathbf{r} = \oint d\phi.$$

Since we defined $\phi$ to be single-valued, this integral is zero as required.

Now suppose $\nabla\times\mathbf{a} = \mathbf{0}$. From Stokes' theorem, which is discussed in section 11.9,
we immediately obtain $\oint_C \mathbf{a}\cdot d\mathbf{r} = 0$; then $\mathbf{a} = \nabla\phi$ and $\mathbf{a}\cdot d\mathbf{r} = d\phi$ follow
as above.

Finally, let us suppose $\mathbf{a}\cdot d\mathbf{r} = d\phi$. Then immediately we have $\mathbf{a} = \nabla\phi$, and the
other results follow as above.


► Evaluate the line integral $I = \int_A^B \mathbf{a}\cdot d\mathbf{r}$, where $\mathbf{a} = (xy^2 + z)\mathbf{i} + (x^2y + 2)\mathbf{j} + x\mathbf{k}$, A is the
point (c, c, h) and B is the point (2c, c/2, h), along the different paths

(i) $C_1$, given by x = cu, y = c/u, z = h,

(ii) $C_2$, given by 2y = 3c − x, z = h.

Show that the vector field a is in fact conservative, and find $\phi$ such that $\mathbf{a} = \nabla\phi$.


Expanding out the integrand, we have 


$$I = \int_{(c,c,h)}^{(2c,c/2,h)} \left[(xy^2 + z)\,dx + (x^2y + 2)\,dy + x\,dz\right], \qquad (11.7)$$

which we must evaluate along each of the paths $C_1$ and $C_2$.

(i) Along $C_1$ we have $dx = c\,du$, $dy = -(c/u^2)\,du$, $dz = 0$, and on substituting in (11.7)
and finding the limits on u, we obtain

$$I = \int_1^2 c\left(h - \frac{2}{u^2}\right) du = c(h - 1).$$

(ii) Along $C_2$ we have $2\,dy = -dx$, $dz = 0$ and, on substituting in (11.7) and using the
limits on x, we obtain

$$I = \int_c^{2c} \left(\tfrac{1}{2}x^3 - \tfrac{9}{4}cx^2 + \tfrac{9}{4}c^2x + h - 1\right) dx = c(h - 1).$$

Hence the line integral has the same value along paths $C_1$ and $C_2$. Taking the curl of a,
we have

$$\nabla\times\mathbf{a} = (0 - 0)\mathbf{i} + (1 - 1)\mathbf{j} + (2xy - 2xy)\mathbf{k} = \mathbf{0},$$


so a is a conservative vector field, and the line integral between two points must be
independent of the path taken. Since a is conservative, we can write $\mathbf{a} = \nabla\phi$. Therefore, $\phi$
must satisfy

$$\frac{\partial\phi}{\partial x} = xy^2 + z,$$

which implies that $\phi = \frac{1}{2}x^2y^2 + zx + f(y,z)$ for some function f. Secondly, we require

$$\frac{\partial\phi}{\partial y} = x^2y + \frac{\partial f}{\partial y} = x^2y + 2,$$

which implies $f = 2y + g(z)$. Finally, since

$$\frac{\partial\phi}{\partial z} = x + \frac{\partial g}{\partial z} = x,$$

we have g = constant = k. It can be seen that we have explicitly constructed the function
$\phi = \frac{1}{2}x^2y^2 + zx + 2y + k$. ◄
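
The result of this example can be verified numerically for particular values of c and h. The sketch below is an illustration rather than part of the text; the names `simpson`, `a_field`, `phi`, `f1`, `f2` and the values chosen for c and h are assumptions. It integrates along $C_1$ and $C_2$ and compares both with $\phi(B) - \phi(A)$, as required by (11.6).

```python
def simpson(f, lo, hi, n=200):
    """Composite Simpson rule on [lo, hi] using an even number n of intervals."""
    step = (hi - lo) / n
    s = f(lo) + f(hi)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(lo + k * step)
    return s * step / 3

c, h = 1.3, 0.7  # hypothetical values of the constants c and h

def a_field(x, y, z):
    # a = (x y^2 + z) i + (x^2 y + 2) j + x k
    return (x * y * y + z, x * x * y + 2.0, x)

def phi(x, y, z):
    # scalar potential found above, with the arbitrary constant k set to zero
    return 0.5 * x * x * y * y + z * x + 2.0 * y

def f1(u):
    # C1: x = c u, y = c/u, z = h, so dx/du = c, dy/du = -c/u^2, dz/du = 0
    ax, ay, az = a_field(c * u, c / u, h)
    return ax * c + ay * (-c / (u * u))

def f2(x):
    # C2: y = (3c - x)/2, z = h, with x as parameter, so dy/dx = -1/2, dz/dx = 0
    ax, ay, az = a_field(x, 0.5 * (3 * c - x), h)
    return ax - 0.5 * ay

I1 = simpson(f1, 1.0, 2.0)        # u runs from 1 at A to 2 at B
I2 = simpson(f2, c, 2 * c)        # x runs from c at A to 2c at B
potential_difference = phi(2 * c, c / 2, h) - phi(c, c, h)

print(I1, I2, potential_difference, c * (h - 1))
```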


The quantity $\phi$ that figures so prominently in this section is called the scalar
potential function of the conservative vector field a (which satisfies $\nabla\times\mathbf{a} = \mathbf{0}$), and
is unique up to an arbitrary additive constant. Scalar potentials that are multi- 
valued functions of position (but in simple ways) are also of value in describing 
some physical situations, the most obvious example being the scalar magnetic 
potential associated with a current-carrying wire. When the integral of a field 
quantity around a closed loop is considered, provided the loop does not enclose 
a net current, the potential is single-valued and all the above results still hold. If 
the loop does enclose a net current, however, our analysis is no longer valid and 
extra care must be taken. 

If, instead of being conservative, a vector field b satisfies $\nabla\cdot\mathbf{b} = 0$ (i.e. b
is solenoidal) then it is both possible and useful, for example in the theory of
electromagnetism, to define a vector potential field a such that $\mathbf{b} = \nabla\times\mathbf{a}$. It may
be shown that such a vector field a always exists. Further, if a is one such vector
field then $\mathbf{a}' = \mathbf{a} + \nabla\psi + \mathbf{c}$, where $\psi$ is any scalar function and c is any constant
vector, also satisfies the above relationship, i.e. $\mathbf{b} = \nabla\times\mathbf{a}'$. This was discussed
more fully in subsection 10.8.2. 


11.5 Surface integrals 


As with line integrals, integrals over surfaces can involve vector and scalar fields 
and, equally, can result in either a vector or a scalar. The simplest case involves 
entirely scalars and is of the form 

$$\int_S \phi\,dS. \qquad (11.8)$$

As analogues of the line integrals listed in (11.1), we may also encounter surface 
integrals involving vectors, namely 

$$\int_S \phi\,d\mathbf{S}, \qquad \int_S \mathbf{a}\cdot d\mathbf{S}, \qquad \int_S \mathbf{a}\times d\mathbf{S}. \qquad (11.9)$$

Figure 11.5 (a) A closed surface and (b) an open surface. In each case a
normal to the surface is shown: $d\mathbf{S} = \hat{\mathbf{n}}\,dS$.


All the above integrals are taken over some surface S, which may be either
open or closed, and are therefore, in general, double integrals. Following the
notation for line integrals, for surface integrals over a closed surface $\int_S$ is replaced
by $\oint_S$.

The vector differential $d\mathbf{S}$ in (11.9) represents a vector area element of the
surface S. It may also be written $d\mathbf{S} = \hat{\mathbf{n}}\,dS$, where $\hat{\mathbf{n}}$ is a unit normal to the
surface at the position of the element and dS is the scalar area of the element used
in (11.8). The convention for the direction of the normal $\hat{\mathbf{n}}$ to a surface depends
on whether the surface is open or closed. A closed surface, see figure 11.5(a),
does not have to be simply connected (for example, the surface of a torus is not),
but it does have to enclose a volume V, which may be of infinite extent. The
direction of $\hat{\mathbf{n}}$ is taken to point outwards from the enclosed volume as shown.
An open surface, see figure 11.5(b), spans some perimeter curve C. The direction
of $\hat{\mathbf{n}}$ is then given by the right-hand sense with respect to the direction in which
the perimeter is traversed, i.e. follows the right-hand screw rule discussed in
section 7.6.2. An open surface does not have to be simply connected but for
our purposes it must be two-sided (a Möbius strip is an example of a one-sided
surface).

The formal definition of a surface integral is very similar to that of a line
integral. We divide the surface S into N elements of area $\Delta S_p$, $p = 1, \ldots, N$, each
with a unit normal $\hat{\mathbf{n}}_p$. If $(x_p, y_p, z_p)$ is any point in $\Delta S_p$ then the second type of
surface integral in (11.9), for example, is defined as

$$\int_S \mathbf{a}\cdot d\mathbf{S} = \lim_{N\to\infty} \sum_{p=1}^{N} \mathbf{a}(x_p, y_p, z_p)\cdot\hat{\mathbf{n}}_p\,\Delta S_p,$$

where it is required that all $\Delta S_p \to 0$ as $N \to \infty$.

Figure 11.6 A surface S (or part thereof) projected onto a region R in the
xy-plane; dS is the surface element at a point P. 


11.5.1 Evaluating surface integrals 


We now consider how to evaluate surface integrals over some general surface. This 
involves writing the scalar area element dS in terms of the coordinate differentials 
of our chosen coordinate system. In some particularly simple cases this is very 
straightforward. For example, if S is the surface of a sphere of radius a (or some
part thereof) then using spherical polar coordinates $\theta$, $\phi$ on the sphere we have
$dS = a^2\sin\theta\,d\theta\,d\phi$. For a general surface, however, it is not usually possible to
represent the surface in a simple way in any particular coordinate system. In such 
cases, it is usual to work in Cartesian coordinates and consider the projections of 
the surface onto the coordinate planes. 

Consider a surface (or part of a surface) S as in figure 11.6. The surface S is
projected onto a region R of the xy-plane, so that an element of surface area dS
at point P projects onto the area element dA. From the figure, we see that
$dA = |\cos\alpha|\,dS$, where $\alpha$ is the angle between the unit vector k in the z-direction and
the unit normal $\hat{\mathbf{n}}$ to the surface at P. So, at any given point of S, we have simply

$$dS = \frac{dA}{|\cos\alpha|} = \frac{dA}{|\hat{\mathbf{n}}\cdot\mathbf{k}|}.$$

Now, if the surface S is given by the equation f(x, y, z) = 0 then, as shown in
subsection 10.7.1, the unit normal at any point of the surface is simply given by
$\hat{\mathbf{n}} = \nabla f/|\nabla f|$ evaluated at that point, cf. (10.32). The scalar element of surface
area then becomes

$$dS = \frac{dA}{|\hat{\mathbf{n}}\cdot\mathbf{k}|} = \frac{|\nabla f|\,dA}{\nabla f\cdot\mathbf{k}} = \frac{|\nabla f|\,dA}{\partial f/\partial z}, \qquad (11.10)$$

where $|\nabla f|$ and $\partial f/\partial z$ are evaluated on the surface S. We can therefore express
any surface integral over S as a double integral over the region R in the xy-plane.


► Evaluate the surface integral $I = \int_S \mathbf{a}\cdot d\mathbf{S}$, where $\mathbf{a} = x\mathbf{i}$ and S is the surface of the
hemisphere $x^2 + y^2 + z^2 = a^2$ with $z \ge 0$.


The surface of the hemisphere is shown in figure 11.7. In this case dS may be easily
expressed in spherical polar coordinates as $dS = a^2\sin\theta\,d\theta\,d\phi$, and the unit normal to the
surface at any point is simply $\hat{\mathbf{r}}$. On the surface of the hemisphere we have $x = a\sin\theta\cos\phi$
and so

$$\mathbf{a}\cdot d\mathbf{S} = x(\mathbf{i}\cdot\hat{\mathbf{r}})\,dS = (a\sin\theta\cos\phi)(\sin\theta\cos\phi)(a^2\sin\theta\,d\theta\,d\phi).$$

Therefore, inserting the correct limits on $\theta$ and $\phi$, we have

$$I = \int_S \mathbf{a}\cdot d\mathbf{S} = a^3\int_0^{\pi/2} d\theta\,\sin^3\theta \int_0^{2\pi} d\phi\,\cos^2\phi = \frac{2\pi a^3}{3}.$$

We could, however, follow the general prescription above and project the hemisphere S
onto the region R in the xy-plane that is a circle of radius a centred at the origin. Writing
the equation of the surface of the hemisphere as $f(x,y,z) = x^2 + y^2 + z^2 - a^2 = 0$ and using
(11.10), we have

$$I = \int_S \mathbf{a}\cdot d\mathbf{S} = \int_S x(\mathbf{i}\cdot\hat{\mathbf{r}})\,dS = \iint_R x(\mathbf{i}\cdot\hat{\mathbf{r}})\,\frac{|\nabla f|\,dA}{\partial f/\partial z}.$$

Now $\nabla f = 2x\mathbf{i} + 2y\mathbf{j} + 2z\mathbf{k} = 2\mathbf{r}$, so on the surface S we have $|\nabla f| = 2|\mathbf{r}| = 2a$. On S we
also have $\partial f/\partial z = 2z = 2\sqrt{a^2 - x^2 - y^2}$ and $\mathbf{i}\cdot\hat{\mathbf{r}} = x/a$. Therefore, the integral becomes

$$I = \iint_R \frac{x^2}{\sqrt{a^2 - x^2 - y^2}}\,dx\,dy.$$

Although this integral may be evaluated directly, it is quicker to transform to plane polar
coordinates:

$$I = \iint_{R'} \frac{\rho^2\cos^2\phi}{\sqrt{a^2 - \rho^2}}\,\rho\,d\rho\,d\phi = \int_0^{2\pi} \cos^2\phi\,d\phi \int_0^a \frac{\rho^3\,d\rho}{\sqrt{a^2 - \rho^2}}.$$

Making the substitution $\rho = a\sin u$, we finally obtain

$$I = \int_0^{2\pi} \cos^2\phi\,d\phi \int_0^{\pi/2} a^3\sin^3 u\,du = \frac{2\pi a^3}{3}.$$ ◄
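
The defining sum for a surface integral can also be evaluated numerically. The following Python sketch is illustrative only (the function name `flux_through_hemisphere`, the grid sizes and the radius value are assumptions): it divides the hemisphere into patches of area $a^2\sin\theta\,\Delta\theta\,\Delta\phi$ with outward unit normal $\hat{\mathbf{r}}$ and sums $\mathbf{a}\cdot\hat{\mathbf{n}}\,\Delta S$, for comparison with $2\pi a^3/3$.

```python
import math

def flux_through_hemisphere(field, a, n_theta=200, n_phi=400):
    """Sum field . n dS over patches of the hemisphere of radius a with z >= 0,
    using spherical polar angles and the outward unit normal."""
    dth = 0.5 * math.pi / n_theta
    dph = 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        th = (i + 0.5) * dth
        for j in range(n_phi):
            ph = (j + 0.5) * dph
            # outward unit normal (the radial unit vector) at the patch centre
            normal = (math.sin(th) * math.cos(ph),
                      math.sin(th) * math.sin(ph),
                      math.cos(th))
            point = tuple(a * c for c in normal)
            dS = a * a * math.sin(th) * dth * dph  # scalar area of the patch
            total += sum(f * c for f, c in zip(field(*point), normal)) * dS
    return total

radius = 2.0  # hypothetical radius chosen for the check
I = flux_through_hemisphere(lambda x, y, z: (x, 0.0, 0.0), radius)

print(I, 2 * math.pi * radius ** 3 / 3)
```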


In the above discussion we assumed that any line parallel to the z-axis intersects 
S only once. If this is not the case, we must split up the surface into smaller 
surfaces $S_1$, $S_2$ etc. that are of this type. The surface integral over S is then the
sum of the surface integrals over $S_1$, $S_2$ and so on. This is always necessary for
closed surfaces.

We may also sometimes wish to project a surface S (or some part of it) onto
the zx- or yz-plane, rather than the xy-plane. In such cases, the above analysis is
easily modified. 

Figure 11.7 The surface of the hemisphere $x^2 + y^2 + z^2 = a^2$, $z \ge 0$.


11.5.2 Vector areas of surfaces 
The vector area of a surface S is defined simply as 

$$\mathbf{S} = \int_S d\mathbf{S},$$

where the surface integral may be evaluated as above. 


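► Find the vector area of the surface of the hemisphere $x^2 + y^2 + z^2 = a^2$ with $z \ge 0$.


As in the previous example, $d\mathbf{S} = a^2\sin\theta\,d\theta\,d\phi\,\hat{\mathbf{r}}$ in spherical polar coordinates. Therefore
the vector area is given by

$$\mathbf{S} = \iint_S a^2\sin\theta\,\hat{\mathbf{r}}\,d\theta\,d\phi.$$

Now, since $\hat{\mathbf{r}}$ varies over the surface S, it also must be integrated. This is most easily
achieved by writing $\hat{\mathbf{r}}$ in terms of the constant Cartesian basis vectors. On S we have

$$\hat{\mathbf{r}} = \sin\theta\cos\phi\,\mathbf{i} + \sin\theta\sin\phi\,\mathbf{j} + \cos\theta\,\mathbf{k},$$

so the expression for the vector area becomes

$$\begin{aligned}
\mathbf{S} &= \mathbf{i}\left(a^2\int_0^{2\pi}\cos\phi\,d\phi\int_0^{\pi/2}\sin^2\theta\,d\theta\right) + \mathbf{j}\left(a^2\int_0^{2\pi}\sin\phi\,d\phi\int_0^{\pi/2}\sin^2\theta\,d\theta\right) \\
&\quad + \mathbf{k}\left(a^2\int_0^{2\pi}d\phi\int_0^{\pi/2}\sin\theta\cos\theta\,d\theta\right) \\
&= \mathbf{0} + \mathbf{0} + \pi a^2\mathbf{k} = \pi a^2\mathbf{k}.
\end{aligned}$$

Note that the magnitude of S is the projected area of the hemisphere onto the xy-plane,
and not the surface area of the hemisphere. ◄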
Figure 11.8 The conical surface spanning the perimeter $C$ and having its
vertex at the origin.


The hemispherical shell discussed above is an example of an open surface. For 
a closed surface, however, the vector area is always zero. This may be seen by 
projecting the surface down onto each Cartesian coordinate plane in turn. For 
each projection, every positive element of area on the upper surface is cancelled 
by the corresponding negative element on the lower surface. Therefore, each 
component of $\mathbf{S} = \oint_S d\mathbf{S}$ vanishes.

An important corollary of this result is that the vector area of an open surface 
depends only on its perimeter, or boundary curve, C. This may be proved as 
follows. If surfaces $S_1$ and $S_2$ have the same perimeter then $S_1 - S_2$ is a closed
surface, for which

$$\oint d\mathbf{S} = \int_{S_1} d\mathbf{S} - \int_{S_2} d\mathbf{S} = \mathbf{0}.$$

Hence $\mathbf{S}_1 = \mathbf{S}_2$. Moreover, we may derive an expression for the vector area of
an open surface $S$ solely in terms of a line integral around its perimeter $C$.
Since we may choose any surface with perimeter $C$, we will consider a cone
with its vertex at the origin (see figure 11.8). The vector area of the elementary
triangular region shown in the figure is $d\mathbf{S} = \tfrac{1}{2}\,\mathbf{r}\times d\mathbf{r}$. Therefore, the vector area
of the cone, and hence of any open surface with perimeter $C$, is given by the line
integral

$$\mathbf{S} = \frac{1}{2}\oint_C \mathbf{r}\times d\mathbf{r}.$$

For a surface confined to the $xy$-plane, $\mathbf{r} = x\,\mathbf{i} + y\,\mathbf{j}$ and $d\mathbf{r} = dx\,\mathbf{i} + dy\,\mathbf{j}$,
and we obtain for this special case that the area of the surface is given by
$A = \tfrac{1}{2}\oint_C (x\,dy - y\,dx)$, as we found in section 11.3.
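► Find the vector area of the surface of the hemisphere $x^2 + y^2 + z^2 = a^2$, $z \ge 0$, by
evaluating the line integral $\mathbf{S} = \tfrac{1}{2}\oint_C \mathbf{r}\times d\mathbf{r}$ around its perimeter.

The perimeter $C$ of the hemisphere is the circle $x^2 + y^2 = a^2$, on which we have

$$\mathbf{r} = a\cos\phi\,\mathbf{i} + a\sin\phi\,\mathbf{j}, \qquad d\mathbf{r} = -a\sin\phi\,d\phi\,\mathbf{i} + a\cos\phi\,d\phi\,\mathbf{j}.$$

Therefore the cross product $\mathbf{r}\times d\mathbf{r}$ is given by

$$\mathbf{r}\times d\mathbf{r} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a\cos\phi & a\sin\phi & 0 \\ -a\sin\phi\,d\phi & a\cos\phi\,d\phi & 0 \end{vmatrix} = a^2(\cos^2\phi + \sin^2\phi)\,d\phi\,\mathbf{k} = a^2\,d\phi\,\mathbf{k},$$

and the vector area becomes

$$\mathbf{S} = \tfrac{1}{2}a^2\,\mathbf{k}\int_0^{2\pi} d\phi = \pi a^2\,\mathbf{k}.$$ ◄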
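The perimeter formula lends itself to a quick numerical check. The following is a minimal sketch, assuming the perimeter is approximated by a closed polygon; the helper names `vector_area_from_perimeter` and `circle_vertices` are ours and purely illustrative. For straight segments, $\mathbf{r}\times d\mathbf{r}$ summed along a segment from $\mathbf{r}_0$ to $\mathbf{r}_1$ reduces to $\mathbf{r}_0\times\mathbf{r}_1$.

```python
import math

def vector_area_from_perimeter(points):
    """Approximate S = (1/2) * closed line integral of r x dr for a closed
    polygon given as a list of (x, y, z) vertices (last vertex joins the first)."""
    sx = sy = sz = 0.0
    n = len(points)
    for k in range(n):
        x0, y0, z0 = points[k]
        x1, y1, z1 = points[(k + 1) % n]
        # With dr = r1 - r0, r0 x dr equals r0 x r1.
        sx += y0 * z1 - z0 * y1
        sy += z0 * x1 - x0 * z1
        sz += x0 * y1 - y0 * x1
    return (0.5 * sx, 0.5 * sy, 0.5 * sz)

def circle_vertices(a, n=2000):
    """Vertices of a regular n-gon inscribed in the circle x^2 + y^2 = a^2, z = 0."""
    return [(a * math.cos(2 * math.pi * k / n),
             a * math.sin(2 * math.pi * k / n),
             0.0) for k in range(n)]
```

Calling `vector_area_from_perimeter(circle_vertices(a))` should return a vector close to $\pi a^2\,\mathbf{k}$, with the small discrepancy coming only from replacing the circle by an inscribed polygon.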


11.5.3 Physical examples of surface integrals 


There are many examples of surface integrals in the physical sciences. Surface
integrals of the form (11.8) occur in computing the total electric charge on a
surface or the mass of a shell, $\int_S \rho(\mathbf{r})\,dS$, when the charge or mass density $\rho(\mathbf{r})$ is
known. For surface integrals involving vectors, the second form in (11.9) is the
most common. For a vector field $\mathbf{a}$, the surface integral $\int_S \mathbf{a}\cdot d\mathbf{S}$ is called the flux
of $\mathbf{a}$ through $S$. Examples of physically important flux integrals are numerous.
For example, let us consider a surface $S$ in a fluid with density $\rho(\mathbf{r})$ that has a
velocity field $\mathbf{v}(\mathbf{r})$. The mass of fluid crossing an element of surface area $d\mathbf{S}$ in
time $dt$ is $dM = \rho\,\mathbf{v}\cdot d\mathbf{S}\,dt$. Therefore the net total mass flux of fluid crossing $S$
is $M = \int_S \rho(\mathbf{r})\,\mathbf{v}(\mathbf{r})\cdot d\mathbf{S}$. As another example, the electromagnetic flux of energy
out of a given volume $V$ bounded by a surface $S$ is $\oint_S (\mathbf{E}\times\mathbf{H})\cdot d\mathbf{S}$.

The solid angle, to be defined below, subtended at a point $O$ by a surface (closed
or otherwise) can also be represented by an integral of this form, although it is
not strictly a flux integral (unless we imagine isotropic rays radiating from $O$).
The integral

$$\Omega = \int_S \frac{\mathbf{r}\cdot d\mathbf{S}}{r^3} \qquad (11.11)$$

gives the solid angle $\Omega$ subtended at $O$ by a surface $S$ if $\mathbf{r}$ is the position vector
measured from $O$ of an element of the surface. A little thought will show that
(11.11) takes account of all three relevant factors: the size of the element of
surface, its inclination to the line joining the element to $O$ and the distance from
$O$. Such a general expression is often useful for computing solid angles when the
three-dimensional geometry is complicated. Note that (11.11) remains valid when
the surface $S$ is not convex and when a single ray from $O$ in certain directions
would cut $S$ in more than one place (but we exclude multiply connected regions).




In particular, when the surface is closed $\Omega = 0$ if $O$ is outside $S$ and $\Omega = 4\pi$ if $O$
is an interior point.
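The two limiting values for a closed surface can be checked numerically. The sketch below is illustrative only: it assumes the closed surface is a sphere (of arbitrary centre and radius, so that $O$ may lie inside or outside it) and applies a simple midpoint rule to (11.11). The function name and the grid sizes are our own choices.

```python
import math

def solid_angle_of_sphere(centre, radius, n_theta=300, n_phi=300):
    """Midpoint-rule estimate of Omega = integral of r . dS / r^3 over a sphere
    with the given centre and radius; r is measured from the origin O."""
    cx, cy, cz = centre
    d_theta = math.pi / n_theta
    d_phi = 2 * math.pi / n_phi
    omega = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        st, ct = math.sin(theta), math.cos(theta)
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            # Outward unit normal of the sphere at this surface point.
            nx, ny, nz = st * math.cos(phi), st * math.sin(phi), ct
            # Position vector of the surface point, measured from O.
            rx, ry, rz = cx + radius * nx, cy + radius * ny, cz + radius * nz
            r = math.sqrt(rx * rx + ry * ry + rz * rz)
            dS = radius * radius * st * d_theta * d_phi
            omega += (rx * nx + ry * ny + rz * nz) * dS / r ** 3
    return omega
```

With $O$ inside the sphere the estimate should be close to $4\pi$, and with $O$ outside it should be close to zero; the estimate is not meaningful if $O$ lies on the surface itself.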

Surface integrals resulting in vectors occur less frequently. An example is 
afforded, however, by the total resultant force experienced by a body immersed in 
a stationary fluid in which the hydrostatic pressure is given by $p(\mathbf{r})$. The pressure
is everywhere inwardly directed and the resultant force is $\mathbf{F} = -\oint_S p\,d\mathbf{S}$, taken
over the whole surface.


11.6 Volume integrals 

Volume integrals are defined in an obvious way and are generally simpler than 
line or surface integrals since the element of volume dV is a scalar quantity. We 
may encounter volume integrals of the form 

$$\int_V \phi\,dV, \qquad \int_V \mathbf{a}\,dV. \qquad (11.12)$$

Clearly, the first form results in a scalar, whereas the second form yields a vector.
Two closely related physical examples, one of each kind, are provided by the total
mass of a fluid contained in a volume $V$, given by $\int_V \rho(\mathbf{r})\,dV$, and the total linear
momentum of that same fluid, given by $\int_V \rho(\mathbf{r})\,\mathbf{v}(\mathbf{r})\,dV$, where $\mathbf{v}(\mathbf{r})$ is the velocity
field in the fluid. As a slightly more complicated example of a volume integral we
may consider the following.

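► Find an expression for the angular momentum of a solid body rotating with angular
velocity $\boldsymbol{\omega}$ about an axis through the origin.

Consider a small volume element $dV$ situated at position $\mathbf{r}$; its linear momentum is $\rho\,dV\,\dot{\mathbf{r}}$,
where $\rho = \rho(\mathbf{r})$ is the density distribution, and its angular momentum about $O$ is $\mathbf{r}\times\rho\,\dot{\mathbf{r}}\,dV$.
Thus for the whole body the angular momentum $\mathbf{L}$ is

$$\mathbf{L} = \int_V (\mathbf{r}\times\dot{\mathbf{r}})\,\rho\,dV.$$

Putting $\dot{\mathbf{r}} = \boldsymbol{\omega}\times\mathbf{r}$ yields

$$\mathbf{L} = \int_V [\mathbf{r}\times(\boldsymbol{\omega}\times\mathbf{r})]\,\rho\,dV = \int_V \boldsymbol{\omega}\,r^2\rho\,dV - \int_V (\mathbf{r}\cdot\boldsymbol{\omega})\,\mathbf{r}\,\rho\,dV.$$ ◄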
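The expansion of the vector triple product used in the last step can be checked on a discrete analogue of the body. The sketch below is an illustration under our own assumption that the continuous distribution $\rho\,dV$ is replaced by a finite set of point masses; the function names are hypothetical.

```python
def cross(u, v):
    """Cartesian cross product u x v of two 3-tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    """Cartesian scalar product u . v of two 3-tuples."""
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def angular_momentum_direct(masses, positions, omega):
    """Sum of m * r x (omega x r) over the point masses."""
    L = [0.0, 0.0, 0.0]
    for m, r in zip(masses, positions):
        term = cross(r, cross(omega, r))
        for k in range(3):
            L[k] += m * term[k]
    return tuple(L)

def angular_momentum_expanded(masses, positions, omega):
    """Sum of m * [omega r^2 - (r . omega) r], the expanded triple product."""
    L = [0.0, 0.0, 0.0]
    for m, r in zip(masses, positions):
        r2 = dot(r, r)
        r_dot_omega = dot(r, omega)
        for k in range(3):
            L[k] += m * (omega[k] * r2 - r_dot_omega * r[k])
    return tuple(L)
```

The two functions should agree to rounding error for any choice of masses, positions and $\boldsymbol{\omega}$; note that $\mathbf{L}$ is parallel to $\boldsymbol{\omega}$ only when the second term is itself parallel to $\boldsymbol{\omega}$ or vanishes.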

The evaluation of the first type of volume integral in (11.12) has already been 
considered in our discussion of multiple integrals in chapter 6. The evaluation of 
the second type of volume integral follows directly since we can write 

$$\int_V \mathbf{a}\,dV = \mathbf{i}\int_V a_x\,dV + \mathbf{j}\int_V a_y\,dV + \mathbf{k}\int_V a_z\,dV, \qquad (11.13)$$

where $a_x$, $a_y$, $a_z$ are the Cartesian components of $\mathbf{a}$. Of course, we could have
written $\mathbf{a}$ in terms of the basis vectors of some other coordinate system (e.g.
spherical polars) but, since such basis vectors are not, in general, constant, they
cannot be taken out of the integral sign as in (11.13) and must be included as
part of the integrand.

Figure 11.9 A general volume $V$ containing the origin and bounded by the
closed surface $S$.

11.6.1 Volumes of three-dimensional regions 

As discussed in chapter 6, the volume of a three-dimensional region $V$ is simply
$V = \int_V dV$, which may be evaluated directly once the limits of integration have
been found. However, the volume of the region obviously depends only on the
surface $S$ that bounds it. We should therefore be able to express the volume $V$
in terms of a surface integral over $S$. This is indeed possible, and the appropriate
expression may be derived as follows. Referring to figure 11.9, let us suppose that
the origin $O$ is contained within $V$. The volume of the small shaded cone is
$dV = \tfrac{1}{3}\,\mathbf{r}\cdot d\mathbf{S}$; the total volume of the region is thus given by

$$V = \frac{1}{3}\oint_S \mathbf{r}\cdot d\mathbf{S}.$$

It may be shown that this expression is still valid even when $O$ is not contained
in $V$. Although this surface integral form is available, in many cases
it is simpler in practice to evaluate the volume integral directly.
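That the surface-integral form holds even when $O$ lies outside $V$ can be illustrated with a simple shape. The sketch below assumes an axis-aligned rectangular box (our choice, made because $\mathbf{r}\cdot\hat{\mathbf{n}}$ is constant on each flat face, so no numerical quadrature is needed); the function name is illustrative.

```python
def box_volume_from_surface(x0, x1, y0, y1, z0, z1):
    """Evaluate (1/3) * closed surface integral of r . dS over the box
    x0 <= x <= x1, y0 <= y <= y1, z0 <= z <= z1.
    On each flat face r . n_hat is constant, so the face contributes
    (r . n_hat) * (face area) exactly."""
    area_x = (y1 - y0) * (z1 - z0)   # area of each face x = constant
    area_y = (x1 - x0) * (z1 - z0)   # area of each face y = constant
    area_z = (x1 - x0) * (y1 - y0)   # area of each face z = constant
    surface_integral = (
        x1 * area_x - x0 * area_x    # outward normals +i and -i
        + y1 * area_y - y0 * area_y  # outward normals +j and -j
        + z1 * area_z - z0 * area_z  # outward normals +k and -k
    )
    return surface_integral / 3.0
```

For a box that excludes the origin, for instance `box_volume_from_surface(2, 5, 1, 3, -4, -1)`, the faces nearer the origin contribute negatively and the total still equals the product of the three side lengths.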

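► Find the volume enclosed between a sphere of radius $a$ centred on the origin and a circular
cone of half-angle $\alpha$ with its vertex at the origin.

The element of vector area $d\mathbf{S}$ on the surface of the sphere is given in spherical polar
coordinates by $a^2\sin\theta\,d\theta\,d\phi\,\hat{\mathbf{r}}$. The conical part of the bounding surface contributes nothing,
since $\mathbf{r}$ is perpendicular to $d\mathbf{S}$ there. Now taking the axis of the cone to lie along the $z$-axis (from
which $\theta$ is measured) the required volume is given by

$$V = \frac{1}{3}\oint_S \mathbf{r}\cdot d\mathbf{S} = \frac{1}{3}\int_0^{2\pi} d\phi \int_0^{\alpha} a^2\sin\theta\,\mathbf{r}\cdot\hat{\mathbf{r}}\,d\theta = \frac{1}{3}\int_0^{2\pi} d\phi \int_0^{\alpha} a^3\sin\theta\,d\theta = \frac{2\pi a^3}{3}(1 - \cos\alpha).$$ ◄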
11.7 Integral forms for grad, div and curl

In the previous chapter we defined the vector operators grad, div and curl in purely
mathematical terms, which depended on the coordinate system in which they were
expressed. An interesting application of line, surface and volume integrals is the
expression of grad, div and curl in coordinate-free, geometrical terms. If $\phi$ is a
scalar field and $\mathbf{a}$ is a vector field then it may be shown that at any point $P$

$$\nabla\phi = \lim_{V\to 0}\left(\frac{1}{V}\oint_S \phi\,d\mathbf{S}\right), \qquad (11.14)$$

$$\nabla\cdot\mathbf{a} = \lim_{V\to 0}\left(\frac{1}{V}\oint_S \mathbf{a}\cdot d\mathbf{S}\right), \qquad (11.15)$$

$$\nabla\times\mathbf{a} = \lim_{V\to 0}\left(\frac{1}{V}\oint_S d\mathbf{S}\times\mathbf{a}\right), \qquad (11.16)$$

where $V$ is a small volume enclosing $P$ and $S$ is its bounding surface. Indeed,
we may consider these equations as the (geometrical) definitions of grad, div and
curl. An alternative, but equivalent, geometrical definition of $\nabla\times\mathbf{a}$ at a point $P$,
which is often easier to use than (11.16), is given by

$$(\nabla\times\mathbf{a})\cdot\hat{\mathbf{n}} = \lim_{A\to 0}\left(\frac{1}{A}\oint_C \mathbf{a}\cdot d\mathbf{r}\right), \qquad (11.17)$$

where $C$ is a plane contour of area $A$ enclosing the point $P$ and $\hat{\mathbf{n}}$ is the unit
normal to the enclosed planar area.

It may be shown, in any coordinate system, that all the above equations are 
consistent with our definitions in the previous chapter although the difficulty of 
proof depends on the chosen coordinate system. The most general coordinate
system encountered in that chapter was one with orthogonal curvilinear coordinates
$u_1, u_2, u_3$, of which Cartesians, cylindrical polars and spherical polars are all
special cases. Although it may be shown that (11.14) leads to the usual expression 
for grad in curvilinear coordinates, the proof requires complicated manipulations 
of the derivatives of the basis vectors with respect to the coordinates and is not 
presented here. In Cartesian coordinates, however, the proof is quite simple. 


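► Show that the geometrical definition of grad leads to the usual expression for $\nabla\phi$ in
Cartesian coordinates.

Consider the surface $S$ of a small rectangular volume element $\Delta V = \Delta x\,\Delta y\,\Delta z$ that has its
faces parallel to the $x$, $y$, and $z$ coordinate surfaces and the point $P$ at one corner. We
must calculate the surface integral (11.14) over each of its six faces. Remembering that the
normal to the surface points outwards from the volume on each face, the two faces with
$x$ = constant have areas $\Delta\mathbf{S} = -\mathbf{i}\,\Delta y\,\Delta z$ and $\Delta\mathbf{S} = \mathbf{i}\,\Delta y\,\Delta z$ respectively. Furthermore, over
each small surface element, we may take $\phi$ to be constant, so that the net contribution to
the surface integral from these two faces is then

$$[(\phi + \Delta\phi) - \phi]\,\Delta y\,\Delta z\,\mathbf{i} = \left(\phi + \frac{\partial\phi}{\partial x}\Delta x - \phi\right)\Delta y\,\Delta z\,\mathbf{i} = \frac{\partial\phi}{\partial x}\,\Delta x\,\Delta y\,\Delta z\,\mathbf{i}.$$

The surface integral over the pairs of faces with $y$ = constant and $z$ = constant respectively
may be found in a similar way, and we obtain

$$\oint_S \phi\,d\mathbf{S} = \left(\frac{\partial\phi}{\partial x}\,\mathbf{i} + \frac{\partial\phi}{\partial y}\,\mathbf{j} + \frac{\partial\phi}{\partial z}\,\mathbf{k}\right)\Delta x\,\Delta y\,\Delta z.$$

Therefore $\nabla\phi$ at the point $P$ is given by

$$\nabla\phi = \lim_{\Delta x,\Delta y,\Delta z\to 0}\left[\frac{1}{\Delta x\,\Delta y\,\Delta z}\left(\frac{\partial\phi}{\partial x}\,\mathbf{i} + \frac{\partial\phi}{\partial y}\,\mathbf{j} + \frac{\partial\phi}{\partial z}\,\mathbf{k}\right)\Delta x\,\Delta y\,\Delta z\right] = \frac{\partial\phi}{\partial x}\,\mathbf{i} + \frac{\partial\phi}{\partial y}\,\mathbf{j} + \frac{\partial\phi}{\partial z}\,\mathbf{k}.$$ ◄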
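The limiting process in this example can be imitated numerically. The following sketch is illustrative only: it evaluates $(1/V)\oint_S \phi\,d\mathbf{S}$ over a small but finite cube with $P$ at one corner, sampling $\phi$ once at the centre of each face. The cube side `h`, the function names and the test field `sample_phi` are our own assumptions.

```python
import math

def grad_from_surface_integral(phi, p, h=1e-4):
    """Estimate grad(phi) at p as (1/V) * closed surface integral of phi dS over
    a small cube of side h with p at one corner, taking phi to be constant over
    each face and equal to its value at the face centre."""
    x, y, z = p
    xc, yc, zc = x + h / 2, y + h / 2, z + h / 2
    area, volume = h * h, h ** 3
    # Each pair of opposite faces has outward normals of opposite sign.
    gx = (phi(x + h, yc, zc) - phi(x, yc, zc)) * area / volume
    gy = (phi(xc, y + h, zc) - phi(xc, y, zc)) * area / volume
    gz = (phi(xc, yc, z + h) - phi(xc, yc, z)) * area / volume
    return (gx, gy, gz)

def sample_phi(x, y, z):
    """Test field phi = x^2 y + sin z, whose gradient is (2xy, x^2, cos z)."""
    return x * x * y + math.sin(z)
```

For `sample_phi` at the point $(1, 2, 0.5)$ the estimate should approach $(4, 1, \cos 0.5)$ as `h` is reduced, with an error of order `h` because the cube is not centred on $P$.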


We now turn to (11.15) and (11.17). These geometrical definitions may be 
shown straightforwardly to lead to the usual expressions for div and curl in 
orthogonal curvilinear coordinates. 


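► By considering the infinitesimal volume element $dV = h_1 h_2 h_3\,\Delta u_1\,\Delta u_2\,\Delta u_3$ shown in figure 11.10,
show that (11.15) leads to the usual expression for $\nabla\cdot\mathbf{a}$ in orthogonal curvilinear
coordinates.

Let us write the vector field in terms of its components with respect to the basis vectors
of the curvilinear coordinate system as $\mathbf{a} = a_1\hat{\mathbf{e}}_1 + a_2\hat{\mathbf{e}}_2 + a_3\hat{\mathbf{e}}_3$. We consider first the
contribution to the RHS of (11.15) from the two faces with $u_1$ = constant, i.e. $PQRS$ and the
face opposite it (see figure 11.10). Now, the volume element is formed from the orthogonal
vectors $h_1\Delta u_1\,\hat{\mathbf{e}}_1$, $h_2\Delta u_2\,\hat{\mathbf{e}}_2$ and $h_3\Delta u_3\,\hat{\mathbf{e}}_3$ at the point $P$ and so for $PQRS$ we have

$$\Delta\mathbf{S} = h_2 h_3\,\Delta u_2\,\Delta u_3\,\hat{\mathbf{e}}_3\times\hat{\mathbf{e}}_2 = -h_2 h_3\,\Delta u_2\,\Delta u_3\,\hat{\mathbf{e}}_1.$$

Reasoning along the same lines as in the previous example, we conclude that the contribution
to the surface integral of $\mathbf{a}\cdot d\mathbf{S}$ over $PQRS$ and its opposite face taken together is
given by

$$\frac{\partial}{\partial u_1}(\mathbf{a}\cdot\Delta\mathbf{S})\,\Delta u_1 = \frac{\partial}{\partial u_1}(a_1 h_2 h_3)\,\Delta u_1\,\Delta u_2\,\Delta u_3.$$

The surface integrals over the pairs of faces with $u_2$ = constant and $u_3$ = constant
respectively may be found in a similar way, and we obtain

$$\oint_S \mathbf{a}\cdot d\mathbf{S} = \left[\frac{\partial}{\partial u_1}(a_1 h_2 h_3) + \frac{\partial}{\partial u_2}(a_2 h_3 h_1) + \frac{\partial}{\partial u_3}(a_3 h_1 h_2)\right]\Delta u_1\,\Delta u_2\,\Delta u_3.$$

Therefore $\nabla\cdot\mathbf{a}$ at the point $P$ is given by

$$\nabla\cdot\mathbf{a} = \lim_{\Delta u_1,\Delta u_2,\Delta u_3\to 0}\left[\frac{1}{h_1 h_2 h_3\,\Delta u_1\,\Delta u_2\,\Delta u_3}\oint_S \mathbf{a}\cdot d\mathbf{S}\right] = \frac{1}{h_1 h_2 h_3}\left[\frac{\partial}{\partial u_1}(a_1 h_2 h_3) + \frac{\partial}{\partial u_2}(a_2 h_3 h_1) + \frac{\partial}{\partial u_3}(a_3 h_1 h_2)\right].$$ ◄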
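As a check on the final expression, the sketch below evaluates it by central differences for spherical polar coordinates, where $h_1 = 1$, $h_2 = r$ and $h_3 = r\sin\theta$. The function names, the step `eps` and the test field are our own illustrative choices; for $\mathbf{a} = r^2\,\hat{\mathbf{e}}_r + r\sin\theta\,\hat{\mathbf{e}}_\theta$ the formula gives $\nabla\cdot\mathbf{a} = 4r + 2\cos\theta$.

```python
import math

def divergence_curvilinear(components, scale_factors, u, eps=1e-5):
    """Central-difference estimate of
    (1/(h1 h2 h3)) * sum_i d/du_i (a_i * h_j * h_k), with (i, j, k) cyclic.
    components(u) and scale_factors(u) each return a 3-tuple."""
    def flux_density(i, point):
        a = components(point)
        h = scale_factors(point)
        return a[i] * h[(i + 1) % 3] * h[(i + 2) % 3]

    total = 0.0
    for i in range(3):
        up = list(u)
        up[i] += eps
        down = list(u)
        down[i] -= eps
        total += (flux_density(i, up) - flux_density(i, down)) / (2 * eps)
    h1, h2, h3 = scale_factors(u)
    return total / (h1 * h2 * h3)

def spherical_scale_factors(u):
    """Scale factors (1, r, r sin(theta)) for u = (r, theta, phi)."""
    r, theta, _phi = u
    return (1.0, r, r * math.sin(theta))

def sample_components(u):
    """Components of a = r^2 e_r + r sin(theta) e_theta."""
    r, theta, _phi = u
    return (r * r, r * math.sin(theta), 0.0)
```

Evaluating `divergence_curvilinear(sample_components, spherical_scale_factors, (2.0, 0.7, 1.0))` should give a value close to $8 + 2\cos 0.7$.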
Figure 11.10 A general volume $\Delta V$ in orthogonal curvilinear coordinates
$u_1, u_2, u_3$. $PT$ gives the vector $h_1\Delta u_1\,\hat{\mathbf{e}}_1$, $PS$ gives $h_2\Delta u_2\,\hat{\mathbf{e}}_2$ and $PQ$ gives
$h_3\Delta u_3\,\hat{\mathbf{e}}_3$.


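► By considering the infinitesimal planar surface element $PQRS$ in figure 11.10, show that
(11.17) leads to the usual expression for $\nabla\times\mathbf{a}$ in orthogonal curvilinear coordinates.

The planar surface $PQRS$ is defined by the orthogonal vectors $h_2\Delta u_2\,\hat{\mathbf{e}}_2$ and $h_3\Delta u_3\,\hat{\mathbf{e}}_3$
at the point $P$. If we traverse the loop in the direction $PSRQ$ then, by the right-hand
convention, the unit normal to the plane is $\hat{\mathbf{e}}_1$. Writing $\mathbf{a} = a_1\hat{\mathbf{e}}_1 + a_2\hat{\mathbf{e}}_2 + a_3\hat{\mathbf{e}}_3$, the line
integral around the loop in this direction is given by

$$\oint_{PSRQ} \mathbf{a}\cdot d\mathbf{r} = a_2 h_2\,\Delta u_2 + \left[a_3 h_3 + \frac{\partial}{\partial u_2}(a_3 h_3)\,\Delta u_2\right]\Delta u_3 - \left[a_2 h_2 + \frac{\partial}{\partial u_3}(a_2 h_2)\,\Delta u_3\right]\Delta u_2 - a_3 h_3\,\Delta u_3$$

$$\phantom{\oint_{PSRQ} \mathbf{a}\cdot d\mathbf{r}} = \left[\frac{\partial}{\partial u_2}(a_3 h_3) - \frac{\partial}{\partial u_3}(a_2 h_2)\right]\Delta u_2\,\Delta u_3.$$

Therefore from (11.17) the component of $\nabla\times\mathbf{a}$ in the direction $\hat{\mathbf{e}}_1$ at $P$ is given by

$$(\nabla\times\mathbf{a})_1 = \lim_{\Delta u_2,\Delta u_3\to 0}\left[\frac{1}{h_2 h_3\,\Delta u_2\,\Delta u_3}\oint_{PSRQ} \mathbf{a}\cdot d\mathbf{r}\right] = \frac{1}{h_2 h_3}\left[\frac{\partial}{\partial u_2}(h_3 a_3) - \frac{\partial}{\partial u_3}(h_2 a_2)\right].$$

The other two components are found by cyclically permuting the subscripts 1, 2, 3. ◄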

Finally, we note that we can also write the $\nabla^2$ operator as a surface integral by
setting $\mathbf{a} = \nabla\phi$ in (11.15), to obtain

$$\nabla^2\phi = \nabla\cdot\nabla\phi = \lim_{V\to 0}\left(\frac{1}{V}\oint_S \nabla\phi\cdot d\mathbf{S}\right).$$
11.8 Divergence theorem and related theorems


The divergence theorem relates the total flux of a vector field out of a closed 
surface S to the integral of the divergence of the vector field over the enclosed 
volume V ; it follows almost immediately from our geometrical definition of 
divergence (11.15). 

Imagine a volume $V$, in which a vector field $\mathbf{a}$ is continuous and differentiable,
to be divided up into a large number of small volumes $V_i$. Using (11.15), we have
for each small volume

$$(\nabla\cdot\mathbf{a})\,V_i \approx \oint_{S_i} \mathbf{a}\cdot d\mathbf{S},$$

where $S_i$ is the surface of the small volume $V_i$. Summing over $i$ we find that
contributions from surface elements interior to $S$ cancel since each surface element
appears in two terms with opposite signs, the outward normals in the two terms
being equal and opposite. Only contributions from surface elements that are also
parts of $S$ survive. If each $V_i$ is allowed to tend to zero then we obtain the
divergence theorem,

$$\int_V \nabla\cdot\mathbf{a}\,dV = \oint_S \mathbf{a}\cdot d\mathbf{S}. \qquad (11.18)$$

We note that the divergence theorem holds for both simply and multiply connected
surfaces, provided that they are closed and enclose some non-zero volume
$V$. The divergence theorem may also be extended to tensor fields (see chapter 21).

The theorem finds most use as a tool in formal manipulations, but sometimes it
is of value in evaluating surface integrals of the form $\oint_S \mathbf{a}\cdot d\mathbf{S}$ as volume integrals
or vice versa. For example, setting $\mathbf{a} = \mathbf{r}$ we immediately obtain

$$\int_V \nabla\cdot\mathbf{r}\,dV = \int_V 3\,dV = 3V = \oint_S \mathbf{r}\cdot d\mathbf{S},$$

which gives the expression for the volume of a region found in subsection 11.6.1.
The use of the divergence theorem is further illustrated in the following example.


► Evaluate the surface integral $I = \int_S \mathbf{a}\cdot d\mathbf{S}$, where $\mathbf{a} = (y - x)\,\mathbf{i} + x^2 z\,\mathbf{j} + (z + x^2)\,\mathbf{k}$ and $S$
is the open surface of the hemisphere $x^2 + y^2 + z^2 = a^2$, $z \ge 0$.

We could evaluate this surface integral directly, but the algebra is somewhat lengthy. We
will therefore evaluate it by use of the divergence theorem. Since the latter only holds
for closed surfaces enclosing a non-zero volume $V$, let us first consider the closed surface
$S' = S + S_1$, where $S_1$ is the circular area in the $xy$-plane given by $x^2 + y^2 \le a^2$, $z = 0$; $S'$
then encloses a hemispherical volume $V$. By the divergence theorem we have

$$\int_V \nabla\cdot\mathbf{a}\,dV = \oint_{S'} \mathbf{a}\cdot d\mathbf{S} = \int_S \mathbf{a}\cdot d\mathbf{S} + \int_{S_1} \mathbf{a}\cdot d\mathbf{S}.$$

Now $\nabla\cdot\mathbf{a} = -1 + 0 + 1 = 0$, so we can write

$$\int_S \mathbf{a}\cdot d\mathbf{S} = -\int_{S_1} \mathbf{a}\cdot d\mathbf{S}.$$

Figure 11.11 A closed curve $C$ in the $xy$-plane bounding a region $R$. Vectors
tangent and normal to the curve at a given point are also shown.

The surface integral over $S_1$ is easily evaluated. Remembering that the normal to the
surface points outward from the volume, a surface element on $S_1$ is simply $d\mathbf{S} = -\mathbf{k}\,dx\,dy$.
On $S_1$ we also have $\mathbf{a} = (y - x)\,\mathbf{i} + x^2\,\mathbf{k}$, so that

$$I = -\int_{S_1} \mathbf{a}\cdot d\mathbf{S} = \iint_R x^2\,dx\,dy,$$

where $R$ is the circular region in the $xy$-plane given by $x^2 + y^2 \le a^2$. Transforming to plane
polar coordinates we have

$$I = \iint_{R'} \rho^2\cos^2\phi\,\rho\,d\rho\,d\phi = \int_0^{2\pi}\cos^2\phi\,d\phi\int_0^a \rho^3\,d\rho = \frac{\pi a^4}{4}.$$ ◄
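The 'somewhat lengthy' direct evaluation is easy to carry out numerically, which provides an independent check on the result. The sketch below is a minimal illustration using a midpoint rule in spherical polar coordinates; the function names and grid sizes are our own assumptions.

```python
import math

def hemisphere_flux(field, a, n_theta=300, n_phi=300):
    """Midpoint-rule estimate of the flux of `field` through the open hemisphere
    x^2 + y^2 + z^2 = a^2, z >= 0, with the normal pointing away from the origin.
    field(x, y, z) must return the Cartesian components as a 3-tuple."""
    d_theta = (math.pi / 2) / n_theta
    d_phi = 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        st, ct = math.sin(theta), math.cos(theta)
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            # Unit radial vector, which is also the unit normal on the sphere.
            nx, ny, nz = st * math.cos(phi), st * math.sin(phi), ct
            ax, ay, az = field(a * nx, a * ny, a * nz)
            total += (ax * nx + ay * ny + az * nz) * a * a * st * d_theta * d_phi
    return total

def divergence_example_field(x, y, z):
    """The field a = (y - x) i + x^2 z j + (z + x^2) k of the worked example."""
    return (y - x, x * x * z, z + x * x)
```

For the field of this example the estimate should be close to $\pi a^4/4$. Applied instead to $\mathbf{a} = x\,\mathbf{i}$, the same helper should reproduce the value $2\pi a^3/3$ obtained for that field in section 11.5.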


It is also interesting to consider the two-dimensional version of the divergence 
theorem. As an example, let us consider a two-dimensional planar region R in 
the xy-plane bounded by some closed curve C (see figure 11.11). At any point 
on the curve the vector dr = dxi + dyj is a tangent to the curve and the vector 
$\hat{\mathbf{n}}\,ds = dy\,\mathbf{i} - dx\,\mathbf{j}$ is a normal pointing out of the region $R$. If the vector field $\mathbf{a}$ is
continuous and differentiable in $R$ then the two-dimensional divergence theorem
in Cartesian coordinates gives

$$\iint_R \left(\frac{\partial a_x}{\partial x} + \frac{\partial a_y}{\partial y}\right)dx\,dy = \oint_C \mathbf{a}\cdot\hat{\mathbf{n}}\,ds = \oint_C (a_x\,dy - a_y\,dx).$$

Letting $P = -a_y$ and $Q = a_x$, we recover Green’s theorem in a plane, which was
discussed in section 11.3. 


11.8.1 Green’s theorems 

Consider two scalar functions $\phi$ and $\psi$ that are continuous and differentiable in
some volume $V$ bounded by a surface $S$. Applying the divergence theorem to the
vector field $\phi\nabla\psi$ we obtain

$$\oint_S \phi\nabla\psi\cdot d\mathbf{S} = \int_V \nabla\cdot(\phi\nabla\psi)\,dV = \int_V \left[\phi\nabla^2\psi + (\nabla\phi)\cdot(\nabla\psi)\right]dV. \qquad (11.19)$$

Reversing the roles of $\phi$ and $\psi$ in (11.19) and subtracting the two equations gives

$$\oint_S (\phi\nabla\psi - \psi\nabla\phi)\cdot d\mathbf{S} = \int_V (\phi\nabla^2\psi - \psi\nabla^2\phi)\,dV. \qquad (11.20)$$

Equation (11.19) is usually known as Green’s first theorem and (11.20) as his
second. Green’s second theorem is useful in the development of the Green’s
functions used in the solution of partial differential equations (see chapter 19).


11.8.2 Other related integral theorems 


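There exist two other integral theorems which are closely related to the divergence
theorem and which are of some use in physical applications. If $\phi$ is a scalar field
and $\mathbf{b}$ is a vector field and both $\phi$ and $\mathbf{b}$ satisfy our usual differentiability
conditions in some volume $V$ bounded by a closed surface $S$ then

$$\int_V \nabla\phi\,dV = \oint_S \phi\,d\mathbf{S}, \qquad (11.21)$$

$$\int_V \nabla\times\mathbf{b}\,dV = \oint_S d\mathbf{S}\times\mathbf{b}. \qquad (11.22)$$

► Use the divergence theorem to prove (11.21).

In the divergence theorem (11.18) let $\mathbf{a} = \phi\mathbf{c}$, where $\mathbf{c}$ is a constant vector. We then have

$$\int_V \nabla\cdot(\phi\mathbf{c})\,dV = \oint_S \phi\mathbf{c}\cdot d\mathbf{S}.$$

Expanding out the integrand on the LHS we have

$$\nabla\cdot(\phi\mathbf{c}) = \phi\nabla\cdot\mathbf{c} + \mathbf{c}\cdot\nabla\phi = \mathbf{c}\cdot\nabla\phi,$$

since $\mathbf{c}$ is constant. Also, $\phi\mathbf{c}\cdot d\mathbf{S} = \mathbf{c}\cdot\phi\,d\mathbf{S}$, so we obtain

$$\int_V \mathbf{c}\cdot(\nabla\phi)\,dV = \oint_S \mathbf{c}\cdot\phi\,d\mathbf{S}.$$

Since $\mathbf{c}$ is constant we may take it out of both integrals to give

$$\mathbf{c}\cdot\int_V \nabla\phi\,dV = \mathbf{c}\cdot\oint_S \phi\,d\mathbf{S},$$

and since $\mathbf{c}$ is arbitrary we obtain the stated result (11.21). ◄

Equation (11.22) may be proved in a similar way by letting $\mathbf{a} = \mathbf{b}\times\mathbf{c}$ in the
divergence theorem, where $\mathbf{c}$ is again a constant vector.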
11.8.3 Physical applications of the divergence theorem

The divergence theorem is useful in deriving many of the most important partial 
differential equations in physics (see chapter 18). The basic idea is to use the 
divergence theorem to convert an integral form, often derived from observation, 
into an equivalent differential form (used in theoretical statements). 


► For a compressible fluid with time-varying position-dependent density $\rho(\mathbf{r},t)$ and velocity
field $\mathbf{v}(\mathbf{r},t)$, in which fluid is neither being created nor destroyed, show that

$$\frac{\partial\rho}{\partial t} + \nabla\cdot(\rho\mathbf{v}) = 0.$$

For an arbitrary volume $V$ in the fluid, the conservation of mass tells us that the rate of
increase or decrease of the mass $M$ of fluid in the volume must equal the net rate at which
fluid is entering or leaving the volume, i.e.

$$\frac{dM}{dt} = -\oint_S \rho\mathbf{v}\cdot d\mathbf{S},$$

where $S$ is the surface bounding $V$. But the mass of fluid in $V$ is simply $M = \int_V \rho\,dV$, so
we have

$$\frac{d}{dt}\int_V \rho\,dV + \oint_S \rho\mathbf{v}\cdot d\mathbf{S} = 0.$$

Taking the derivative inside the first integral and using the divergence theorem
to rewrite the second integral, we obtain

$$\int_V \frac{\partial\rho}{\partial t}\,dV + \int_V \nabla\cdot(\rho\mathbf{v})\,dV = \int_V \left[\frac{\partial\rho}{\partial t} + \nabla\cdot(\rho\mathbf{v})\right]dV = 0.$$

Since the volume $V$ is arbitrary, the integrand (which is assumed continuous) must be
identically zero, so we obtain

$$\frac{\partial\rho}{\partial t} + \nabla\cdot(\rho\mathbf{v}) = 0.$$

This is known as the continuity equation. It can also be applied to other systems, for
example those in which $\rho$ is the density of electric charge or the heat content, etc. In the
flow of an incompressible fluid $\rho$ = constant and the continuity equation becomes simply
$\nabla\cdot\mathbf{v} = 0$. ◄
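The integral statement from which the continuity equation was derived, that the mass in a volume changes only through the flux across its boundary, is also the basis of conservative numerical schemes. The sketch below is an illustration and not part of the derivation: it assumes a one-dimensional periodic grid, a constant positive velocity and first-order upwind fluxes, all of which are our own simplifying choices, as are the function names.

```python
def advect_conservatively(rho, velocity, dx, dt, steps):
    """Finite-volume update of d(rho)/dt + d(rho v)/dx = 0 on a periodic 1D grid,
    using first-order upwind fluxes for a constant positive velocity.
    Each cell changes only by the difference of the fluxes through its two faces."""
    rho = list(rho)
    n = len(rho)
    for _ in range(steps):
        # flux[i] is the mass flux rho * v through the left face of cell i;
        # rho[i - 1] wraps around to the last cell when i = 0 (periodic grid).
        flux = [velocity * rho[i - 1] for i in range(n)]
        rho = [rho[i] - dt / dx * (flux[(i + 1) % n] - flux[i]) for i in range(n)]
    return rho

def total_mass(rho, dx):
    """Total mass on the grid, the discrete analogue of the integral of rho dV."""
    return sum(rho) * dx
```

Because every face flux is added to one cell and subtracted from its neighbour, `total_mass` is unchanged by the update up to rounding error, which mirrors the cancellation of interior surface contributions in the derivation of the divergence theorem.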


In the previous example, we assumed that there were no sources or sinks in 
the volume V, i.e. that there was no part of V in which fluid was being created 
or destroyed. We now consider the case where a finite number of point sources 
and/or sinks are present in an incompressible fluid. Let us first consider the 
simple case where a single source is located at the origin, out of which a quantity
of fluid flows radially at a rate $Q$ (m$^3$ s$^{-1}$). The velocity field is given by

$$\mathbf{v} = \frac{Q\,\mathbf{r}}{4\pi r^3} = \frac{Q\,\hat{\mathbf{r}}}{4\pi r^2}.$$

Now, for a sphere $S_1$ of radius $r$ centred on the source, the flux across $S_1$ is

$$\oint_{S_1} \mathbf{v}\cdot d\mathbf{S} = |\mathbf{v}|\,4\pi r^2 = Q.$$
Since $\mathbf{v}$ has a singularity at the origin it is not differentiable there, i.e. $\nabla\cdot\mathbf{v}$ is not
defined there, but at all other points $\nabla\cdot\mathbf{v} = 0$, as required for an incompressible
fluid. Therefore, from the divergence theorem, for any closed surface $S_2$ that does
not enclose the origin we have

$$\oint_{S_2} \mathbf{v}\cdot d\mathbf{S} = \int_V \nabla\cdot\mathbf{v}\,dV = 0.$$

Thus we see that the surface integral $\oint_S \mathbf{v}\cdot d\mathbf{S}$ has value $Q$ or zero depending on
whether or not $S$ encloses the source at the origin. In order that the divergence
theorem is valid for all surfaces $S$, irrespective of whether they enclose the source,
we write

$$\nabla\cdot\mathbf{v} = Q\,\delta(\mathbf{r}),$$

where $\delta(\mathbf{r})$ is the three-dimensional Dirac delta function. The properties of this
function are discussed fully in chapter 13, but for the moment we note that it is
defined in such a way that

$$\delta(\mathbf{r} - \mathbf{a}) = 0 \quad \text{for } \mathbf{r} \neq \mathbf{a},$$

$$\int_V f(\mathbf{r})\,\delta(\mathbf{r} - \mathbf{a})\,dV = \begin{cases} f(\mathbf{a}) & \text{if } \mathbf{a} \text{ lies in } V \\ 0 & \text{otherwise} \end{cases}$$

for any well-behaved function $f(\mathbf{r})$. Therefore, for any volume $V$ containing the
source at the origin, we have

$$\int_V \nabla\cdot\mathbf{v}\,dV = Q\int_V \delta(\mathbf{r})\,dV = Q,$$

which is consistent with $\oint_S \mathbf{v}\cdot d\mathbf{S} = Q$ for a closed surface enclosing the source.
Hence, by introducing the Dirac delta function the divergence theorem can be
made valid even for non-differentiable point sources.

The generalisation to several sources and sinks is straightforward. For example,
if a source is located at $\mathbf{r} = \mathbf{a}$ and a sink at $\mathbf{r} = \mathbf{b}$ then the velocity field is

$$\mathbf{v} = \frac{(\mathbf{r} - \mathbf{a})\,Q}{4\pi|\mathbf{r} - \mathbf{a}|^3} - \frac{(\mathbf{r} - \mathbf{b})\,Q}{4\pi|\mathbf{r} - \mathbf{b}|^3}$$

and its divergence is given by

$$\nabla\cdot\mathbf{v} = Q\,\delta(\mathbf{r} - \mathbf{a}) - Q\,\delta(\mathbf{r} - \mathbf{b}).$$

Therefore, the integral $\oint_S \mathbf{v}\cdot d\mathbf{S}$ has the value $Q$ if $S$ encloses the source, $-Q$ if
$S$ encloses the sink and 0 if $S$ encloses neither the source nor sink or encloses
them both. This analysis also applies to other physical systems; for example, in
electrostatics we can regard the sources and sinks as positive and negative point
charges respectively and replace $\mathbf{v}$ by the electric field $\mathbf{E}$.
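The three possible values of the flux can be confirmed numerically for a closed surface that is not a sphere. The sketch below is illustrative only: it assumes the closed surface is an axis-aligned cube and uses a midpoint rule on each face; the function names, the representation of sources as `(position, strength)` pairs and the grid size are our own choices.

```python
import math

def point_source_velocity(point, sources):
    """Velocity at `point` due to point sources; `sources` is a list of
    (position, strength) pairs, with a negative strength denoting a sink."""
    vx = vy = vz = 0.0
    for (sx, sy, sz), q in sources:
        dx, dy, dz = point[0] - sx, point[1] - sy, point[2] - sz
        d3 = (dx * dx + dy * dy + dz * dz) ** 1.5
        vx += q * dx / (4 * math.pi * d3)
        vy += q * dy / (4 * math.pi * d3)
        vz += q * dz / (4 * math.pi * d3)
    return (vx, vy, vz)

def flux_through_cube(centre, half_width, sources, n=100):
    """Midpoint-rule estimate of the outward flux of v through the surface of an
    axis-aligned cube with the given centre and half-width."""
    h = 2 * half_width / n
    total = 0.0
    for axis in range(3):
        for sign in (-1.0, 1.0):          # the two faces normal to this axis
            for i in range(n):
                for j in range(n):
                    offset = [0.0, 0.0, 0.0]
                    offset[axis] = sign * half_width
                    offset[(axis + 1) % 3] = -half_width + (i + 0.5) * h
                    offset[(axis + 2) % 3] = -half_width + (j + 0.5) * h
                    point = tuple(centre[k] + offset[k] for k in range(3))
                    v = point_source_velocity(point, sources)
                    # Outward normal is sign * (unit vector along `axis`).
                    total += sign * v[axis] * h * h
    return total
```

With a source of strength $Q$ and a sink of strength $-Q$, the estimate should be close to $Q$, $-Q$ or zero according to whether the cube encloses the source alone, the sink alone, or both or neither; it is unreliable if a source lies on or very near a face.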
11.9 Stokes’ theorem and related theorems

Stokes’ theorem is the ‘curl analogue’ of the divergence theorem and relates the 
integral of the curl of a vector field over an open surface S to the line integral of 
the vector field around the perimeter C bounding the surface. 

Following the same lines as for the derivation of the divergence theorem, we
can divide the surface $S$ into many small areas $S_i$ with boundaries $C_i$ and unit
normals $\hat{\mathbf{n}}_i$. Using (11.17), we have for each small area

$$(\nabla\times\mathbf{a})\cdot\hat{\mathbf{n}}_i\,S_i \approx \oint_{C_i} \mathbf{a}\cdot d\mathbf{r}.$$

Summing over $i$ we find that on the RHS all parts of all interior boundaries
that are not part of $C$ are included twice, being traversed in opposite directions
on each occasion and thus contributing nothing. Only contributions from line
elements that are also parts of $C$ survive. If each $S_i$ is allowed to tend to zero
then we obtain Stokes’ theorem,

$$\int_S (\nabla\times\mathbf{a})\cdot d\mathbf{S} = \oint_C \mathbf{a}\cdot d\mathbf{r}. \qquad (11.23)$$

We note that Stokes’ theorem holds for both simply and multiply connected open
surfaces, provided that they are two-sided. Stokes’ theorem may also be extended
to tensor fields (see chapter 21).

Just as the divergence theorem (11.18) can be used to relate volume and surface
integrals for certain types of integrand, Stokes’ theorem can be used in evaluating
surface integrals of the form $\int_S (\nabla\times\mathbf{a})\cdot d\mathbf{S}$ as line integrals or vice versa.


► Given the vector field $\mathbf{a} = y\,\mathbf{i} - x\,\mathbf{j} + z\,\mathbf{k}$, verify Stokes’ theorem for the hemispherical
surface $x^2 + y^2 + z^2 = a^2$, $z \ge 0$.

Let us first evaluate the surface integral

$$\int_S (\nabla\times\mathbf{a})\cdot d\mathbf{S}$$

over the hemisphere. It is easily shown that $\nabla\times\mathbf{a} = -2\,\mathbf{k}$, and the surface element is
$d\mathbf{S} = a^2\sin\theta\,d\theta\,d\phi\,\hat{\mathbf{r}}$ in spherical polar coordinates. Therefore

$$\int_S (\nabla\times\mathbf{a})\cdot d\mathbf{S} = \int_0^{2\pi} d\phi \int_0^{\pi/2} d\theta\,(-2a^2\sin\theta)\,\hat{\mathbf{r}}\cdot\mathbf{k}$$

$$\phantom{\int_S (\nabla\times\mathbf{a})\cdot d\mathbf{S}} = -2a^2\int_0^{2\pi} d\phi \int_0^{\pi/2} \sin\theta\left(\frac{z}{a}\right)d\theta = -2a^2\int_0^{2\pi} d\phi \int_0^{\pi/2} \sin\theta\cos\theta\,d\theta = -2\pi a^2.$$

We now evaluate the line integral around the perimeter curve $C$ of the surface, which
is the circle $x^2 + y^2 = a^2$ in the $xy$-plane. This is given by

$$\oint_C \mathbf{a}\cdot d\mathbf{r} = \oint_C (y\,\mathbf{i} - x\,\mathbf{j} + z\,\mathbf{k})\cdot(dx\,\mathbf{i} + dy\,\mathbf{j} + dz\,\mathbf{k}) = \oint_C (y\,dx - x\,dy).$$

Using plane polar coordinates, on $C$ we have $x = a\cos\phi$, $y = a\sin\phi$ so that $dx = -a\sin\phi\,d\phi$,
$dy = a\cos\phi\,d\phi$, and the line integral becomes

$$\oint_C (y\,dx - x\,dy) = -a^2\int_0^{2\pi} (\sin^2\phi + \cos^2\phi)\,d\phi = -a^2\int_0^{2\pi} d\phi = -2\pi a^2.$$

Since the surface and line integrals have the same value, we have verified Stokes’ theorem
in this case. ◄
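The line-integral side of this verification can also be checked numerically. The sketch below is a minimal illustration that approximates the perimeter by a closed polygon and evaluates the field at the midpoint of each straight segment; the function names are our own. For a field that is linear in the coordinates, such as the one in this example, the midpoint evaluation is exact on each segment, so the only error comes from replacing the circle by a polygon.

```python
import math

def circulation(field, points):
    """Approximate the closed line integral of field . dr around a polygon given
    as a list of (x, y, z) vertices, evaluating the field at each segment midpoint."""
    total = 0.0
    n = len(points)
    for k in range(n):
        p, q = points[k], points[(k + 1) % n]
        mid = tuple(0.5 * (p[i] + q[i]) for i in range(3))
        f = field(*mid)
        total += sum(f[i] * (q[i] - p[i]) for i in range(3))
    return total

def stokes_example_field(x, y, z):
    """The field a = y i - x j + z k of the worked example."""
    return (y, -x, z)

def perimeter_circle(a, n=2000):
    """Vertices of a regular n-gon inscribed in x^2 + y^2 = a^2, z = 0,
    ordered anticlockwise when viewed from positive z."""
    return [(a * math.cos(2 * math.pi * k / n),
             a * math.sin(2 * math.pi * k / n),
             0.0) for k in range(n)]
```

Evaluating `circulation(stokes_example_field, perimeter_circle(a))` should give a value close to $-2\pi a^2$, in agreement with both integrals above.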


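The two-dimensional version of Stokes’ theorem also yields Green’s theorem in
a plane. Consider the region $R$ in the $xy$-plane shown in figure 11.11, in which a
vector field $\mathbf{a}$ is defined. Since $\mathbf{a} = a_x\,\mathbf{i} + a_y\,\mathbf{j}$, we have $\nabla\times\mathbf{a} = (\partial a_y/\partial x - \partial a_x/\partial y)\,\mathbf{k}$,
and Stokes’ theorem becomes

$$\iint_R \left(\frac{\partial a_y}{\partial x} - \frac{\partial a_x}{\partial y}\right)dx\,dy = \oint_C (a_x\,dx + a_y\,dy).$$

Letting $P = a_x$ and $Q = a_y$ we recover Green’s theorem in a plane, (11.4).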


11.9.1 Related integral theorems 

As for the divergence theorem, there exist two other integral theorems that are
closely related to Stokes’ theorem. If $\phi$ is a scalar field and $\mathbf{b}$ is a vector field,
and both $\phi$ and $\mathbf{b}$ satisfy our usual differentiability conditions on some two-sided
open surface $S$ bounded by a closed perimeter curve $C$, then

$$\int_S d\mathbf{S}\times\nabla\phi = \oint_C \phi\,d\mathbf{r}, \qquad (11.24)$$

$$\int_S (d\mathbf{S}\times\nabla)\times\mathbf{b} = \oint_C d\mathbf{r}\times\mathbf{b}. \qquad (11.25)$$

► Use Stokes’ theorem to prove (11.24).

In Stokes’ theorem, (11.23), let $\mathbf{a} = \phi\mathbf{c}$, where $\mathbf{c}$ is a constant vector. We then have

$$\int_S [\nabla\times(\phi\mathbf{c})]\cdot d\mathbf{S} = \oint_C \phi\mathbf{c}\cdot d\mathbf{r}. \qquad (11.26)$$

Expanding out the integrand on the LHS we have

$$\nabla\times(\phi\mathbf{c}) = \nabla\phi\times\mathbf{c} + \phi\nabla\times\mathbf{c} = \nabla\phi\times\mathbf{c},$$

since $\mathbf{c}$ is constant, and the triple scalar product on the LHS of (11.26) can therefore be
written

$$[\nabla\times(\phi\mathbf{c})]\cdot d\mathbf{S} = (\nabla\phi\times\mathbf{c})\cdot d\mathbf{S} = \mathbf{c}\cdot(d\mathbf{S}\times\nabla\phi).$$

Substituting this into (11.26) and taking $\mathbf{c}$ out of both integrals because it is constant, we
find

$$\mathbf{c}\cdot\int_S d\mathbf{S}\times\nabla\phi = \mathbf{c}\cdot\oint_C \phi\,d\mathbf{r}.$$

Since $\mathbf{c}$ is an arbitrary constant vector we therefore obtain the stated result (11.24). ◄

Equation (11.25) may be proved in a similar way, by letting $\mathbf{a} = \mathbf{b}\times\mathbf{c}$ in Stokes’
theorem, where $\mathbf{c}$ is again a constant vector. We also note that by setting $\mathbf{b} = \mathbf{r}$
in (11.25) we find

$$\int_S (d\mathbf{S}\times\nabla)\times\mathbf{r} = \oint_C d\mathbf{r}\times\mathbf{r}.$$

Expanding out the integrand on the LHS we find

$$(d\mathbf{S}\times\nabla)\times\mathbf{r} = d\mathbf{S} - d\mathbf{S}\,(\nabla\cdot\mathbf{r}) = d\mathbf{S} - 3\,d\mathbf{S} = -2\,d\mathbf{S}.$$

Therefore, as we found in subsection 11.5.2, the vector area of an open surface $S$
is given by

$$\mathbf{S} = \int_S d\mathbf{S} = \frac{1}{2}\oint_C \mathbf{r}\times d\mathbf{r}.$$

11.9.2 Physical applications of Stokes’ theorem 

Like the divergence theorem, Stokes’ theorem is useful in converting integral 
equations into differential equations. 


► From Ampère’s law derive Maxwell’s equation in the case where the currents are steady,
i.e. $\nabla\times\mathbf{B} - \mu_0\mathbf{J} = \mathbf{0}$.

Ampère’s rule for a distributed current with current density $\mathbf{J}$ is

$$\oint_C \mathbf{B}\cdot d\mathbf{r} = \mu_0\int_S \mathbf{J}\cdot d\mathbf{S},$$

for any circuit $C$ bounding a surface $S$. Using Stokes’ theorem, the LHS can be transformed
into $\int_S (\nabla\times\mathbf{B})\cdot d\mathbf{S}$; hence

$$\int_S (\nabla\times\mathbf{B} - \mu_0\mathbf{J})\cdot d\mathbf{S} = 0$$

for any surface $S$. This can only be so if $\nabla\times\mathbf{B} - \mu_0\mathbf{J} = \mathbf{0}$, which is the required relation.
Similarly, from Faraday’s law of electromagnetic induction we can derive Maxwell’s
equation $\nabla\times\mathbf{E} = -\partial\mathbf{B}/\partial t$. ◄


In subsection 11.8.3 we discussed the flow of an incompressible fluid in the
presence of several sources and sinks. Let us now consider vortex flow in an
incompressible fluid with a velocity field

$$\mathbf{v} = \frac{1}{\rho}\,\hat{\mathbf{e}}_\phi,$$

in cylindrical polar coordinates $\rho$, $\phi$, $z$. For this velocity field $\nabla\times\mathbf{v}$ equals zero
everywhere except on the axis $\rho = 0$, where $\mathbf{v}$ has a singularity. Therefore $\oint_C \mathbf{v}\cdot d\mathbf{r}$
equals zero for any path $C$ that does not enclose the vortex line on the axis and
$2\pi$ if $C$ does enclose the axis. In order for Stokes’ theorem to be valid for all
paths $C$, we therefore set

$$\nabla\times\mathbf{v} = 2\pi\,\delta(\rho),$$

where $\delta(\rho)$ is the Dirac delta function, to be discussed in subsection 13.1.3. Now,
since $\nabla\times\mathbf{v} = \mathbf{0}$, except on the axis $\rho = 0$, there exists a scalar potential $\psi$ such
that $\mathbf{v} = \nabla\psi$. It may easily be shown that $\psi = \phi$, the polar angle. Therefore, if $C$
does not enclose the axis then

$$\oint_C \mathbf{v}\cdot d\mathbf{r} = \oint_C d\phi = 0,$$

and if $C$ does enclose the axis,

$$\oint_C \mathbf{v}\cdot d\mathbf{r} = \Delta\psi = 2\pi n,$$

where $n$ is the number of times we traverse $C$. Thus $\phi$ is a multivalued potential.

A similar analysis is valid for other physical systems; for example, in magnetostatics
we may replace the vortex lines by current-carrying wires and the velocity
field $\mathbf{v}$ by the magnetic field $\mathbf{B}$.
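The dependence of the circulation on whether, and how many times, the path encircles the axis can be seen numerically. The sketch below is illustrative only: it assumes the path is a circle in the $xy$-plane traversed anticlockwise, and the function names and step count are our own choices. In Cartesian components the vortex field is $\mathbf{v} = (-y\,\mathbf{i} + x\,\mathbf{j})/(x^2 + y^2)$.

```python
import math

def vortex_velocity(x, y):
    """v = e_phi / rho in Cartesian components: (-y, x) / (x^2 + y^2)."""
    rho2 = x * x + y * y
    return (-y / rho2, x / rho2)

def vortex_circulation(centre, radius, turns=1, n=4000):
    """Midpoint-rule estimate of the line integral of v . dr around a circle in
    the xy-plane with the given centre and radius, traversed anticlockwise
    `turns` times."""
    cx, cy = centre
    d_t = 2 * math.pi * turns / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * d_t
        x = cx + radius * math.cos(t)
        y = cy + radius * math.sin(t)
        vx, vy = vortex_velocity(x, y)
        # Along the circle dr = (-radius sin t, radius cos t) dt.
        total += (-vx * radius * math.sin(t) + vy * radius * math.cos(t)) * d_t
    return total
```

A circle that encloses the axis should give a value close to $2\pi$ per traversal, and one that does not should give a value close to zero, whatever its centre and radius; the path must not pass through the axis itself.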


11.10 Exercises

11.1 The vector field $\mathbf{F}$ is defined by

\[ \mathbf{F} = 2xz\,\mathbf{i} + 2yz^2\,\mathbf{j} + (x^2 + 2y^2 z - 1)\,\mathbf{k}. \]

Calculate $\nabla \times \mathbf{F}$ and deduce that $\mathbf{F}$ can be written $\mathbf{F} = \nabla\phi$. Determine the form
of $\phi$.

11.2 The vector field $\mathbf{Q}$ is defined by

\[ \mathbf{Q} = [3x^2(y + z) + y^3 + z^3]\,\mathbf{i} + [3y^2(z + x) + z^3 + x^3]\,\mathbf{j} + [3z^2(x + y) + x^3 + y^3]\,\mathbf{k}. \]

Show that $\mathbf{Q}$ is a conservative field, construct its potential function and hence
evaluate the integral $J = \int \mathbf{Q} \cdot d\mathbf{r}$ along any line connecting the point $A$ at
$(1, -1, 1)$ to $B$ at $(2, 1, 2)$.

11.3 $\mathbf{F}$ is a vector field $xy^2\,\mathbf{i} + 2\,\mathbf{j} + x\,\mathbf{k}$, and $L$ is a path parameterised by $x = ct$, $y = c/t$,
$z = d$ for the range $1 \le t \le 2$. Evaluate (a) $\int_L \mathbf{F}\,dt$, (b) $\int_L \mathbf{F}\,dy$ and (c) $\int_L \mathbf{F} \cdot d\mathbf{r}$.

11.4 By making an appropriate choice for the functions $P(x, y)$ and $Q(x, y)$ that appear
in Green’s theorem in a plane, show that the integral of $x - y$ over the upper half
of the unit circle centred on the origin has the value $-\tfrac{2}{3}$. Show the same result
by direct integration in Cartesian coordinates.

11.5 Determine the point of intersection $P$, in the first quadrant, of the two ellipses

\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1 \quad\text{and}\quad \frac{x^2}{b^2} + \frac{y^2}{a^2} = 1. \]

Taking $b < a$, consider the contour $L$ that bounds that area in the first quadrant
which is common to the two ellipses. Show that the parts of $L$ that lie along the
coordinate axes contribute nothing to the line integral around $L$ of $x\,dy - y\,dx$,
and that this line integral can be written as the sum of two such integrals, $I_1$
and $I_2$, around closed contours. Using a parameterisation of each ellipse similar
to that employed in the example in section 11.3, evaluate these two integrals and
hence find the total area common to the two ellipses.

11.6 By using parameterisations of the form $x = a\cos^n\theta$ and $y = a\sin^n\theta$ for suitable
values of $n$, find the area bounded by the curves

\[ x^{2/5} + y^{2/5} = a^{2/5} \quad\text{and}\quad x^{2/3} + y^{2/3} = a^{2/3}. \]


11.7 Evaluate the line integral

\[ I = \oint \left[ y(4x^2 + y^2)\,dx + x(2x^2 + 3y^2)\,dy \right] \]

around the ellipse $x^2/a^2 + y^2/b^2 = 1$.

11.8 Criticise the following ‘proof’ that $\pi = 0$.

(a) Apply Green’s theorem in a plane to the functions $P(x, y) = \tan^{-1}(y/x)$ and
$Q(x, y) = \tan^{-1}(x/y)$, taking the region $R$ to be the unit circle centred on the
origin.

(b) The RHS of the equality so produced is

\[ \iint_R \frac{y - x}{x^2 + y^2}\,dx\,dy, \]

which, either by symmetry considerations or by changing to plane polar
coordinates, can be shown to have zero value.

(c) In the LHS of the equality set $x = \cos\theta$ and $y = \sin\theta$, yielding $P(\theta) = \theta$
and $Q(\theta) = \pi/2 - \theta$. The line integral becomes

\[ \int_0^{2\pi} \left[ \left( \frac{\pi}{2} - \theta \right) \cos\theta - \theta\sin\theta \right] d\theta, \]

which has value $2\pi$.

(d) Thus $2\pi = 0$ and the stated result follows.

11.9 A single-turn coil $C$ of arbitrary shape is placed in a magnetic field $\mathbf{B}$ and carries
a current $I$. Show that the couple acting upon the coil can be written as

\[ \mathbf{M} = I \oint_C (\mathbf{B} \cdot \mathbf{r})\,d\mathbf{r} - I \oint_C \mathbf{B}\,(\mathbf{r} \cdot d\mathbf{r}). \]

For a planar rectangular coil of sides $2a$ and $2b$ placed with its plane vertical
and at an angle $\phi$ to a uniform horizontal field $\mathbf{B}$, show that $\mathbf{M}$ is, as expected,
$4abBI\cos\phi\,\mathbf{k}$.

11.10 Find the vector area $\mathbf{S}$ of the curved surface of the hyperboloid of revolution

\[ \frac{x^2}{a^2} - \frac{y^2 + z^2}{b^2} = 1 \]

which lies in the region $z \ge 0$ and $a \le x \le \lambda a$.

11.11 An axially symmetric solid body with its axis $AB$ vertical is immersed in an
incompressible fluid of density $\rho_0$. Use the following method to show that,
whatever the shape of the body, for $\rho = \rho(z)$ in cylindrical polars the Archimedean
upthrust is, as expected, $\rho_0 g V$, where $V$ is the volume of the body.

Express the vertical component of the resultant force ($-\oint p\,d\mathbf{S}$, where $p$ is the
pressure) on the body in terms of an integral; note that $p = -\rho_0 g z$ and that for
an annular surface element of width $dl$, $\mathbf{n} \cdot \mathbf{n}_z\,dl = -d\rho$. Integrate by parts and
use the fact that $\rho(z_A) = \rho(z_B) = 0$.

11.12 Show that the expression below is equal to the solid angle subtended by a
rectangular aperture of sides $2a$ and $2b$ at a point a distance $c$ from the aperture
along the normal to its centre:

\[ \Omega = 4 \int_0^b \frac{ac}{(y^2 + c^2)(y^2 + c^2 + a^2)^{1/2}}\,dy. \]

By setting $y = (a^2 + c^2)^{1/2}\tan\phi$, change this integral into the form

\[ \int_0^{\phi_1} \frac{4ac\cos\phi}{c^2 + a^2\sin^2\phi}\,d\phi, \]

where $\tan\phi_1 = b/(a^2 + c^2)^{1/2}$, and hence show that

\[ \Omega = 4\tan^{-1}\left[ \frac{ab}{c(a^2 + b^2 + c^2)^{1/2}} \right]. \]

11.13 A vector field $\mathbf{a}$ is given by $-zxr^{-3}\,\mathbf{i} - zyr^{-3}\,\mathbf{j} + (x^2 + y^2)r^{-3}\,\mathbf{k}$, where $r^2 = x^2 + y^2 + z^2$.
Establish that the field is conservative (a) by showing $\nabla \times \mathbf{a} = 0$ and (b) by
constructing its potential function $\phi$.

11.14 A vector field $\mathbf{a}$ is given by $(z^2 + 2xy)\,\mathbf{i} + (x^2 + 2yz)\,\mathbf{j} + (y^2 + 2zx)\,\mathbf{k}$. Show that
$\mathbf{a}$ is conservative and that the line integral $\int \mathbf{a} \cdot d\mathbf{r}$ along any line joining $(1, 1, 1)$
and $(1, 2, 2)$ has the value 11.

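11.15 A force $\mathbf{F}(\mathbf{r})$ acts on a particle at $\mathbf{r}$. In which of the following cases can $\mathbf{F}$ be
represented in terms of a potential? Where it can, find the potential.

(a) $\mathbf{F} = F_0 \left[ \mathbf{i} - \mathbf{j} - \dfrac{2(x - y)}{a^2}\,\mathbf{r} \right] \exp\left( -\dfrac{r^2}{a^2} \right)$;

(b) $\mathbf{F} = \dfrac{F_0}{a} \left[ z\,\mathbf{k} + \dfrac{(x^2 + y^2 - a^2)}{a^2}\,\mathbf{r} \right] \exp\left( -\dfrac{r^2}{a^2} \right)$;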

(c) F = F 0 [k + . 

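11.16 One of Maxwell’s electromagnetic equations states that all magnetic fields $\mathbf{B}$
are solenoidal (i.e. $\nabla \cdot \mathbf{B} = 0$). Determine whether each of the following vectors
could represent a real magnetic field; where it could, try to find a suitable vector
potential $\mathbf{A}$, i.e. such that $\mathbf{B} = \nabla \times \mathbf{A}$. (Hint: seek a vector potential that is parallel
to $\nabla \times \mathbf{B}$.):

(a) $\dfrac{B_0 b}{r^3} \left[ (x - y)z\,\mathbf{i} + (x - y)z\,\mathbf{j} + (x^2 - y^2)\,\mathbf{k} \right]$ in Cartesians with $r^2 = x^2 + y^2 + z^2$;

(b) $\dfrac{B_0 b^3}{r^3} \left[ \cos\theta\cos\phi\,\hat{\mathbf{e}}_r - \sin\theta\cos\phi\,\hat{\mathbf{e}}_\theta + \sin 2\theta\sin\phi\,\hat{\mathbf{e}}_\phi \right]$ in spherical polars;

(c) $B_0 b^2 \left[ \dfrac{z\rho}{(b^2 + z^2)^2}\,\hat{\mathbf{e}}_\rho + \dfrac{1}{b^2 + z^2}\,\hat{\mathbf{e}}_z \right]$ in cylindrical polars.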


11.17 The vector field $\mathbf{f}$ has components $y\,\mathbf{i} - x\,\mathbf{j} + \mathbf{k}$ and $\gamma$ is a curve given parametrically
by

\[ \mathbf{r} = (a - c + c\cos\theta)\,\mathbf{i} + (b + c\sin\theta)\,\mathbf{j} + c^2\theta\,\mathbf{k}, \qquad 0 \le \theta \le 2\pi. \]

Describe the shape of the path $\gamma$ and show that the line integral $\int_\gamma \mathbf{f} \cdot d\mathbf{r}$ vanishes.
Does this result imply that $\mathbf{f}$ is a conservative field?

11.18 A vector field $\mathbf{a} = f(r)\,\mathbf{r}$ is spherically symmetric and everywhere directed away
from the origin. Show that $\mathbf{a}$ is irrotational but that it is also solenoidal only if
$f(r)$ is of the form $Ar^{-3}$.

11.19 Evaluate the surface integral $\int \mathbf{r} \cdot d\mathbf{S}$, where $\mathbf{r}$ is the position vector, over that
part of the surface $z = a^2 - x^2 - y^2$ for which $z \ge 0$, by each of the following
methods:

(a) parameterize the surface as $x = a\sin\theta\cos\phi$, $y = a\sin\theta\sin\phi$, $z = a^2\cos^2\theta$,
and show that

\[ \mathbf{r} \cdot d\mathbf{S} = a^4 (2\sin^3\theta\cos\theta + \cos^3\theta\sin\theta)\,d\theta\,d\phi; \]

(b) apply the divergence theorem to the volume bounded by the surface and the
plane $z = 0$.

11.20 Obtain an expression for the value $\phi_P$ at a point $P$ of a scalar function $\phi$ that
satisfies $\nabla^2\phi = 0$ in terms of its value and normal derivative on a surface $S$ that
encloses it, by proceeding as follows.

(a) In Green’s second theorem take $\psi$ at any particular point $Q$ as $1/r$, where $r$
is the distance of $Q$ from $P$. Show that $\nabla^2\psi = 0$ except at $r = 0$.

(b) Apply the result to the doubly connected region bounded by $S$ and a small
sphere $\Sigma$ of radius $\delta$ centred on $P$.

(c) Apply the divergence theorem to show that the surface integral over $\Sigma$
involving $1/\delta$ vanishes, and prove that the term involving $1/\delta^2$ has the value
$4\pi\phi_P$.

(d) Conclude that

\[ \phi_P = -\frac{1}{4\pi} \int_S \phi\,\frac{\partial}{\partial n}\left( \frac{1}{r} \right) dS + \frac{1}{4\pi} \int_S \frac{1}{r}\,\frac{\partial\phi}{\partial n}\,dS. \]

This important result shows that the value at a point $P$ of a function $\phi$
that satisfies $\nabla^2\phi = 0$ everywhere within a closed surface $S$ that encloses $P$
may be expressed entirely in terms of its value and normal derivative on $S$.
This matter is taken up more generally in connection with Green's functions
in chapter 19 and in connection with functions of a complex variable in
section 20.12.

11.21 Use result (11.21), together with an appropriately chosen scalar function $\phi$, to
prove that the position vector $\bar{\mathbf{r}}$ of the centre of mass of an arbitrarily-shaped
body of volume $V$ and uniform density can be written

\[ \bar{\mathbf{r}} = \frac{1}{V} \oint_S \tfrac{1}{2} r^2\,d\mathbf{S}. \]

11.22 A rigid body of volume $V$ and surface $S$ rotates with angular velocity $\boldsymbol{\omega}$. Show
that

\[ \boldsymbol{\omega} = -\frac{1}{2V} \oint_S \mathbf{u} \times d\mathbf{S}, \]

where $\mathbf{u}(\mathbf{x})$ is the velocity of the point $\mathbf{x}$ on the surface $S$.

11.23 Demonstrate the validity of the divergence theorem:

(a) by calculating the flux of the vector

\[ \mathbf{F} = \frac{\alpha\mathbf{r}}{(r^2 + a^2)^{3/2}} \]

through the spherical surface $|\mathbf{r}| = \sqrt{3}\,a$;

(b) by showing that

\[ \nabla \cdot \mathbf{F} = \frac{3\alpha a^2}{(r^2 + a^2)^{5/2}} \]

and evaluating the volume integral of $\nabla \cdot \mathbf{F}$ over the interior of the sphere
$|\mathbf{r}| = \sqrt{3}\,a$.

(The substitution $r = a\tan\theta$ will prove useful in carrying out the integration.)

11.24 Prove equation (11.22) and, by taking $\mathbf{b} = zx^2\,\mathbf{i} + zy^2\,\mathbf{j} + (x^2 - y^2)\,\mathbf{k}$, show that the
two integrals

\[ I = \int x^2\,dV \quad\text{and}\quad J = \int \cos^2\theta\,\sin^3\theta\,\cos^2\phi\,d\theta\,d\phi, \]

both taken over the unit sphere, must have the same value. Evaluate both directly
to show that the common value is $4\pi/15$.

11.25 In a uniform, non-dielectric, conducting medium with unit relative permittivity,
charge density $\rho$, current density $\mathbf{J}$, electric field $\mathbf{E}$ and magnetic field $\mathbf{B}$, Maxwell’s
electromagnetic equations take the form (with $\mu_0\epsilon_0 = c^{-2}$)

(i) $\nabla \cdot \mathbf{B} = 0$, (ii) $\nabla \cdot \mathbf{E} = \rho/\epsilon_0$,

(iii) $\nabla \times \mathbf{E} + \dot{\mathbf{B}} = 0$, (iv) $\nabla \times \mathbf{B} - (\dot{\mathbf{E}}/c^2) = \mu_0\mathbf{J}$.

The density of stored energy in the medium is given by $\tfrac{1}{2}(\epsilon_0 E^2 + \mu_0^{-1} B^2)$. Show
that the rate of change of the total stored energy in a volume $V$ is equal to

\[ -\int_V \mathbf{J} \cdot \mathbf{E}\,dV - \frac{1}{\mu_0} \oint_S (\mathbf{E} \times \mathbf{B}) \cdot d\mathbf{S}, \]

where $S$ is the surface bounding $V$. (The first integral gives the ohmic heating
loss, whilst the second gives the electromagnetic energy flux out of the bounding
surface. The vector $\mu_0^{-1}(\mathbf{E} \times \mathbf{B})$ is known as the Poynting vector.)

11.26 A vector field $\mathbf{F}$ is defined in cylindrical polar coordinates $\rho, \theta, z$ by

\[ \mathbf{F} = F_0 \left( \frac{x\cos\lambda z}{a}\,\mathbf{i} + \frac{y\cos\lambda z}{a}\,\mathbf{j} + (\sin\lambda z)\,\mathbf{k} \right) \equiv \frac{F_0\rho}{a}(\cos\lambda z)\,\mathbf{e}_\rho + F_0(\sin\lambda z)\,\mathbf{k}, \]

where $\mathbf{i}$, $\mathbf{j}$ and $\mathbf{k}$ are the unit vectors along the Cartesian axes and $\mathbf{e}_\rho$ is the unit
vector $(x/\rho)\,\mathbf{i} + (y/\rho)\,\mathbf{j}$.

(a) Calculate, as a surface integral, the flux of $\mathbf{F}$ through the closed surface
bounded by the cylinders $\rho = a$ and $\rho = 2a$ and the planes $z = \pm a\pi/2$.

(b) Evaluate the same integral using the divergence theorem.

11.27 The vector field $\mathbf{F}$ is given by

\[ \mathbf{F} = (3x^2 yz + y^3 z + xe^{-x})\,\mathbf{i} + (3xy^2 z + x^3 z + ye^{x})\,\mathbf{j} + (x^3 y + y^3 x + xy^2 z^2)\,\mathbf{k}. \]

Calculate (a) directly and (b) by using Stokes’ theorem the value of the line
integral $\int_L \mathbf{F} \cdot d\mathbf{r}$, where $L$ is the (three-dimensional) closed contour $OABCDEO$
defined by the successive vertices $(0, 0, 0)$, $(1, 0, 0)$, $(1, 0, 1)$, $(1, 1, 1)$, $(1, 1, 0)$, $(0, 1, 0)$,
$(0, 0, 0)$.

11.28 A vector force field $\mathbf{F}$ is defined in Cartesian coordinates by

\[ \mathbf{F} = F_0 \left[ \left( \frac{y^3}{3a^3} + \frac{y}{a}\,e^{xy/a^2} + 1 \right) \mathbf{i} + \left( \frac{xy^2}{a^3} + \frac{x + y}{a}\,e^{xy/a^2} \right) \mathbf{j} + \frac{z}{a}\,e^{xy/a^2}\,\mathbf{k} \right]. \]

Use Stokes’ theorem to calculate

\[ \oint_L \mathbf{F} \cdot d\mathbf{r}, \]

where $L$ is the perimeter of the rectangle $ABCD$ given by $A = (0, a, 0)$, $B = (a, a, 0)$,
$C = (a, 3a, 0)$ and $D = (0, 3a, 0)$.
11.11 Hints and answers

11.1 Show that $\nabla \times \mathbf{F} = 0$. The potential $\phi_F(\mathbf{r}) = x^2 z + y^2 z^2 - z$.

11.2 Show that one component of $\nabla \times \mathbf{Q}$ is zero and apply symmetry. The potential
$\phi_Q(\mathbf{r}) = xy(x^2 + y^2) + yz(y^2 + z^2) + zx(z^2 + x^2)$; $J = \phi_Q(B) - \phi_Q(A) = 54$.

11.3 (a) $c^3\ln 2\,\mathbf{i} + 2\,\mathbf{j} + (3c/2)\,\mathbf{k}$; (b) $(-3c^4/8)\,\mathbf{i} - c\,\mathbf{j} - (c^2\ln 2)\,\mathbf{k}$; (c) $c^4\ln 2 - c$.

11.4 Take $P = y^2$ and $Q = x^2$. Show that the line integral along the $x$-axis from
$(-1, 0)$ to $(1, 0)$ contributes nothing.

11.5 For $P$, $x = y = ab/(a^2 + b^2)^{1/2}$. Note that the integral along the straight line
joining $P$ to the origin is traversed in opposite directions in $I_1$ and $I_2$. The
relevant limits are $0 \le \theta_1 \le \tan^{-1}(b/a)$ and $\tan^{-1}(a/b) \le \theta_2 \le \pi/2$. As required
by symmetry, $I_1 = I_2$; the total common area is $4ab\tan^{-1}(b/a)$.

11.6 Use the result of the worked example in section 11.3 and the reduction formulae
derived in exercise 2.42. Bounded area = $33\pi a^2/128$.

11.7 Show that, in the notation of section 11.3, $\partial Q/\partial x - \partial P/\partial y = 2x^2$; $I = \pi a^3 b/2$.

11.8 The conditions for Green’s theorem are not met as $P$ and $Q$ are not continuous
(or differentiable) at the origin.

11.9 $\mathbf{M} = I \oint_C \mathbf{r} \times (d\mathbf{r} \times \mathbf{B})$.

11.10 Since the vector area of a closed surface vanishes, $\mathbf{S} = -S_1\,\mathbf{i} + S_2\,\mathbf{k}$ where $S_1$ is
the area of the semicircular intersection with the plane $x = \lambda a$ and $S_2$ is the
area of the hyperbolic intersection with the plane $z = 0$; $S_1 = \tfrac{1}{2}\pi b^2(\lambda^2 - 1)$;
$S_2 = ab[\lambda\sqrt{\lambda^2 - 1} - \cosh^{-1}\lambda]$.

11.13 (b) $\phi = c + z/r$.

11.14 The appropriate potential function is $\phi(x, y, z) = z^2 x + x^2 y + y^2 z$.

11.15 (a) Yes, $F_0(x - y)\exp(-r^2/a^2)$; (b) yes, $-F_0[(x^2 + y^2)/2a]\exp(-r^2/a^2)$;
(c) no, $\nabla \times \mathbf{F} \neq 0$.

11.16 Only (c) has zero divergence. A possible vector potential is $\tfrac{1}{2}B_0 b^2\rho(b^2 + z^2)^{-1}\,\hat{\mathbf{e}}_\phi$;
to this could be added the gradient of any scalar function.

11.17 A spiral of radius $c$ with its axis parallel to the $z$-direction and passing through
$(a, b)$. The pitch of the spiral is $2\pi c^2$. No, because (i) $\gamma$ is not a closed loop and
(ii) the line integral must be zero for every closed loop, not just for a particular
one. In fact $\nabla \times \mathbf{f} = -2\mathbf{k} \neq 0$ shows that $\mathbf{f}$ is not conservative.

11.18 $\nabla \times \mathbf{a} = 0$; $\nabla \cdot \mathbf{a} = 3f(r) + rf'(r) = 0$ if $f(r) = Ar^{-3}$.

11.19 (a) $d\mathbf{S} = (2a^3\cos\theta\sin^2\theta\cos\phi\,\mathbf{i} + 2a^3\cos\theta\sin^2\theta\sin\phi\,\mathbf{j} + a^2\cos\theta\sin\theta\,\mathbf{k})\,d\theta\,d\phi$.
(b) $\nabla \cdot \mathbf{r} = 3$; over the plane $z = 0$, $\mathbf{r} \cdot d\mathbf{S} = 0$. The necessarily common value is
$3\pi a^4/2$.

11.20 (d) Remember that the outward normal to the region is the inward normal to $\Sigma$.

11.21 Write $\mathbf{r}$ as $\nabla(\tfrac{1}{2}r^2)$.

11.22 Use result (11.22) and the expression for $\nabla \times (\mathbf{a} \times \mathbf{b})$ and note that $(\boldsymbol{\omega} \cdot \nabla)\mathbf{x} = \boldsymbol{\omega}$.

11.23 The answer is $3\sqrt{3}\,\pi\alpha/2$ in each case.

11.24 Follow the method indicated in subsection 11.8.2, using an identity given in table
10.1. Use Cartesian coordinates for the LHS of equation (11.22) and spherical
polars for the RHS. Employ (anti)symmetry and periodicity arguments to set
several integrals to zero without explicit calculation.

11.25 Identify the expression for $\nabla \cdot (\mathbf{E} \times \mathbf{B})$ and use the divergence theorem.

11.26 $6\pi F_0(a^2 + 2a/\lambda)\sin(\lambda a\pi/2)$.

11.27 (a) The successive contributions to the integral are $1 - 2e^{-1}$, $0$, $2 + \tfrac{1}{2}e$, $-\tfrac{7}{3}$,
$-1 + 2e^{-1}$, $-\tfrac{1}{2}$.
(b) $\nabla \times \mathbf{F} = 2xyz^2\,\mathbf{i} - y^2 z^2\,\mathbf{j} + ye^{x}\,\mathbf{k}$. Show that the contour is equivalent to the
sum of two plane square contours in the planes $z = 0$ and $x = 1$, the latter being
traversed in the negative sense. Integral = $\tfrac{1}{6}(3e - 5)$.

11.28 $\int_0^a dx \int_a^{3a} dy\, F_0 (y^2/a^3)\,e^{xy/a^2} = F_0 a(2e^3 - 4)$.
Fourier series


We have already discussed, in chapter 4, how complicated functions may be 
expressed as power series. However, this is not the only way in which a function 
may be represented as a series, and the subject of this chapter is the expression 
of functions as a sum of sine and cosine terms. Such a representation is called a 
Fourier series. Unlike Taylor series, a Fourier series can describe functions that are 
not everywhere continuous and/or differentiable. There are also other advantages 
in using trigonometrical terms. They are easy to differentiate and integrate, their 
moduli are easily taken and each term contains only one characteristic frequency. 
This last point is important because, as we shall see later, Fourier series are often 
used to represent the response of a system to a periodic input, and this response 
often depends directly on the frequency content of the input. Fourier series are 
used in a wide variety of such physical situations, including the vibrations of a 
finite string, the scattering of light by a diffraction grating and the transmission 
of an input signal by an electronic circuit. 


12.1 The Dirichlet conditions 

We have already mentioned that Fourier series may be used to represent some
functions for which a Taylor series expansion is not possible. The particular
conditions that a function $f(x)$ must fulfil in order that it may be expanded as a
Fourier series are known as the Dirichlet conditions, and may be summarised by
the following four points:

(i) the function must be periodic;

(ii) it must be single-valued and continuous, except possibly at a finite number
of finite discontinuities;

(iii) it must have only a finite number of maxima and minima within one
period;

(iv) the integral over one period of $|f(x)|$ must converge.

Figure 12.1 An example of a function that may be represented as a Fourier
series without modification.

If the above conditions are satisfied then the Fourier series converges to $f(x)$
at all points where $f(x)$ is continuous. The convergence of the Fourier series
at points of discontinuity is discussed in section 12.4. The last three Dirichlet
conditions are almost always met in real applications, but not all functions are
periodic and hence do not fulfil the first condition. It may be possible, however,
to represent a non-periodic function as a Fourier series by manipulation of the
function into a periodic form. This is discussed in section 12.5. An example of
a function that may, without modification, be represented as a Fourier series is
shown in figure 12.1.

We have stated without proof that any function that satisfies the Dirichlet
conditions may be represented as a Fourier series. Let us now show why this is
a plausible statement. We require that any reasonable function (one that satisfies
the Dirichlet conditions) can be expressed as a linear sum of sine and cosine
terms. We first note that we cannot use just a sum of sine terms since sine, being
an odd function (i.e. a function for which $f(-x) = -f(x)$), cannot represent even
functions (i.e. functions for which $f(-x) = f(x)$). This is obvious when we try
to express a function $f(x)$ that takes a non-zero value at $x = 0$. Clearly, since
$\sin nx = 0$ at $x = 0$ for all values of $n$, we cannot represent $f(x)$ at $x = 0$ by a sine
series. Similarly odd functions cannot be represented by a cosine series since cosine
is an even function. Nevertheless, it is possible to represent all odd functions by a
sine series and all even functions by a cosine series. Now, since all functions may
be written as the sum of an odd and an even part,

\[ f(x) = \tfrac{1}{2}[f(x) + f(-x)] + \tfrac{1}{2}[f(x) - f(-x)] = f_{\text{even}}(x) + f_{\text{odd}}(x), \]

we can write any function as the sum of a sine series and a cosine series.
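
The even/odd split used in this argument can be illustrated with a few lines of code. This is a minimal sketch; the helper names `even_part` and `odd_part` and the test function $e^x$ are assumptions made for illustration (for $e^x$ the two parts are $\cosh x$ and $\sinh x$).

```python
import math

def even_part(f):
    """Return the even part of f: [f(x) + f(-x)] / 2."""
    return lambda x: 0.5 * (f(x) + f(-x))

def odd_part(f):
    """Return the odd part of f: [f(x) - f(-x)] / 2."""
    return lambda x: 0.5 * (f(x) - f(-x))

f = math.exp                      # neither even nor odd
f_even, f_odd = even_part(f), odd_part(f)

x = 0.7
print(f_even(x), math.cosh(x))    # even part of exp is cosh
print(f_odd(x), math.sinh(x))     # odd part of exp is sinh
print(f_even(x) + f_odd(x), f(x)) # the two parts add back to f
```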

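All the terms of a Fourier series are mutually orthogonal, that is, the integrals,
over one period, of the product of any two terms have the following properties:

\[ \int_{x_0}^{x_0+L} \sin\left( \frac{2\pi r x}{L} \right) \cos\left( \frac{2\pi p x}{L} \right) dx = 0 \quad \text{for all } r \text{ and } p, \tag{12.1} \]

\[ \int_{x_0}^{x_0+L} \cos\left( \frac{2\pi r x}{L} \right) \cos\left( \frac{2\pi p x}{L} \right) dx = \begin{cases} L & \text{for } r = p = 0, \\ \tfrac{1}{2}L & \text{for } r = p > 0, \\ 0 & \text{for } r \neq p, \end{cases} \tag{12.2} \]

\[ \int_{x_0}^{x_0+L} \sin\left( \frac{2\pi r x}{L} \right) \sin\left( \frac{2\pi p x}{L} \right) dx = \begin{cases} 0 & \text{for } r = p = 0, \\ \tfrac{1}{2}L & \text{for } r = p > 0, \\ 0 & \text{for } r \neq p, \end{cases} \tag{12.3} \]

where $r$ and $p$ are integers greater than or equal to zero; these formulae are easily
derived. A full discussion of why it is possible to expand a function as a sum of
mutually orthogonal functions is given in chapter 17.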
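
The orthogonality relations (12.1)-(12.3) are easy to confirm numerically. The sketch below assumes an illustrative period `L = 2.0`, an arbitrary starting point `X0`, and a simple midpoint rule; none of these choices comes from the text.

```python
import math

L = 2.0      # period (illustrative value, an assumption)
X0 = -0.3    # arbitrary start of the integration range
N = 2000     # number of midpoint-rule panels

def integrate(g):
    """Midpoint-rule integral of g over one period [X0, X0 + L]."""
    h = L / N
    return h * sum(g(X0 + (k + 0.5) * h) for k in range(N))

def cos_term(r):
    return lambda x: math.cos(2 * math.pi * r * x / L)

def sin_term(r):
    return lambda x: math.sin(2 * math.pi * r * x / L)

def overlap(u, v):
    """Integral over one period of the product u(x) v(x)."""
    return integrate(lambda x: u(x) * v(x))

print(overlap(sin_term(2), cos_term(3)))   # (12.1): close to 0
print(overlap(cos_term(0), cos_term(0)))   # (12.2), r = p = 0: close to L
print(overlap(cos_term(3), cos_term(3)))   # (12.2), r = p > 0: close to L/2
print(overlap(sin_term(4), sin_term(4)))   # (12.3), r = p > 0: close to L/2
print(overlap(sin_term(1), sin_term(5)))   # (12.3), r != p: close to 0
```

Changing `X0` leaves every result unchanged, which reflects the fact that the integrals are taken over any one full period.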

The Fourier series expansion of the function $f(x)$ is conventionally written

\[ f(x) = \frac{a_0}{2} + \sum_{r=1}^{\infty} \left[ a_r \cos\left( \frac{2\pi r x}{L} \right) + b_r \sin\left( \frac{2\pi r x}{L} \right) \right], \tag{12.4} \]

where $a_0, a_r, b_r$ are constants called the Fourier coefficients. These coefficients are
analogous to those in a power series expansion and the determination of their
numerical values is the essential step in writing a function as a Fourier series.

This chapter continues with a discussion of how to find the Fourier coefficients 
for particular functions. We then discuss simplifications to the general Fourier 
series that may save considerable effort in calculations. This is followed by the 
alternative representation of a function as a complex Fourier series, and we 
conclude with a discussion of Parseval’s theorem. 


12.2 The Fourier coefficients 

We have indicated that a function that satisfies the Dirichlet conditions may be
written in the form (12.4). We now consider how to find the Fourier coefficients
for any particular function. For a periodic function $f(x)$ of period $L$ we will find
that the Fourier coefficients are given by

\[ a_r = \frac{2}{L} \int_{x_0}^{x_0+L} f(x) \cos\left( \frac{2\pi r x}{L} \right) dx, \tag{12.5} \]

\[ b_r = \frac{2}{L} \int_{x_0}^{x_0+L} f(x) \sin\left( \frac{2\pi r x}{L} \right) dx, \tag{12.6} \]

where $x_0$ is arbitrary but is often taken as $0$ or $-L/2$. The apparently arbitrary
factor $\tfrac{1}{2}$ which appears in the $a_0$ term in (12.4) is included so that (12.5) may
apply for $r = 0$ as well as $r > 0$. The relations (12.5) and (12.6) may be derived
as follows.

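Suppose the Fourier series expansion of $f(x)$ can be written as in (12.4),

\[ f(x) = \frac{a_0}{2} + \sum_{r=1}^{\infty} \left[ a_r \cos\left( \frac{2\pi r x}{L} \right) + b_r \sin\left( \frac{2\pi r x}{L} \right) \right]. \]

Then, multiplying by $\cos(2\pi p x/L)$, integrating over one full period in $x$ and
changing the order of the summation and integration, we get

\[ \int_{x_0}^{x_0+L} f(x) \cos\left( \frac{2\pi p x}{L} \right) dx = \frac{a_0}{2} \int_{x_0}^{x_0+L} \cos\left( \frac{2\pi p x}{L} \right) dx + \sum_{r=1}^{\infty} a_r \int_{x_0}^{x_0+L} \cos\left( \frac{2\pi r x}{L} \right) \cos\left( \frac{2\pi p x}{L} \right) dx + \sum_{r=1}^{\infty} b_r \int_{x_0}^{x_0+L} \sin\left( \frac{2\pi r x}{L} \right) \cos\left( \frac{2\pi p x}{L} \right) dx. \tag{12.7} \]

We can now find the Fourier coefficients by considering (12.7) as $p$ takes different
values. Using the orthogonality conditions (12.1)-(12.3) of the previous section,
we find that when $p = 0$ (12.7) becomes

\[ \int_{x_0}^{x_0+L} f(x)\,dx = \frac{a_0}{2} L. \]

When $p \neq 0$ the only non-vanishing term on the RHS of (12.7) occurs when
$r = p$, and so

\[ \int_{x_0}^{x_0+L} f(x) \cos\left( \frac{2\pi p x}{L} \right) dx = \frac{a_p}{2} L. \]

The other Fourier coefficients $b_r$ may be found by repeating the above process
but multiplying by $\sin(2\pi p x/L)$ instead of $\cos(2\pi p x/L)$ (see exercise 12.2).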
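
Formulae (12.5) and (12.6) translate directly into a numerical recipe. The sketch below is one possible illustration, assuming a midpoint rule for the integrals; the function name `fourier_coefficients`, the period `L = 3.0` and the test function are hypothetical choices, picked so that the expected coefficients can be read off by eye.

```python
import math

def fourier_coefficients(f, L, r_max, x0=0.0, n=4096):
    """Estimate a_r and b_r (r = 0..r_max) from (12.5) and (12.6).

    The integrals over one period [x0, x0 + L] use the midpoint rule.
    """
    h = L / n
    xs = [x0 + (k + 0.5) * h for k in range(n)]
    fs = [f(x) for x in xs]
    a, b = [], []
    for r in range(r_max + 1):
        w = 2.0 * math.pi * r / L
        a.append(2.0 / L * h * sum(fx * math.cos(w * x) for fx, x in zip(fs, xs)))
        b.append(2.0 / L * h * sum(fx * math.sin(w * x) for fx, x in zip(fs, xs)))
    return a, b

L = 3.0

def f(x):
    # Built by hand from (12.4) with a_0/2 = 1, a_1 = 3 and b_2 = -2.
    return 1.0 + 3.0 * math.cos(2 * math.pi * x / L) - 2.0 * math.sin(4 * math.pi * x / L)

a, b = fourier_coefficients(f, L, r_max=3)
print([round(v, 10) for v in a])   # a_0 should come out as 2, since a_0/2 = 1
print([round(v, 10) for v in b])
```

Note that the recovered $a_0$ is twice the constant term, which is the reason for the factor $\tfrac{1}{2}$ in (12.4).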


► Express the square-wave function illustrated in figure 12.2 as a Fourier series. 


Physically this might represent the input to an electrical circuit that switches between a high
and a low state with time period $T$. The square wave may be represented by

\[ f(t) = \begin{cases} -1 & \text{for } -\tfrac{1}{2}T \le t < 0, \\ +1 & \text{for } 0 \le t < \tfrac{1}{2}T. \end{cases} \]

Figure 12.2 A square-wave function.

In deriving the Fourier coefficients, we note firstly that the function is an odd function
and so the series will contain only sine terms (this simplification is discussed further in the
following section). To evaluate the coefficients in the sine series we use (12.6). Hence

\[ b_r = \frac{2}{T} \int_{-T/2}^{T/2} f(t) \sin\left( \frac{2\pi r t}{T} \right) dt = \frac{4}{T} \int_0^{T/2} \sin\left( \frac{2\pi r t}{T} \right) dt = \frac{2}{\pi r} \left[ 1 - (-1)^r \right]. \]

Thus the sine coefficients are zero if $r$ is even and equal to $4/(\pi r)$ if $r$ is odd. Hence the
Fourier series for the square-wave function may be written as

\[ f(t) = \frac{4}{\pi} \left( \sin\omega t + \frac{\sin 3\omega t}{3} + \frac{\sin 5\omega t}{5} + \cdots \right), \tag{12.8} \]

where $\omega = 2\pi/T$ is called the angular frequency. ◄
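
The result (12.8) can be checked by summing the series at a point where the square wave is continuous. The following sketch uses the closed form $b_r = (2/\pi r)[1 - (-1)^r]$ derived in the example; the names `b_coefficient` and `partial_sum` and the choice `T = 1.0` are illustrative assumptions.

```python
import math

def b_coefficient(r):
    """Sine coefficient of the square wave: 0 for even r, 4/(pi r) for odd r."""
    return 2.0 / (math.pi * r) * (1 - (-1) ** r)

def partial_sum(t, T, n_terms):
    """Sum of the first n_terms harmonics of the square-wave series (12.8)."""
    omega = 2.0 * math.pi / T
    return sum(b_coefficient(r) * math.sin(r * omega * t)
               for r in range(1, n_terms + 1))

T = 1.0
print(b_coefficient(1), b_coefficient(2), b_coefficient(3))
print(partial_sum(T / 4, T, 2001))    # approaches +1, the value of f on 0 < t < T/2
print(partial_sum(-T / 4, T, 2001))   # approaches -1, the value of f on -T/2 < t < 0
```

Convergence at $t = T/4$ is slow because the coefficients fall off only as $1/r$, which is typical of a function with discontinuities.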


12.3 Symmetry considerations 

The example in the previous section employed the useful property that since the 
function to be represented was odd, all the cosine terms of the Fourier series were 
zero. It is often the case that the function we wish to express as a Fourier series 
has a particular symmetry, which we can exploit to reduce the calculational labour 
of evaluating Fourier coefficients. Functions that are symmetric or antisymmetric 
about the origin (i.e. even and odd functions respectively) admit particularly 
useful simplifications. Functions that are odd in $x$ have no cosine terms (see
section 12.1) and all the $a$-coefficients are equal to zero. Similarly, functions that
are even in $x$ have no sine terms and all the $b$-coefficients are zero. Since the
Fourier series of odd or even functions contain only half the coefficients required
for a general periodic function, there is a considerable reduction in the algebra
needed to find a Fourier series.

The consequences of symmetry or antisymmetry of the function about the
quarter period (i.e. about $L/4$) are a little less obvious. Furthermore, the results
are not used as often as those above and the remainder of this section can be
omitted on a first reading without loss of continuity. The following argument
gives the required results.

Suppose that $f(x)$ has even or odd symmetry about $L/4$, i.e. $f(L/4 - x) =
\pm f(x - L/4)$. For convenience, we make the substitution $s = x - L/4$ and hence
$f(-s) = \pm f(s)$. We can now see that

\[ b_r = \frac{2}{L} \int_{x_0}^{x_0+L} f(s) \sin\left( \frac{2\pi r s}{L} + \frac{\pi r}{2} \right) ds, \]

where the limits of integration have been left unaltered since $f$ is, of course,
periodic in $s$ as well as in $x$. If we use the expansion

\[ \sin\left( \frac{2\pi r s}{L} + \frac{\pi r}{2} \right) = \sin\left( \frac{2\pi r s}{L} \right) \cos\left( \frac{\pi r}{2} \right) + \cos\left( \frac{2\pi r s}{L} \right) \sin\left( \frac{\pi r}{2} \right), \]

we can immediately see that the trigonometrical part of the integrand is an odd
function of $s$ if $r$ is even and an even function of $s$ if $r$ is odd. Hence if $f(s)$ is
even and $r$ is even then the integral is zero, and if $f(s)$ is odd and $r$ is odd then
the integral is zero. Similar results can be derived for the Fourier $a$-coefficients
and we conclude that

(i) if $f(x)$ is even about $L/4$ then $a_{2r+1} = 0$ and $b_{2r} = 0$,

(ii) if $f(x)$ is odd about $L/4$ then $a_{2r} = 0$ and $b_{2r+1} = 0$.

All the above results follow automatically when the Fourier coefficients are 
evaluated in any particular case, but prior knowledge of them will often enable 
some coefficients to be set equal to zero on inspection and so substantially reduce 
the computational labour. As an example, the square-wave function shown in
figure 12.2 is (i) an odd function of $t$, so that all $a_r = 0$, and (ii) even about the
point $t = T/4$, so that $b_{2r} = 0$. Thus we can say immediately that only sine terms
of odd harmonics will be present and therefore will need to be calculated; this is
confirmed in the expansion (12.8).


12.4 Discontinuous functions 

The Fourier series expansion usually works well for functions that are
discontinuous in the required range. However, the series itself does not produce a
discontinuous function and we state without proof that the value of the
expanded $f(x)$ at a discontinuity will be half-way between the upper and lower
values. Expressing this more mathematically, at a point of finite discontinuity, $x_d$,
the Fourier series converges to

\[ \tfrac{1}{2} \lim_{\epsilon \to 0} \left[ f(x_d + \epsilon) + f(x_d - \epsilon) \right]. \]

At a discontinuity, the Fourier series representation of the function will overshoot
its value. Although as more terms are included the overshoot moves in position
arbitrarily close to the discontinuity, it never disappears even in the limit of an
infinite number of terms. This behaviour is known as Gibbs’ phenomenon. A full
discussion is not pursued here but suffice it to say that the size of the overshoot
is proportional to the magnitude of the discontinuity.

Figure 12.3 The convergence of a Fourier series expansion of a square-wave
function, including (a) one term, (b) two terms, (c) three terms and (d) 20
terms. The overshoot $\delta$ is shown in (d).


►Find the value to which the Fourier series of the square-wave function discussed in
section 12.2 converges at $t = 0$.


It can be seen that the function is discontinuous at $t = 0$ and, by the above rule, we expect
the series to converge to a value half-way between the upper and lower values, in other
words to converge to zero in this case. Considering the Fourier series of this function,
(12.8), we see that all the terms are zero at $t = 0$ and hence the Fourier series converges to
zero as expected. The Gibbs phenomenon for the square-wave function is shown in figure 12.3. ◄
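
The persistence of the overshoot can be seen by evaluating partial sums of (12.8) close to the discontinuity. The sketch below is illustrative only: the function names, the period `T = 1.0` and the sampling grid are assumptions. It scans $0 < t \le T/4$ for the largest value of the partial sum; the peak stays near 1.18 (an overshoot of roughly 9% of the jump of 2) however many terms are used, while its position moves towards $t = 0$.

```python
import math

def square_wave_partial_sum(t, n_terms, T=1.0):
    """Partial sum of (12.8) keeping the first n_terms odd harmonics."""
    omega = 2.0 * math.pi / T
    return (4.0 / math.pi) * sum(
        math.sin((2 * k + 1) * omega * t) / (2 * k + 1) for k in range(n_terms)
    )

def max_near_jump(n_terms, samples=4000, T=1.0):
    """Largest value of the partial sum on a grid covering 0 < t <= T/4."""
    return max(
        square_wave_partial_sum(T / 4 * (j + 1) / samples, n_terms, T)
        for j in range(samples)
    )

for n in (5, 20, 80):
    print(n, max_near_jump(n))    # the peak does not decay towards 1

print(square_wave_partial_sum(0.0, 50))   # every term vanishes at t = 0
```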
Figure 12.4 Possible periodic extensions of a function.



12.5 Non-periodic functions 

We have already mentioned that a Fourier representation may sometimes be used 
for non-periodic functions. If we wish to find the Fourier series of a non-periodic 
function only within a fixed range then we may continue this function outside the 
range so as to make it periodic. The Fourier series of this periodic function would 
then correctly represent the non-periodic function in the desired range. Since we 
are often at liberty to extend the function in a number of ways, we can sometimes 
make it odd or even and so reduce the calculation required. Figure 12.4(b) shows
the simplest extension to the function shown in figure 12.4(a). However, this
extension has no particular symmetry. Figures 12.4(c), (d) show extensions as odd
and even functions respectively with the benefit that only sine or cosine terms
appear in the resulting Fourier series. We note that these last two extensions give
a function of period $2L$.

In view of the result of section 12.4, it must be added that the continuation 
must not be discontinuous at the end-points of the interval of interest; if it is 
the series will not converge to the required value there. This requirement that 
the series converges appropriately may reduce the choice of continuations. This 
is discussed further at the end of the following example. 


►Find the Fourier series of $f(x) = x^2$ for $0 < x \le 2$.


We must first make the function periodic. We do this by extending the range of interest to
$-2 < x \le 2$ in such a way that $f(x) = f(-x)$ and then letting $f(x + 4k) = f(x)$, where $k$ is
any integer. This is shown in figure 12.5. Now we have an even function of period 4. The
Fourier series will faithfully represent $f(x)$ in the range $-2 < x \le 2$, although not outside
it. Firstly we note that since we have made the specified function even in $x$ by extending
the range, all the coefficients $b_r$ will be zero. Now we apply (12.5) and (12.6) with $L = 4$
to determine the remaining coefficients:

Figure 12.5 $f(x) = x^2$, $0 < x \le 2$, with the range extended to give periodicity.

\[ a_r = \frac{2}{4} \int_{-2}^{2} x^2 \cos\left( \frac{2\pi r x}{4} \right) dx = \frac{4}{4} \int_0^2 x^2 \cos\left( \frac{\pi r x}{2} \right) dx, \]

where the second equality holds because the function is even in $x$. Thus

\[ a_r = \left[ \frac{2}{\pi r}\,x^2 \sin\left( \frac{\pi r x}{2} \right) \right]_0^2 - \frac{4}{\pi r} \int_0^2 x \sin\left( \frac{\pi r x}{2} \right) dx \]

\[ \phantom{a_r} = \frac{8}{\pi^2 r^2} \left[ x \cos\left( \frac{\pi r x}{2} \right) \right]_0^2 - \frac{8}{\pi^2 r^2} \int_0^2 \cos\left( \frac{\pi r x}{2} \right) dx \]

\[ \phantom{a_r} = \frac{16}{\pi^2 r^2} \cos \pi r = \frac{16}{\pi^2 r^2} (-1)^r. \]

Since this expression for $a_r$ has $r^2$ in its denominator, to evaluate $a_0$ we must return to the
original definition,

\[ a_r = \frac{2}{4} \int_{-2}^{2} f(x) \cos\left( \frac{\pi r x}{2} \right) dx. \]

From this we obtain

\[ a_0 = \frac{2}{4} \int_{-2}^{2} x^2\,dx = \frac{4}{4} \int_0^2 x^2\,dx = \frac{8}{3}. \]

The final expression for $f(x)$ is then

\[ x^2 = \frac{4}{3} + 16 \sum_{r=1}^{\infty} \frac{(-1)^r}{\pi^2 r^2} \cos\left( \frac{\pi r x}{2} \right) \]

for $0 < x \le 2$. ◄
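
A quick numerical check of this series is given below. It is a sketch only; the function name `x_squared_series` and the number of terms are arbitrary choices. It also illustrates the remark that the series represents $x^2$ only inside the chosen range: at $x = 3$ it returns the value of the periodic even extension, $f(3) = f(-1) = 1$, not 9.

```python
import math

def x_squared_series(x, n_terms):
    """Partial sum of 4/3 + 16 * sum_r (-1)^r / (pi r)^2 * cos(pi r x / 2)."""
    total = 4.0 / 3.0
    for r in range(1, n_terms + 1):
        total += 16.0 * (-1) ** r / (math.pi ** 2 * r ** 2) * math.cos(math.pi * r * x / 2)
    return total

for x in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(x, x_squared_series(x, 2000))   # matches x**2 for |x| <= 2 only
```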


We note that in the above example we could have extended the range so as
to make the function odd. In other words we could have set $f(x) = -f(-x)$ and
then made $f(x)$ periodic in such a way that $f(x + 4) = f(x)$. In this case the
resulting Fourier series would be a series of just sine terms. However, although
this will faithfully represent the function inside the required range, it does not
converge to the correct values of $f(x) = \pm 4$ at $x = \pm 2$; it converges, instead, to
zero, the average of the values at the two ends of the range.


12.6 Integration and differentiation 

It is sometimes possible to find the Fourier series of a function by integration or
differentiation of another Fourier series. If the Fourier series of $f(x)$ is integrated
term by term then the resulting Fourier series converges to the integral of $f(x)$.
Clearly, when integrating in such a way there is a constant of integration that must
be found. If $f(x)$ is a continuous function of $x$ for all $x$ and $f(x)$ is also periodic
then the Fourier series that results from differentiating term by term converges to
$f'(x)$, provided that $f'(x)$ itself satisfies the Dirichlet conditions. These properties
of Fourier series may be useful in calculating complicated Fourier series, since
simple Fourier series may easily be evaluated (or found from standard tables)
and often the more complicated series can then be built up by integration and/or
differentiation.


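►Find the Fourier series of $f(x) = x^3$ for $0 < x \le 2$.


In the example discussed in the previous section we found the Fourier series for $f(x) = x^2$
in the required range. So, if we integrate this term by term, we obtain

\[ \frac{x^3}{3} = \frac{4}{3}x + 32 \sum_{r=1}^{\infty} \frac{(-1)^r}{\pi^3 r^3} \sin\left( \frac{\pi r x}{2} \right) + c, \]

where $c$ is, so far, an arbitrary constant. We have not yet found the Fourier series for $x^3$
because the term $\tfrac{4}{3}x$ appears in the expansion. However, by now differentiating the same
initial expression for $x^2$ we obtain

\[ 2x = -8 \sum_{r=1}^{\infty} \frac{(-1)^r}{\pi r} \sin\left( \frac{\pi r x}{2} \right). \]

We can now write the full Fourier expansion of $x^3$ as

\[ x^3 = -16 \sum_{r=1}^{\infty} \frac{(-1)^r}{\pi r} \sin\left( \frac{\pi r x}{2} \right) + 96 \sum_{r=1}^{\infty} \frac{(-1)^r}{\pi^3 r^3} \sin\left( \frac{\pi r x}{2} \right) + c. \]

Finally, we can find the constant, $c$, by considering $f(0)$. At $x = 0$, our Fourier expansion
gives $x^3 = c$ since all the sine terms are zero, and hence $c = 0$. ◄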
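
The expansion obtained by combining the integrated and differentiated series can be tested numerically at interior points of the range. This is a minimal sketch with $c = 0$; the function name `x_cubed_series` and the number of terms are illustrative. Many terms are needed because the first sum has coefficients that decay only as $1/r$.

```python
import math

def x_cubed_series(x, n_terms):
    """Partial sum of the sine series for x^3 derived above, with c = 0."""
    total = 0.0
    for r in range(1, n_terms + 1):
        sign = (-1) ** r
        coefficient = -16.0 * sign / (math.pi * r) + 96.0 * sign / (math.pi ** 3 * r ** 3)
        total += coefficient * math.sin(math.pi * r * x / 2)
    return total

for x in (0.5, 1.0, 1.5):
    print(x, x_cubed_series(x, 20000), x ** 3)   # the two columns should agree closely
```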


12.7 Complex Fourier series 

As a Fourier series expansion in general contains both sine and cosine parts, it 
may be written more compactly using a complex exponential expansion. This 
simplification makes use of the property that exp(irx) = cos rx + i sin rx. The 
complex Fourier series expansion is written

\[ f(x) = \sum_{r=-\infty}^{\infty} c_r \exp\left(\frac{2\pi i r x}{L}\right), \qquad (12.9) \]

where the Fourier coefficients are given by

\[ c_r = \frac{1}{L}\int_{x_0}^{x_0+L} f(x)\exp\left(-\frac{2\pi i r x}{L}\right) dx. \qquad (12.10) \]


This relation can be derived, in a similar manner to that of section 12.2, by
multiplying (12.9) by $\exp(-2\pi i p x/L)$ before integrating and using the orthogonality
relation

\[ \int_{x_0}^{x_0+L} \exp\left(-\frac{2\pi i p x}{L}\right)\exp\left(\frac{2\pi i r x}{L}\right) dx = \begin{cases} L & \text{for } r = p, \\ 0 & \text{for } r \neq p. \end{cases} \]

The complex Fourier coefficients in (12.9) have the following relations to the real
Fourier coefficients:

\[ c_r = \tfrac{1}{2}(a_r - i b_r), \qquad c_{-r} = \tfrac{1}{2}(a_r + i b_r). \qquad (12.11) \]

Note that if f(x) is real then $c_{-r} = c_r^*$, where the asterisk represents complex
conjugation.


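►Find a complex Fourier series for f(x) = x in the range −2 < x < 2.


Using (12.10), for $r \neq 0$,

\[ c_r = \frac{1}{4}\int_{-2}^{2} x \exp\left(-\frac{\pi i r x}{2}\right) dx \]
\[ = \left[-\frac{x}{2\pi i r}\exp\left(-\frac{\pi i r x}{2}\right)\right]_{-2}^{2} + \int_{-2}^{2}\frac{1}{2\pi i r}\exp\left(-\frac{\pi i r x}{2}\right) dx \]
\[ = -\frac{1}{\pi i r}\left[\exp(-\pi i r) + \exp(\pi i r)\right] + \left[\frac{1}{r^2\pi^2}\exp\left(-\frac{\pi i r x}{2}\right)\right]_{-2}^{2} \]
\[ = \frac{2i}{\pi r}\cos \pi r - \frac{2i}{r^2\pi^2}\sin \pi r = \frac{2i}{\pi r}(-1)^r. \qquad (12.12) \]

For r = 0, we find $c_0 = 0$ and hence

\[ x = \sum_{\substack{r=-\infty \\ r \neq 0}}^{\infty} \frac{2i(-1)^r}{r\pi}\exp\left(\frac{\pi i r x}{2}\right). \]

We note that the Fourier series derived for x in section 12.6 gives $a_r = 0$ for all r and

\[ b_r = -\frac{4(-1)^r}{\pi r}, \]

and so, using (12.11), we confirm that $c_r$ and $c_{-r}$ have the forms derived above. It is also
apparent that the relationship $c_r^* = c_{-r}$ holds, as we expect, since f(x) is real. ◄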
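
The coefficients $c_r = 2i(-1)^r/(\pi r)$ found in this example can be checked numerically. The following is a minimal sketch, assuming a simple midpoint rule is accurate enough for the integral in (12.10); the helper names `complex_coefficient` and `c_exact` are illustrative only.

```python
import cmath
import math

def complex_coefficient(f, r, x0, L, n=20000):
    """Midpoint-rule estimate of c_r = (1/L) * integral of f(x) exp(-2 pi i r x / L) dx."""
    h = L / n
    total = 0j
    for k in range(n):
        x = x0 + (k + 0.5) * h
        total += f(x) * cmath.exp(-2j * math.pi * r * x / L)
    return total * h / L

def c_exact(r):
    """Closed form 2i(-1)^r / (pi r) for f(x) = x on -2 < x < 2, valid for r != 0."""
    return 2j * (-1) ** r / (math.pi * r)

# Compare the numerical and closed-form coefficients for a few values of r.
for r in (-2, -1, 1, 2, 3):
    print(r, complex_coefficient(lambda x: x, r, -2.0, 4.0), c_exact(r))
```

The same routine also illustrates $c_{-r} = c_r^*$ for this real function and gives a value close to zero for r = 0.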
12.8 Parseval’s theorem


Parseval’s theorem gives a useful way of relating the Fourier coefficients to the
function that they describe. Essentially a conservation law, it states that

\[ \frac{1}{L}\int_{x_0}^{x_0+L} |f(x)|^2\,dx = \sum_{r=-\infty}^{\infty} |c_r|^2 = \left(\tfrac{1}{2}a_0\right)^2 + \tfrac{1}{2}\sum_{r=1}^{\infty}\left(a_r^2 + b_r^2\right). \qquad (12.13) \]

In a more memorable form, this says that the sum of the moduli squared of
the complex Fourier coefficients is equal to the average value of $|f(x)|^2$ over one
period. Parseval’s theorem can be proved straightforwardly by writing f(x) as
a Fourier series and evaluating the required integral, but the algebra is messy. 
Therefore, we shall use an alternative method, for which the algebra is simple 
and which in fact leads to a more general form of the theorem. 

Let us consider two functions f(x) and g(x), which are (or can be made)
periodic with period L and which have Fourier series (expressed in complex
form)

\[ f(x) = \sum_{r=-\infty}^{\infty} c_r \exp\left(\frac{2\pi i r x}{L}\right), \]
\[ g(x) = \sum_{r=-\infty}^{\infty} \gamma_r \exp\left(\frac{2\pi i r x}{L}\right), \]

where $c_r$ and $\gamma_r$ are the complex Fourier coefficients of f(x) and g(x) respectively.
Thus

\[ f(x)g^*(x) = \sum_{r=-\infty}^{\infty} c_r\, g^*(x)\exp\left(\frac{2\pi i r x}{L}\right). \]

Integrating this equation with respect to x over the interval $(x_0, x_0 + L)$ and
dividing by L, we find

\[ \frac{1}{L}\int_{x_0}^{x_0+L} f(x)g^*(x)\,dx = \sum_{r=-\infty}^{\infty} c_r\,\frac{1}{L}\int_{x_0}^{x_0+L} g^*(x)\exp\left(\frac{2\pi i r x}{L}\right) dx \]
\[ = \sum_{r=-\infty}^{\infty} c_r\left[\frac{1}{L}\int_{x_0}^{x_0+L} g(x)\exp\left(-\frac{2\pi i r x}{L}\right) dx\right]^* \]
\[ = \sum_{r=-\infty}^{\infty} c_r\gamma_r^*, \]

where the last equality uses (12.10). Finally, if we let g(x) = f(x) then we obtain
Parseval’s theorem (12.13). This proof can be performed in a similar manner
using the sine and cosine form of the Fourier series, but the algebra is slightly
more complicated.

Parseval’s theorem is sometimes used to sum series. However, if one is presented 
with a series to sum, it is not usually possible to decide which Fourier series 
should be used to evaluate it. Instead, useful summations are sometimes found 
serendipitously. The following example shows the evaluation of a sum by a 
Fourier series method. 


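►Using Parseval’s theorem and the Fourier series for $f(x) = x^2$ found in section 12.5,
calculate the sum $\sum_{r=1}^{\infty} r^{-4}$.


Firstly we find the average value of $[f(x)]^2$ over the interval −2 < x < 2:

\[ \frac{1}{4}\int_{-2}^{2} x^4\,dx = \frac{16}{5}. \]

Now we evaluate the right-hand side of (12.13):

\[ \left(\tfrac{1}{2}a_0\right)^2 + \tfrac{1}{2}\sum_{r=1}^{\infty} a_r^2 + \tfrac{1}{2}\sum_{r=1}^{\infty} b_r^2 = \left(\frac{4}{3}\right)^2 + \frac{1}{2}\sum_{r=1}^{\infty}\frac{16^2}{\pi^4 r^4}. \]

Equating the two expressions we find

\[ \sum_{r=1}^{\infty}\frac{1}{r^4} = \frac{\pi^4}{90}. \] ◄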
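
The arithmetic of this example is easy to confirm numerically. The sketch below assumes the coefficients of the $x^2$ series from section 12.5 are $a_0/2 = 4/3$ and $a_r = 16(-1)^r/(\pi^2 r^2)$ with all $b_r = 0$, which is what the right-hand side above uses; the function name and the number of terms are illustrative.

```python
import math

def partial_sum_inverse_fourth(n_terms):
    """Partial sum of 1/r**4 for r = 1 .. n_terms."""
    return sum(1.0 / r ** 4 for r in range(1, n_terms + 1))

mean_square = 16.0 / 5.0                 # (1/4) * integral of x**4 over (-2, 2)
constant_term = (4.0 / 3.0) ** 2          # (a_0 / 2)**2
s = partial_sum_inverse_fourth(100000)

# Right-hand side of Parseval's theorem with a_r = 16(-1)^r / (pi^2 r^2), b_r = 0.
rhs = constant_term + 0.5 * (16.0 / math.pi ** 2) ** 2 * s

print(mean_square, rhs)
print(s, math.pi ** 4 / 90.0)
```

Both pairs of printed numbers should agree closely, since the neglected tail of the sum falls off as the inverse cube of the number of terms.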


12.9 Exercises


12.1 Prove the orthogonality relations stated in section 12.1.

12.2 Derive the Fourier coefficients $b_r$ in a similar manner to the derivation of the $a_r$
in section 12.2.

12.3 Which of the following functions of x could be represented by a Fourier series
over the range indicated?

(a) $\tanh^{-1}(x)$, $-\infty < x < \infty$;

(b) $\tan x$, $-\infty < x < \infty$;

(c) $|\sin x|^{-1/2}$, $-\infty < x < \infty$;

(d) $\cos^{-1}(\sin 2x)$, $-\infty < x < \infty$;

(e) $x\sin(1/x)$, $-\pi^{-1} < x < \pi^{-1}$, cyclically repeated.

12.4 By moving the origin of t to the centre of an interval in which f(t) = +1, i.e.
by changing to a new independent variable $t' = t - \tfrac{1}{4}T$, express the square-wave
function in the example in section 12.2 as a cosine series. Calculate the Fourier
coefficients involved (a) directly and (b) by changing the variable in result (12.8).

12.5 Find the Fourier series of the function f(x) = x in the range $-\pi < x < \pi$. Hence
show that

\[ 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots = \frac{\pi}{4}. \]

12.6 For the function

\[ f(x) = 1 - x, \qquad 0 < x < 1, \]

find (a) the Fourier sine series and (b) the Fourier cosine series. Which would
be better for numerical evaluation? Relate your answer to the relevant periodic
continuations.
12.7 For the continued functions used in exercise 12.6 and the derived corresponding
series, consider (i) their derivatives and (ii) their integrals. Do they give meaningful
equations? You will probably find it helpful to sketch all the functions involved.

12.8 The function $y(x) = x\sin x$ for $0 < x < \pi$ is to be represented by a Fourier series
of period $2\pi$ that is either even or odd. By sketching the function and considering
its derivative, determine which series will have the more rapid convergence. Find
the full expression for the better of these two series, showing that the convergence
is $\sim n^{-3}$ and that alternate terms are missing.

12.9 Find the Fourier coefficients in the expansion of $f(x) = \exp x$ over the range
−1 < x < 1. What value will the expansion have when x = 2?

12.10 By integrating term by term the Fourier series found in the previous question
and using the Fourier series for f(x) = x found in section 12.6, show that
$\int \exp x\,dx = \exp x + c$. Why is it not possible to show that $d(\exp x)/dx = \exp x$
by differentiating the Fourier series of $f(x) = \exp x$ in a similar manner?

12.11 Consider the function $f(x) = \exp(-x^2)$ in the range 0 < x < 1. Show how it
should be continued to give as its Fourier series a series (the actual form is not
wanted) (a) with only cosine terms, (b) with only sine terms, (c) with period 1
and (d) with period 2.

Would there be any difference between the values of the last two series at (i)
x = 0, (ii) x = 1?

12.12 Find, without calculation, which terms will be present in the Fourier series for
the periodic functions f(t), of period T, that are given in the range −T/2 to T/2
by:

(a) f(t) = 2 for 0 < |t| < T/4, f = 1 for T/4 < |t| < T/2;

(b) $f(t) = \exp[-(t - T/4)^2]$;

(c) f(t) = −1 for −T/2 < t < −3T/8 and 3T/8 < t < T/2, f(t) = 1 for
−T/8 < t < T/8; the graph of f is completed by two straight lines in the
remaining ranges so as to form a continuous function.

12.13 Consider the representation as a Fourier series of the displacement of a string
lying in the interval 0 < x < L and fixed at its ends, when it is pulled aside by $y_0$
at the point x = L/4. Sketch the continuations for the region outside the interval
that will

(a) produce a series of period L,

(b) produce a series that is antisymmetric about x = 0, and

(c) produce a series that will contain only cosine terms.

(d) What are (i) the periods of the series in (b) and (c) and (ii) the value of the
‘$a_0$-term’ in (c)?

(e) Show that a typical term of the series obtained in (b) is

\[ \frac{32 y_0}{3 n^2\pi^2}\sin\frac{n\pi}{4}\sin\frac{n\pi x}{L}. \]

12.14 Show that the Fourier series for the function y(x) = |x| in the range $-\pi < x < \pi$
is

\[ y(x) = \frac{\pi}{2} - \frac{4}{\pi}\sum_{m=0}^{\infty}\frac{\cos(2m+1)x}{(2m+1)^2}. \]

By integrating this equation term by term from 0 to x, find the function g(x)
whose Fourier series is

\[ \frac{4}{\pi}\sum_{m=0}^{\infty}\frac{\sin(2m+1)x}{(2m+1)^3}. \]

Deduce the value of the sum S of the series

\[ 1 - \frac{1}{3^3} + \frac{1}{5^3} - \frac{1}{7^3} + \cdots. \]

12.15 Using the result of exercise 12.14, determine, as far as possible by inspection, the
form of the functions of which the following are the Fourier series:

(a) $\cos\theta + \tfrac{1}{9}\cos 3\theta + \tfrac{1}{25}\cos 5\theta + \cdots$;

(b) $\sin\theta + \tfrac{1}{27}\sin 3\theta + \tfrac{1}{125}\sin 5\theta + \cdots$;

(c) $\dfrac{L^2}{3} - \dfrac{4L^2}{\pi^2}\left[\cos\dfrac{\pi x}{L} - \dfrac{1}{4}\cos\dfrac{2\pi x}{L} + \dfrac{1}{9}\cos\dfrac{3\pi x}{L} - \cdots\right]$.

(You may find it helpful to first set x = 0 in the quoted result and so obtain
values for $S_o = \sum (2m+1)^{-2}$ and other sums derivable from it.)

12.16 By finding a cosine Fourier series of period 2 for the function f(t) that takes the
form $f(t) = \cosh(t - 1)$ in the range 0 < t < 1, prove that

\[ \sum_{n=1}^{\infty}\frac{1}{n^2\pi^2 + 1} = \frac{1}{e^2 - 1}. \]

Deduce values for the sums $\sum (n^2\pi^2 + 1)^{-1}$ over odd n and even n separately.

12.17 Find the (real) Fourier series of period 2 for $f(x) = \cosh x$ and $g(x) = x^2$ in the
range −1 < x < 1. By integrating the series for f(x) twice, prove that

\[ \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n^2\pi^2(n^2\pi^2 + 1)} = \frac{1}{2}\left(\frac{1}{\sinh 1} - \frac{5}{6}\right). \]

12.18 Express the function $f(x) = x^2$ as a Fourier sine series in the range 0 < x < 2
and show that it converges to zero at $x = \pm 2$.

12.19 Demonstrate explicitly for the square-wave function discussed in section 12.2 that
Parseval’s theorem (12.13) is valid. You will need to use the relationship

\[ \sum_{m=0}^{\infty}\frac{1}{(2m+1)^2} = \frac{\pi^2}{8}. \]

Show that a filter that transmits frequencies only up to $8\pi/T$ will still transmit
more than 90 per cent of the power in such a square-wave voltage signal.

12.20 Show that the Fourier series for $|\sin\theta|$ in the range $-\pi < \theta < \pi$ is given by

\[ |\sin\theta| = \frac{2}{\pi} - \frac{4}{\pi}\sum_{m=1}^{\infty}\frac{\cos 2m\theta}{4m^2 - 1}. \]

By setting $\theta = 0$ and $\theta = \pi/2$, deduce values for

\[ \sum_{m=1}^{\infty}\frac{1}{4m^2 - 1} \qquad \text{and} \qquad \sum_{m=1}^{\infty}\frac{1}{16m^2 - 1}. \]
12.21 Find the complex Fourier series for the periodic function of period $2\pi$ defined in
the range $-\pi < x < \pi$ by $y(x) = \cosh x$. By setting x = 0 prove that

\[ \sum_{n=1}^{\infty}\frac{(-1)^n}{n^2 + 1} = \frac{1}{2}\left(\frac{\pi}{\sinh\pi} - 1\right). \]

12.22 The repeating output from an electronic oscillator takes the form of a sine wave
$f(t) = \sin t$ for $0 < t < \pi/2$; it then drops instantaneously to zero and starts
again. The output is to be represented by a complex Fourier series of the form

\[ \sum_{n=-\infty}^{\infty} c_n e^{4nti}. \]

Sketch the function and find an expression for $c_n$. Verify that $c_{-n} = c_n^*$.
Demonstrate that setting t = 0 and $t = \pi/2$ produces differing values for the sum

\[ \sum_{n=1}^{\infty}\frac{1}{16n^2 - 1}. \]

Determine the correct value and check it using the quoted result of exercise 12.5.

12.23 Apply Parseval’s theorem to the series found in the previous exercise and so
derive a value for the sum of the series

\[ \frac{17}{(15)^2} + \frac{65}{(63)^2} + \frac{145}{(143)^2} + \cdots + \frac{16n^2 + 1}{(16n^2 - 1)^2} + \cdots. \]

12.24 A string, anchored at $x = \pm L/2$, has a fundamental vibration frequency of 2L/c,
where c is the speed of transverse waves on the string. It is pulled aside at its
centre point by a distance $y_0$ and released at time t = 0. Its subsequent motion
can be described by the series

\[ y(x, t) = \sum_{n=1}^{\infty} a_n \cos\frac{n\pi x}{L}\cos\frac{n\pi c t}{L}. \]

Find a general expression for $a_n$ and show that only odd harmonics of the
fundamental frequency are present in the sound generated by the released string.
By applying Parseval’s theorem, find the sum S of the series $\sum_{m=0}^{\infty}(2m+1)^{-4}$.

12.25 Show that Parseval’s theorem for two functions whose Fourier expansions have
cosine and sine coefficients $a_n$, $b_n$ and $\alpha_n$, $\beta_n$ takes the form

\[ \frac{1}{L}\int_0^L f(x)g^*(x)\,dx = \frac{1}{4}a_0\alpha_0 + \frac{1}{2}\sum_{n=1}^{\infty}(a_n\alpha_n + b_n\beta_n). \]

(a) Demonstrate that for $g(x) = \sin mx$ or $\cos mx$ this reduces to the definition
of the Fourier coefficients.

(b) Explicitly verify the above result for the case in which f(x) = x and g(x) is
the square-wave function, both in the interval −1 < x < 1.

12.26 An odd function f(x) of period $2\pi$ is to be approximated by a Fourier sine series
having only m terms. The error in this approximation is measured by the square
deviation

\[ E_m = \int_{-\pi}^{\pi}\left[f(x) - \sum_{n=1}^{m} b_n \sin nx\right]^2 dx. \]

By differentiating $E_m$ with respect to the coefficients $b_n$, find the values of $b_n$ that
minimise $E_m$.

Sketch the graph of the function f(x), where

\[ f(x) = \begin{cases} -x(\pi + x) & \text{for } -\pi < x < 0, \\ x(x - \pi) & \text{for } 0 < x < \pi. \end{cases} \]

f(x) is to be approximated by the first three terms of a Fourier sine series. What
coefficients minimise $E_3$? What is the resulting value of $E_3$?


12.10 Hints and answers 

12.3 Only (c). In terms of the Dirichlet conditions (section 12.1), the others fail as 
follows: (a) (i); (b) (ii); (d) (ii); (e) (iii). 

12.4 $a_n = [4/(n\pi)](-1)^{(n-1)/2}$ for n odd and $a_n = 0$ for n even. In (b) use the expansion
of sin(A + B).

12.5 $f(x) = 2\sum_{1}^{\infty}(-1)^{n+1} n^{-1}\sin nx$; set $x = \pi/2$.

12.6 (a) 5H[(2/(;!7i)] sin nnx, all n; (b) 5^[(4/(jr7i: 2 )] cos nnx for odd n only. The cosine 
series, with ir 2 convergence and alternate terms missing; the sine continuation 
contains a discontinuity. 

12.7 (i) Series (a) from exercise 12.6 does not converge and cannot represent the 
function y(x) = ■— 1. Series (b) reproduces the square-wave function of equation 
(12.8). 

(ii) Series (a) gives the series for y(x) = —x — \x 2 — ^ in the range — T < x < 0 
and for y(x) = x — \x 2 — 3 in the range 0 < x < 1. Series (b) gives the series for 
y(x) = x + f x 2 + 3 in the range — 1 < x < 0 and for y(x) = x — \x 2 + \ in the 
range 0 < x < 1. 

12.8 The even continuation has a discontinuity in its derivative at $x = \pi$, whilst the
odd continuation does not; thus the sine series will have better convergence.
$b_1 = \pi/2$; $b_{2m+1} = 0$ for m > 0; $b_{2m} = -16m/[\pi(4m^2 - 1)^2]$.

12.9 $f(x) = (\sinh 1)\{1 + 2\sum_{1}^{\infty}(-1)^n(1 + n^2\pi^2)^{-1}[\cos(n\pi x) - n\pi\sin(n\pi x)]\}$;

f(2) = f(0) = 1.

12.10 Combine the coefficients of the $\sin(n\pi x)$ terms from the Fourier series for x and
(part of) $\int \exp x\,dx$; the partial series obtained by differentiating the $\sin(n\pi x)$
terms does not converge, having coefficients of the form $(n\pi)^2/[1 + (n\pi)^2]$.

12.11 See figure 12.6. (c) (i) $(1 + e^{-1})/2$, (ii) $(1 + e^{-1})/2$; (d) (i) $(1 + e^{-4})/2$, (ii) $e^{-1}$.



Figure 12.6 Continuations of $\exp(-x^2)$ in 0 < x < 1 to give: (a) cosine terms
only; (b) sine terms only; (c) period 1; (d) period 2.


12.12 (a) $a_0$ and odd cosines; (b) all, there is no symmetry about T/4 for the periodic
function; (c) odd cosines.

12.13 (d) (i) The periods are both 2L; (ii) $y_0/2$.

12.14 $g(x) = \tfrac{1}{2}x(\pi - x)$ for x > 0 and $= \tfrac{1}{2}x(\pi + x)$ for x < 0. Set $x = \pi/2$; $S = \pi^3/32$.

12.15 $S_o = \pi^2/8$. If $S_e = \sum (2m)^{-2}$ then $S_e = \tfrac{1}{4}(S_e + S_o)$, yielding $S_o - S_e = \pi^2/12$ and
$S_e + S_o = \pi^2/6$.

(a) $(\pi/4)(\pi/2 - |\theta|)$; (b) $(\pi\theta/4)(\pi/2 - |\theta|/2)$ from integrating (a). (c) Even function;
average value $L^2/3$; y(0) = 0; $y(L) = L^2$; probably $y(x) = x^2$. Compare with the
worked example in section 12.5.

12.16 $\cosh(t - 1) = (\sinh 1)[1 + 2\sum_{n=1}^{\infty}(\cos n\pi t)/(n^2\pi^2 + 1)]$; set t = 0 to obtain the stated
result; set t = 1 to evaluate $\sum (-1)^n/(n^2\pi^2 + 1)$ and add and subtract this quantity
from $\sum (n^2\pi^2 + 1)^{-1}$. $\sum_{\rm odd} = (e - 1)/[4(e + 1)]$; $\sum_{\rm even} = (3 - e)/[4(e - 1)]$.

12.17 $\cosh x = (\sinh 1)[1 + 2\sum_{n=1}^{\infty}(-1)^n(\cos n\pi x)/(n^2\pi^2 + 1)]$ and after integrating twice
this form must be recovered. Use $x^2 = \tfrac{1}{3} + 4\sum (-1)^n(\cos n\pi x)/(n^2\pi^2)$ to eliminate
the quadratic term arising from the constants of integration; there is no linear
term.

12.18 Consider $f(x) = -x^2$ for −2 < x < 0, to ensure a sine series;

$\sum_n b_n\sin(n\pi x/2)$, with $b_n = (-1)^{n+1}8/(n\pi)$ for n even and $(-1)^{n+1}8/(n\pi) - 32/(n\pi)^3$
for n odd.

12.19 $c_{\pm(2m+1)} = \mp 2i/[(2m+1)\pi]$; $\sum |c_n|^2 = (4/\pi^2)\times 2\times(\pi^2/8)$; the values $n = \pm 1$,
$\pm 3$ contribute > 90% of the total.

12.20 Write $\sin\theta\cos n\theta$ as $\tfrac{1}{2}[\sin(n+1)\theta - \sin(n-1)\theta]$; obtain $\sum_{1}^{\infty}(4m^2 - 1)^{-1} = \tfrac{1}{2}$ as
well as $\sum_{1}^{\infty}(-1)^m(4m^2 - 1)^{-1} = \tfrac{1}{2} - \tfrac{\pi}{4}$ and add the two equations; $\tfrac{1}{2}$, $\tfrac{1}{2} - \tfrac{\pi}{8}$.

12.21 c n = (— l)"[sinh n + in( cosh n — 1 )] / [tt( 1 + n 2 )]. 

12.22 $c_n = (2/\pi)[(4ni - 1)/(16n^2 - 1)]$. The correct value is the mean of the two
incorrect ones, i.e. $(4 - \pi)/8$. Write $(16n^2 - 1)^{-1}$ in partial fractions and compare
with exercise 12.5.

12.23 $(\pi^2 - 8)/16$.

12.24 $a_n = 8y_0/(n^2\pi^2)$ for odd n, $a_n = 0$ otherwise; $S = \pi^4/96$.

12.25 (b) All $a_n$ and $\alpha_n$ are zero; $b_n = 2(-1)^{n+1}/(n\pi)$ and $\beta_n = 4/(n\pi)$. You will need
the result quoted in exercise 12.19.

12.26 Show that the minimising value $b_k$ is given by $0 = \int_{-\pi}^{\pi} f(x)\sin kx\,dx - \sum_{n=1}^{m} b_n\pi\delta_{kn}$
and hence that $b_k$ is equal to the normal Fourier coefficient; $b_1 = -8/\pi$, $b_2 = 0$,
$b_3 = -8/(27\pi)$; $E_3 = (64/\pi)\sum_{m=2}^{\infty}(2m+1)^{-6}$.
Integral transforms


In the previous chapter we encountered the Fourier series representation of a 
periodic function in a fixed interval as a superposition of sinusoidal functions. It is 
often desirable, however, to obtain such a representation even for functions defined 
over an infinite interval and with no particular periodicity. Such a representation 
is called a Fourier transform and is one of a class of representations called integral 
transforms. 

We begin by considering Fourier transforms as a generalisation of Fourier 
series. We then go on to discuss the properties of the Fourier transform and its 
applications. In the second part of the chapter we present an analogous discussion 
of the closely related Laplace transform. 


13.1 Fourier transforms 

The Fourier transform provides a representation of functions defined over an 
infinite interval and having no particular periodicity, in terms of a superposition 
of sinusoidal functions. It may thus be considered as a generalisation of the 
Fourier series representation of periodic functions. Since Fourier transforms are 
often used to represent time-varying functions, we shall present much of our
discussion in terms of f(t), rather than f(x), although in some spatial examples
f(x) will be the more natural notation and we shall use it as appropriate. Our
only requirement on f(t) will be that $\int_{-\infty}^{\infty} |f(t)|\,dt$ is finite.

In order to develop the transition from Fourier series to Fourier transforms, we
first recall that a function of period T may be represented as a complex Fourier
series, cf. (12.9),

\[ f(t) = \sum_{r=-\infty}^{\infty} c_r e^{2\pi i r t/T} = \sum_{r=-\infty}^{\infty} c_r e^{i\omega_r t}, \qquad (13.1) \]

where $\omega_r = 2\pi r/T$. As the period T tends to infinity, the ‘frequency quantum’
$\Delta\omega = 2\pi/T$ becomes vanishingly small and the spectrum of allowed frequencies
$\omega_r$ becomes a continuum. Thus, the infinite sum of terms in the Fourier series
becomes an integral, and the coefficients $c_r$ become functions of the continuous
variable $\omega$, as follows.

Figure 13.1 The relationship between the Fourier terms for a function of
period T and the Fourier integral (the area below the solid line) of the
function.

We recall, cf. (12.10), that the coefficients $c_r$ in (13.1) are given by

\[ c_r = \frac{1}{T}\int_{-T/2}^{T/2} f(t)\,e^{-2\pi i r t/T}\,dt = \frac{\Delta\omega}{2\pi}\int_{-T/2}^{T/2} f(t)\,e^{-i\omega_r t}\,dt, \qquad (13.2) \]

where we have written the integral in two alternative forms and, for convenience,
made one period run from −T/2 to +T/2 rather than from 0 to T. Substituting
from (13.2) into (13.1) gives

\[ f(t) = \sum_{r=-\infty}^{\infty}\frac{\Delta\omega}{2\pi}\int_{-T/2}^{T/2} f(u)\,e^{-i\omega_r u}\,du\;e^{i\omega_r t}. \qquad (13.3) \]

At this stage $\omega_r$ is still a discrete function of r equal to $2\pi r/T$.

The solid points in figure 13.1 are a plot of (say, the real part of) $c_r e^{i\omega_r t}$ as
a function of r (or equivalently of $\omega_r$) and it is clear that $(2\pi/T)c_r e^{i\omega_r t}$ gives
the area of the rth broken-line rectangle. If T tends to $\infty$ then $\Delta\omega$ (= $2\pi/T$)
becomes infinitesimal, the width of the rectangles tends to zero and, from the
mathematical definition of an integral,

\[ \sum_{r=-\infty}^{\infty}\frac{\Delta\omega}{2\pi}\,g(\omega_r)\,e^{i\omega_r t} \to \frac{1}{2\pi}\int_{-\infty}^{\infty} g(\omega)\,e^{i\omega t}\,d\omega. \]

In this particular case

\[ g(\omega_r) = \int_{-T/2}^{T/2} f(u)\,e^{-i\omega_r u}\,du, \]

and (13.3) becomes

\[ f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} d\omega\,e^{i\omega t}\int_{-\infty}^{\infty} du\,f(u)\,e^{-i\omega u}. \qquad (13.4) \]

This result is known as Fourier’s inversion theorem.

From it we may define the Fourier transform of f(t) by

\[ \tilde f(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(t)\,e^{-i\omega t}\,dt, \qquad (13.5) \]

and its inverse by

\[ f(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \tilde f(\omega)\,e^{i\omega t}\,d\omega. \qquad (13.6) \]

Including the constant $1/\sqrt{2\pi}$ in the definition of $\tilde f(\omega)$ (whose mathematical
existence as $T \to \infty$ is assumed here without proof) is clearly arbitrary, the only
requirement being that the product of the constants in (13.5) and (13.6) should
equal $1/(2\pi)$. Our definition is chosen to be as symmetric as possible.


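►Find the Fourier transform of the exponential decay function f(t) = 0 for t < 0 and
$f(t) = A e^{-\lambda t}$ for $t \geq 0$ ($\lambda > 0$).


Using the definition (13.5) and separating the integral into two parts,

\[ \tilde f(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{0} (0)\,e^{-i\omega t}\,dt + \frac{A}{\sqrt{2\pi}}\int_{0}^{\infty} e^{-\lambda t}e^{-i\omega t}\,dt \]
\[ = 0 + \frac{A}{\sqrt{2\pi}}\left[-\frac{e^{-(\lambda + i\omega)t}}{\lambda + i\omega}\right]_{0}^{\infty} = \frac{A}{\sqrt{2\pi}\,(\lambda + i\omega)}, \]

which is the required transform. It is clear that the multiplicative constant A does not
affect the form of the transform, merely its amplitude. This transform may be verified by
re-substitution of the above result into (13.6) to recover f(t), but evaluation of the integral
requires the use of complex-variable contour integration (chapter 20). ◄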
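
A direct numerical evaluation of definition (13.5) offers a quick plausibility check on the result $\tilde f(\omega) = A/[\sqrt{2\pi}(\lambda + i\omega)]$. The sketch below assumes that truncating the integral at a finite upper limit and using the midpoint rule is adequate; the values of `A` and `lam`, the truncation point and the function names are illustrative assumptions.

```python
import cmath
import math

def fourier_transform(f, omega, t_min, t_max, n=100000):
    """Midpoint-rule estimate of (1/sqrt(2 pi)) * integral of f(t) exp(-i omega t) dt."""
    h = (t_max - t_min) / n
    total = 0j
    for k in range(n):
        t = t_min + (k + 0.5) * h
        total += f(t) * cmath.exp(-1j * omega * t)
    return total * h / math.sqrt(2.0 * math.pi)

A, lam = 2.0, 1.5   # illustrative amplitude and decay constant (lam > 0)

def decay(t):
    """Exponential decay function: zero for t < 0, A exp(-lam t) for t >= 0."""
    return A * math.exp(-lam * t) if t >= 0 else 0.0

def decay_transform_exact(omega):
    """Closed-form transform A / (sqrt(2 pi) (lam + i omega))."""
    return A / (math.sqrt(2.0 * math.pi) * (lam + 1j * omega))

# The integrand vanishes for t < 0 and is negligible beyond t = 40 for these values.
for omega in (0.0, 1.0, -3.0):
    print(omega, fourier_transform(decay, omega, 0.0, 40.0), decay_transform_exact(omega))
```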


13.1.1 The uncertainty principle 

An important function that appears in many areas of physical science, either 
precisely or as an approximation to a physical situation, is the Gaussian or 
normal distribution. Its Fourier transform is of importance both in itself and also 
because, when interpreted statistically, it readily illustrates a form of uncertainty 
principle. 
►Find the Fourier transform of the normalised Gaussian distribution

\[ f(t) = \frac{1}{\tau\sqrt{2\pi}}\exp\left(-\frac{t^2}{2\tau^2}\right), \qquad -\infty < t < \infty. \]

This Gaussian distribution is centred on t = 0 and has a root mean square deviation
$\Delta t = \tau$. (Any reader who is unfamiliar with this interpretation of the distribution should
refer to chapter 26.)

Using the definition (13.5), the Fourier transform of f(t) is given by

\[ \tilde f(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\frac{1}{\tau\sqrt{2\pi}}\exp\left(-\frac{t^2}{2\tau^2}\right)\exp(-i\omega t)\,dt \]
\[ = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\frac{1}{\tau\sqrt{2\pi}}\exp\left\{-\frac{1}{2\tau^2}\left[t^2 + 2\tau^2 i\omega t + (\tau^2 i\omega)^2 - (\tau^2 i\omega)^2\right]\right\} dt, \]

where the quantity $-(\tau^2 i\omega)^2/(2\tau^2)$ has been both added and subtracted in the exponent
in order to allow the factors involving the variable of integration t to be expressed as a
complete square. Hence the expression can be written

\[ \tilde f(\omega) = \frac{\exp(-\tfrac{1}{2}\tau^2\omega^2)}{\sqrt{2\pi}}\left\{\frac{1}{\tau\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp\left[-\frac{(t + i\tau^2\omega)^2}{2\tau^2}\right] dt\right\}. \]

The quantity inside the braces is the normalisation integral for the Gaussian and equals
unity, although to show this strictly needs results from complex variable theory (chapter 20).
That it is equal to unity can be made plausible by changing the variable to $s = t + i\tau^2\omega$
and assuming that the imaginary parts introduced into the integration path and limits
(where the integrand goes rapidly to zero anyway) make no difference.

We are left with the result that

\[ \tilde f(\omega) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{\tau^2\omega^2}{2}\right), \qquad (13.7) \]

which is another Gaussian distribution, centred on zero and with a root mean square
deviation $\Delta\omega = 1/\tau$. It is interesting to note, and an important property, that the Fourier
transform of a Gaussian is another Gaussian. ◄
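
The statement that the transform of the normalised Gaussian of width $\tau$ is $\exp(-\tau^2\omega^2/2)/\sqrt{2\pi}$ can be checked by evaluating (13.5) numerically along the real axis, which sidesteps the contour argument. This is only a sketch: the truncation at $|t| = 12\tau$, the value of `tau` used in the loop and the function names are assumptions made for illustration.

```python
import cmath
import math

def gaussian(t, tau):
    """Normalised Gaussian with root mean square deviation tau."""
    return math.exp(-t * t / (2.0 * tau * tau)) / (tau * math.sqrt(2.0 * math.pi))

def gaussian_transform_numeric(omega, tau, n=20000):
    """Midpoint-rule estimate of definition (13.5), truncated at |t| = 12 tau."""
    t_max = 12.0 * tau
    h = 2.0 * t_max / n
    total = 0j
    for k in range(n):
        t = -t_max + (k + 0.5) * h
        total += gaussian(t, tau) * cmath.exp(-1j * omega * t)
    return total * h / math.sqrt(2.0 * math.pi)

def gaussian_transform_exact(omega, tau):
    """Result (13.7): a Gaussian in omega with root mean square deviation 1/tau."""
    return math.exp(-0.5 * tau * tau * omega * omega) / math.sqrt(2.0 * math.pi)

# Compare numerical and closed-form transforms for one illustrative width.
for omega in (0.0, 1.0, 2.5):
    print(omega, gaussian_transform_numeric(omega, 0.7), gaussian_transform_exact(omega, 0.7))
```

Making `tau` smaller narrows the pulse in t and visibly broadens the transform in $\omega$.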


In the above example the root mean square deviation in t was $\tau$, and so it is
seen that the deviations or ‘spreads’ in t and in $\omega$ are inversely related:

\[ \Delta\omega\,\Delta t = 1, \]

independently of the value of $\tau$. In physical terms, the narrower in time is, say, an
electrical impulse the greater the spread of frequency components it must contain.
Similar physical statements are valid for other pairs of Fourier-related variables,
such as spatial position and wave number. In an obvious notation, $\Delta k\,\Delta x = 1$ for
a Gaussian wave packet.

The uncertainty relations as usually expressed in quantum mechanics can be
related to this if the de Broglie and Einstein relationships for momentum and
energy are introduced; they are

\[ p = \hbar k \quad \text{and} \quad E = \hbar\omega. \]

Here $\hbar$ is Planck’s constant h divided by $2\pi$. In a quantum mechanics setting f(t)
is a wavefunction and the distribution of the wave intensity in time is given by
$|f|^2$ (also a Gaussian). Similarly, the intensity distribution in frequency is given
by $|\tilde f|^2$. These two distributions have respective root mean square deviations of
$\tau/\sqrt{2}$ and $1/(\sqrt{2}\,\tau)$, giving, after incorporation of the above relations,

\[ \Delta E\,\Delta t = \hbar/2 \quad \text{and} \quad \Delta p\,\Delta x = \hbar/2. \]

The factors of 1/2 that appear are specific to the Gaussian form, but any
distribution f(t) produces for the product $\Delta E\,\Delta t$ a quantity $\lambda\hbar$ in which $\lambda$ is
strictly positive (in fact the value 1/2 for a Gaussian is the minimum possible).


13.1.2 Fraunhofer diffraction 

We take our final example of the Fourier transform from the field of optics. The 
pattern of transmitted light produced by a partially opaque (or phase-changing) 
object upon which a coherent beam of radiation falls is called a diffraction pattern 
and, in particular, when the cross-section of the object is small compared with 
the distance at which the light is observed the pattern is known as a Fraunhofer 
diffraction pattern. 

We will consider only the case in which the light is monochromatic with
wavelength $\lambda$. The direction of the incident beam of light can then be described
by the wave vector k; the magnitude of this vector is given by the wave number
$k = 2\pi/\lambda$ of the light. The essential quantity in a Fraunhofer diffraction pattern
is the dependence of the observed amplitude (and hence intensity) on the angle $\theta$
between the viewing direction k' and the direction k of the incident beam. This
is entirely determined by the spatial distribution of the amplitude and phase of
the light at the object, the transmitted intensity in a particular direction k' being
determined by the corresponding Fourier component of this spatial distribution.

As an example, we take as an object a simple two-dimensional screen of width
2Y on which light of wave number k is incident normally; see figure 13.2. We
suppose that at the position (0, y) the amplitude of the transmitted light is f(y)
per unit length in the y-direction (f(y) may be complex). The function f(y) is
called an aperture function. Both the screen and beam are assumed infinite in the
z-direction.

Denoting the unit vectors in the x- and y-directions by i and j respectively,
the total light amplitude at a position $\mathbf{r}_0 = x_0\mathbf{i} + y_0\mathbf{j}$, with $x_0 > 0$, will be the
superposition of all the (Huyghens’) wavelets originating from the various parts
of the screen. For large $r_0$ (= $|\mathbf{r}_0|$), these can be treated as plane waves to give†

\[ A(\mathbf{r}_0) = \int_{-Y}^{Y}\frac{f(y)\exp[i\mathbf{k}'\cdot(\mathbf{r}_0 - y\mathbf{j})]}{|\mathbf{r}_0 - y\mathbf{j}|}\,dy. \qquad (13.8) \]

† This is the approach first used by Fresnel. For simplicity we have omitted from the integral a
multiplicative inclination factor that depends on angle $\theta$ and decreases as $\theta$ increases.

Figure 13.2 Diffraction grating of width 2Y with light of wavelength $2\pi/k$
being diffracted through an angle $\theta$.


The factor $\exp[i\mathbf{k}'\cdot(\mathbf{r}_0 - y\mathbf{j})]$ represents the phase change undergone by the light
in travelling from the point $y\mathbf{j}$ on the screen to the point $\mathbf{r}_0$, and the denominator
represents the reduction in amplitude with distance. (Recall that the system is
infinite in the z-direction and so the ‘spreading’ is effectively in two dimensions
only.)

If the medium is the same on both sides of the screen then $\mathbf{k}' = k\cos\theta\,\mathbf{i} + k\sin\theta\,\mathbf{j}$,
and if $r_0 \gg Y$ then expression (13.8) can be approximated by

\[ A(\mathbf{r}_0) = \frac{\exp(i\mathbf{k}'\cdot\mathbf{r}_0)}{r_0}\int_{-\infty}^{\infty} f(y)\exp(-iky\sin\theta)\,dy. \qquad (13.9) \]

We have used that f(y) = 0 for |y| > Y, to extend the integral to infinite limits.
The intensity in the direction $\theta$ is then given by

\[ I(\theta) = |A|^2 = \frac{2\pi}{r_0^2}\,|\tilde f(q)|^2, \qquad (13.10) \]

where $q = k\sin\theta$.


►Evaluate $I(\theta)$ for an aperture consisting of two long slits each of width 2b whose centres
are separated by a distance 2a, a > b; the slits are illuminated by light of wavelength $\lambda$.


The aperture function is plotted in figure 13.3. We first need to find $\tilde f(q)$:

\[ \tilde f(q) = \frac{1}{\sqrt{2\pi}}\int_{-a-b}^{-a+b} e^{-iqx}\,dx + \frac{1}{\sqrt{2\pi}}\int_{a-b}^{a+b} e^{-iqx}\,dx \]
\[ = \frac{1}{\sqrt{2\pi}}\left[-\frac{e^{-iqx}}{iq}\right]_{-a-b}^{-a+b} + \frac{1}{\sqrt{2\pi}}\left[-\frac{e^{-iqx}}{iq}\right]_{a-b}^{a+b} \]
\[ = \frac{-1}{iq\sqrt{2\pi}}\left[e^{-iq(-a+b)} - e^{-iq(-a-b)} + e^{-iq(a+b)} - e^{-iq(a-b)}\right]. \]

Figure 13.3 The aperture function f(y) for two wide slits.
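
The four exponentials in the last line of the example can be grouped in pairs; by our own rearrangement (not taken from the text as reproduced here) they combine to $\tilde f(q) = 4\cos(qa)\sin(qb)/(q\sqrt{2\pi})$, and (13.10) would then give $I(\theta) = 16\cos^2(qa)\sin^2(qb)/(q^2 r_0^2)$. The sketch below checks the closed form for $\tilde f(q)$ against direct numerical integration, assuming an aperture function of unit amplitude inside each slit and zero elsewhere; the function names and the slit dimensions are illustrative.

```python
import cmath
import math

def slit_integral(q, lo, hi, n=20000):
    """Midpoint-rule estimate of the integral of exp(-i q y) over lo < y < hi."""
    h = (hi - lo) / n
    return sum(cmath.exp(-1j * q * (lo + (k + 0.5) * h)) for k in range(n)) * h

def two_slit_transform_numeric(q, a, b):
    """Transform of two unit-amplitude slits of width 2b centred on y = -a and y = +a."""
    left = slit_integral(q, -a - b, -a + b)
    right = slit_integral(q, a - b, a + b)
    return (left + right) / math.sqrt(2.0 * math.pi)

def two_slit_transform_closed(q, a, b):
    """Candidate closed form 4 cos(qa) sin(qb) / (q sqrt(2 pi)), valid for q != 0."""
    return 4.0 * math.cos(q * a) * math.sin(q * b) / (q * math.sqrt(2.0 * math.pi))

# Illustrative geometry: slit centres at +/-3, half-width 1.
for q in (0.3, 1.0, 2.2):
    print(q, two_slit_transform_numeric(q, 3.0, 1.0), two_slit_transform_closed(q, 3.0, 1.0))
```

The factor $\cos(qa)$ carries the two-slit interference fringes, while $\sin(qb)/q$ is the single-slit envelope.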



13.1.3 The Dirac δ-function

Before going on to consider further properties of Fourier transforms we make a
digression to discuss the Dirac δ-function and its relation to Fourier transforms.
The δ-function is different from most functions encountered in the physical
sciences but we will see that a rigorous mathematical definition exists and the
utility of the δ-function will be demonstrated throughout the remainder of this
chapter. It can be visualised as a very sharp narrow pulse (in space, time, density,
etc.) which produces an integrated effect having a definite magnitude. The formal
properties of the δ-function may be summarised as follows.

The Dirac δ-function has the property that

\[ \delta(t) = 0 \quad \text{for } t \neq 0, \qquad (13.11) \]

but its fundamental defining property is

\[ \int f(t)\,\delta(t - a)\,dt = f(a), \qquad (13.12) \]

provided the range of integration includes the point t = a; otherwise the integral
equals zero. This leads immediately to two further useful results:

\[ \int_{-a}^{b}\delta(t)\,dt = 1 \quad \text{for all } a, b > 0 \qquad (13.13) \]

and

\[ \int \delta(t - a)\,dt = 1, \qquad (13.14) \]

provided the range of integration includes t = a.

Equation (13.12) can be used to derive further useful properties of the Dirac
δ-function:

\[ \delta(t) = \delta(-t), \qquad (13.15) \]
\[ \delta(at) = \frac{1}{|a|}\,\delta(t), \qquad (13.16) \]
\[ t\,\delta(t) = 0. \qquad (13.17) \]


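►Prove that $\delta(bt) = \delta(t)/|b|$.

Let us first consider the case where b > 0. It follows that

\[ \int_{-\infty}^{\infty} f(t)\,\delta(bt)\,dt = \int_{-\infty}^{\infty} f\left(\frac{t'}{b}\right)\delta(t')\,\frac{dt'}{b} = \frac{1}{b}f(0) = \frac{1}{b}\int_{-\infty}^{\infty} f(t)\,\delta(t)\,dt, \]

where we have made the substitution t' = bt. But f(t) is arbitrary and so we immediately
see that $\delta(bt) = \delta(t)/b = \delta(t)/|b|$ for b > 0.

Now consider the case where b = −c < 0. It follows that

\[ \int_{-\infty}^{\infty} f(t)\,\delta(bt)\,dt = \int_{\infty}^{-\infty} f\left(\frac{t'}{-c}\right)\delta(t')\left(\frac{dt'}{-c}\right) = \int_{-\infty}^{\infty}\frac{1}{c}\,f\left(\frac{t'}{-c}\right)\delta(t')\,dt' \]
\[ = \frac{1}{c}f(0) = \frac{1}{|b|}f(0) = \frac{1}{|b|}\int_{-\infty}^{\infty} f(t)\,\delta(t)\,dt, \]

where we have made the substitution t' = bt = −ct. But f(t) is arbitrary and so

\[ \delta(bt) = \frac{1}{|b|}\,\delta(t) \]

for all b, which establishes the result. ◄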

Furthermore, by considering an integral of the form

\[ \int f(t)\,\delta(h(t))\,dt, \]

and making a change of variables to z = h(t), we may show that

\[ \delta(h(t)) = \sum_i \frac{\delta(t - t_i)}{|h'(t_i)|}, \qquad (13.18) \]

where the $t_i$ are those values of t for which h(t) = 0 and h'(t) stands for dh/dt.

The derivative of the delta function, $\delta'(t)$, is defined by

\[ \int_{-\infty}^{\infty} f(t)\,\delta'(t)\,dt = \Big[f(t)\,\delta(t)\Big]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} f'(t)\,\delta(t)\,dt = -f'(0), \qquad (13.19) \]

and similarly for higher derivatives.

For many practical purposes, effects that are not strictly described by a <5- 
function may be analysed as such, if they take place in an interval much shorter 
than the response interval of the system on which they act. For example, the 
idealised notion of an impulse of magnitude J applied at time fo can be represented 
by 


j(t) = J5{t — to). (13.20) 

Many physical situations are described by a fi-function in space rather than in 
time. Moreover, we often require the ^-function to be defined in more than one 
dimension. For example, the charge density of a point charge q at a point ro may 
be expressed as a three-dimensional 5 -function 


p(r) = qS( r - r 0 ) = qS(x - x 0 )<5(y - yo)d{z - z 0 ), (13.21) 

so that a discrete ‘quantum’ is expressed as if it were a continuous distribution. 
From (13.21) we see that (as expected) the total charge enclosed in a volume V 
is given by 


p(r)dV = / qS(r — r 0 )dV 


q if ro lies in V, 
0 otherwise. 


Closely related to the Dirac $\delta$-function is the Heaviside or unit step function
$H(t)$, for which

$$H(t) = \begin{cases} 1 & \text{for } t > 0,\\ 0 & \text{for } t < 0. \end{cases} \tag{13.22}$$

This function is clearly discontinuous at $t = 0$ and it is usual to take $H(0) = 1/2$.
The Heaviside function is related to the delta function by

$$H'(t) = \delta(t). \tag{13.23}$$

►Prove relation (13.23).

Considering the integral

$$\int_{-\infty}^{\infty} f(t)\,H'(t)\,dt = \Big[f(t)\,H(t)\Big]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} f'(t)\,H(t)\,dt$$

$$= f(\infty) - \int_0^{\infty} f'(t)\,dt = f(\infty) - \Big[f(t)\Big]_0^{\infty} = f(0),$$

and comparing it with (13.12) when $a = 0$ immediately shows that $H'(t) = \delta(t)$. ◄


13.1.4 Relation of the $\delta$-function to Fourier transforms


In the previous section we introduced the Dirac $\delta$-function as a way of
representing very sharp narrow pulses, but in no way related it to Fourier transforms.
We now show that the $\delta$-function can equally well be defined in a way that more
naturally relates it to the Fourier transform.

Referring back to the Fourier inversion theorem (13.4), we have

$$f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} d\omega\, e^{i\omega t} \int_{-\infty}^{\infty} du\, f(u)\, e^{-i\omega u} = \int_{-\infty}^{\infty} du\, f(u) \left\{ \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{i\omega(t-u)}\, d\omega \right\}.$$

Comparison of this with (13.12) shows that we may write the $\delta$-function as

$$\delta(t - u) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{i\omega(t-u)}\, d\omega. \tag{13.24}$$

Considered as a Fourier transform, this representation shows that a very
narrow time peak at $t = u$ results from the superposition of a complete spectrum
of harmonic waves, all frequencies having the same amplitude and all waves being
in phase at $t = u$. This suggests that the $\delta$-function may also be represented as
the limit of the transform of a uniform distribution of unit height as the width
of this distribution becomes infinite.

Consider the rectangular distribution of frequencies shown in figure 13.4(a).
From (13.6), taking the inverse Fourier transform,

$$f_\Omega(t) = \frac{1}{\sqrt{2\pi}} \int_{-\Omega}^{\Omega} 1 \times e^{i\omega t}\, d\omega = \frac{2\Omega}{\sqrt{2\pi}}\, \frac{\sin \Omega t}{\Omega t}. \tag{13.25}$$

This function is illustrated in figure 13.4(b) and it is apparent that, for large $\Omega$, it
becomes very large at $t = 0$ and also very narrow about $t = 0$, as we qualitatively
expect and require. We also note that, in the limit $\Omega \to \infty$, $f_\Omega(t)$, as defined by
the inverse Fourier transform, tends to $(2\pi)^{1/2}\,\delta(t)$ by virtue of (13.24). Hence we
may conclude that the $\delta$-function can also be represented by

$$\delta(t) = \lim_{\Omega \to \infty} \left( \frac{\sin \Omega t}{\pi t} \right). \tag{13.26}$$

Figure 13.4 (a) A Fourier transform showing a rectangular distribution of
frequencies between $\pm\Omega$; (b) the function of which it is the transform, which
is proportional to $t^{-1} \sin \Omega t$.


Several other function representations are equally valid, e.g. the limiting cases of
rectangular, triangular or Gaussian distributions; the only essential requirements
are a knowledge of the area under such a curve and that undefined operations
such as dividing by zero are not inadvertently carried out on the $\delta$-function whilst
some non-explicit representation is being employed.
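
The limiting behaviour of the representation $\sin\Omega t/(\pi t)$ can be illustrated
numerically. The sketch below is a hedged example rather than part of the original
argument: it assumes the test function $f(t) = e^{-t^2}$, for which the integral of
$f(t)\sin\Omega t/(\pi t)$ over all $t$ has the closed form $\mathrm{erf}(\Omega/2)$, and the names
`sinc_kernel` and `sift` are hypothetical helpers.

```python
import math

def sinc_kernel(t, omega):
    """The function sin(omega t) / (pi t), with its finite limit at t = 0."""
    return omega / math.pi if t == 0.0 else math.sin(omega * t) / (math.pi * t)

def sift(omega, lim=8.0, n=40_000):
    """Midpoint estimate of the integral of exp(-t^2) * sinc_kernel(t, omega) dt."""
    h = 2.0 * lim / n
    total = 0.0
    for i in range(n):
        t = -lim + (i + 0.5) * h
        total += math.exp(-t * t) * sinc_kernel(t, omega)
    return total * h

for omega in (1.0, 5.0, 20.0):
    # The numerical value tracks erf(omega / 2), which tends to f(0) = 1.
    print(omega, sift(omega), math.erf(omega / 2.0))
```

As $\Omega$ grows the integral approaches $f(0)$, which is the sifting behaviour expected
of a $\delta$-function.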

We also note that the Fourier transform definition of the delta function, (13.24),
shows that the latter is real since

$$\delta^*(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-i\omega t}\, d\omega = \delta(-t) = \delta(t).$$

Finally, the Fourier transform of a $\delta$-function is simply

$$\tilde{\delta}(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \delta(t)\, e^{-i\omega t}\, dt = \frac{1}{\sqrt{2\pi}}. \tag{13.27}$$


13.1.5 Properties of Fourier transforms 

Having considered the Dirac $\delta$-function, we now return to our discussion of the
properties of Fourier transforms. As we would expect, Fourier transforms have
many properties analogous to those of Fourier series in respect of the connection
between the transforms of related functions. Here we list these properties without
proof; they can be verified by working from the definition of the transform. As
previously, we denote the Fourier transform of $f(t)$ by $\tilde{f}(\omega)$ or $\mathcal{F}[f(t)]$.

(i) Differentiation:

$$\mathcal{F}[f'(t)] = i\omega\,\tilde{f}(\omega). \tag{13.28}$$

This may be extended to higher derivatives, so that

$$\mathcal{F}[f''(t)] = i\omega\,\mathcal{F}[f'(t)] = -\omega^2\,\tilde{f}(\omega),$$

and so on.

(ii) Integration:

$$\mathcal{F}\left[\int^t f(s)\,ds\right] = \frac{1}{i\omega}\,\tilde{f}(\omega) + 2\pi c\,\delta(\omega), \tag{13.29}$$

where the term $2\pi c\,\delta(\omega)$ represents the Fourier transform of the constant
of integration associated with the indefinite integral.

(iii) Scaling:

$$\mathcal{F}[f(at)] = \frac{1}{a}\,\tilde{f}\!\left(\frac{\omega}{a}\right). \tag{13.30}$$

(iv) Translation:

$$\mathcal{F}[f(t + a)] = e^{ia\omega}\,\tilde{f}(\omega). \tag{13.31}$$

(v) Exponential multiplication:

$$\mathcal{F}[e^{\alpha t} f(t)] = \tilde{f}(\omega + i\alpha), \tag{13.32}$$

where $\alpha$ may be real, imaginary or complex.
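
Two of the listed properties, differentiation (13.28) and translation (13.31), can be
spot-checked numerically. The following is a minimal sketch, assuming the symmetric
$1/\sqrt{2\pi}$ convention used in this chapter and a Gaussian test function
$f(t) = e^{-t^2/2}$ chosen purely for convenience; `fourier_transform` is a hypothetical
helper, not a library routine.

```python
import cmath
import math

def fourier_transform(func, omega, lim=12.0, n=4000):
    """Midpoint estimate of (1/sqrt(2 pi)) * integral of func(t) exp(-i omega t) dt."""
    h = 2.0 * lim / n
    total = 0j
    for i in range(n):
        t = -lim + (i + 0.5) * h
        total += func(t) * cmath.exp(-1j * omega * t)
    return total * h / math.sqrt(2.0 * math.pi)

def f(t):
    return math.exp(-0.5 * t * t)

def f_prime(t):
    return -t * math.exp(-0.5 * t * t)

omega, a = 1.3, 0.7
ft = fourier_transform(f, omega)
# (13.28): transform of f' compared with i*omega times the transform of f
print(fourier_transform(f_prime, omega), 1j * omega * ft)
# (13.31): transform of f(t + a) compared with exp(i a omega) times the transform of f
print(fourier_transform(lambda t: f(t + a), omega), cmath.exp(1j * a * omega) * ft)
```

In each printed pair the two complex numbers should agree to within the quadrature
error.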


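►Prove relation (13.28).

Calculating the Fourier transform of $f'(t)$ directly, we obtain

$$\mathcal{F}[f'(t)] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f'(t)\, e^{-i\omega t}\, dt = \frac{1}{\sqrt{2\pi}} \Big[ e^{-i\omega t} f(t) \Big]_{-\infty}^{\infty} + \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} i\omega\, e^{-i\omega t} f(t)\, dt$$

$$= i\omega\,\tilde{f}(\omega),$$

if $f(t) \to 0$ at $t = \pm\infty$, as it must since $\int_{-\infty}^{\infty} |f(t)|\, dt$ is finite. ◄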


To illustrate a use and also a proof of (13.32), let us consider an
amplitude-modulated radio wave. Suppose a message to be broadcast is represented by $f(t)$.
The message can be added electronically to a constant signal $a$ of magnitude
such that $a + f(t)$ is never negative, and then the sum can be used to modulate
the amplitude of a carrier signal of frequency $\omega_c$. Using a complex exponential
notation, the transmitted amplitude is now

$$g(t) = A\,[a + f(t)]\, e^{i\omega_c t}. \tag{13.33}$$

Ignoring in the present context the effect of the term $Aa \exp(i\omega_c t)$, which gives a
contribution to the transmitted spectrum only at $\omega = \omega_c$, we obtain for the new
spectrum

$$\tilde{g}(\omega) = \frac{1}{\sqrt{2\pi}}\, A \int_{-\infty}^{\infty} f(t)\, e^{i\omega_c t}\, e^{-i\omega t}\, dt = \frac{1}{\sqrt{2\pi}}\, A \int_{-\infty}^{\infty} f(t)\, e^{-i(\omega - \omega_c)t}\, dt = A\,\tilde{f}(\omega - \omega_c), \tag{13.34}$$

which is simply a shift of the whole spectrum by the carrier frequency. The use
of different carrier frequencies enables signals to be separated.


13.1.6 Odd and even functions 


If $f(t)$ is odd or even then we may derive alternative forms of Fourier's inversion
theorem, which lead to the definition of different transform pairs. Let us first
consider an odd function $f(t) = -f(-t)$, whose Fourier transform is given by

$$\tilde{f}(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt$$

$$= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)(\cos \omega t - i \sin \omega t)\, dt$$

$$= \frac{-2i}{\sqrt{2\pi}} \int_0^{\infty} f(t) \sin \omega t\, dt,$$

where in the last line we use the fact that $f(t)$ and $\sin \omega t$ are odd, whereas $\cos \omega t$
is even.

We note that $\tilde{f}(-\omega) = -\tilde{f}(\omega)$, i.e. $\tilde{f}(\omega)$ is an odd function of $\omega$. Hence

$$f(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \tilde{f}(\omega)\, e^{i\omega t}\, d\omega = \frac{2i}{\sqrt{2\pi}} \int_0^{\infty} \tilde{f}(\omega) \sin \omega t\, d\omega$$

$$= \frac{2}{\pi} \int_0^{\infty} d\omega\, \sin \omega t \left\{ \int_0^{\infty} f(u) \sin \omega u\, du \right\}.$$

Thus we may define the Fourier sine transform pair for odd functions:

$$\tilde{f}_s(\omega) = \sqrt{\frac{2}{\pi}} \int_0^{\infty} f(t) \sin \omega t\, dt, \tag{13.35}$$

$$f(t) = \sqrt{\frac{2}{\pi}} \int_0^{\infty} \tilde{f}_s(\omega) \sin \omega t\, d\omega. \tag{13.36}$$

Figure 13.5 Resolution functions: (a) ideal $\delta$-function; (b) typical unbiased
resolution; (c) and (d) biases tending to shift observations to higher values
than the true one.


Note that although the Fourier sine transform pair was derived by considering
an odd function $f(t)$ defined over all $t$, the definitions (13.35) and (13.36) only
require $f(t)$ and $\tilde{f}_s(\omega)$ to be defined for positive $t$ and $\omega$ respectively. For an
even function, i.e. one for which $f(t) = f(-t)$, we can define the Fourier cosine
transform pair in a similar way, but with $\sin \omega t$ replaced by $\cos \omega t$.
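
As a hedged numerical illustration of the sine transform pair (13.35) and (13.36),
the sketch below uses the function $f(t) = t\,e^{-t^2/2}$, which is chosen here (it does
not appear in the text) because its sine transform has the same functional form,
$\tilde{f}_s(\omega) = \omega\,e^{-\omega^2/2}$. The helper name `sine_transform` is an assumption.

```python
import math

def sine_transform(func, omega, lim=12.0, n=4000):
    """Midpoint estimate of sqrt(2/pi) * integral_0^lim func(t) sin(omega t) dt."""
    h = lim / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += func(t) * math.sin(omega * t)
    return math.sqrt(2.0 / math.pi) * total * h

def f(t):
    return t * math.exp(-0.5 * t * t)

# Forward transform (13.35): compare with omega * exp(-omega^2 / 2).
print(sine_transform(f, 1.5), f(1.5))

# Applying the transform twice, as in (13.36), should return the original function.
round_trip = sine_transform(lambda w: sine_transform(f, w, n=600), 0.8, n=600)
print(round_trip, f(0.8))
```

Only values of $f$ at positive $t$ are used, in line with the remark that the pair
requires the functions to be defined only for positive arguments.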


13.1.7 Convolution and deconvolution 

It is apparent that any attempt to measure the value of a physical quantity is
limited, to some extent, by the finite resolution of the measuring apparatus used.
On the one hand, the physical quantity we wish to measure will be in general a
function of an independent variable, $x$ say, i.e. the true function to be measured
takes the form $f(x)$. On the other hand, the apparatus we are using does not give
the true output value of the function; a resolution function $g(y)$ is involved. By
this we mean that the probability that an output value $y = 0$ will be recorded
instead as being between $y$ and $y + dy$ is given by $g(y)\,dy$. Some possible resolution
functions of this sort are shown in figure 13.5. To obtain good results we wish
the resolution function to be as close to a $\delta$-function as possible (case (a)). A
typical piece of apparatus has a resolution function of finite width, although if
it is accurate the mean is centred on the true value (case (b)). However, some
apparatus may show a bias that tends to shift observations to higher or lower
values than the true ones (cases (c) and (d)), thereby exhibiting systematic error.

Given that the true distribution is $f(x)$ and the resolution function of our
measuring apparatus is $g(y)$, we wish to calculate what the observed distribution
$h(z)$ will be. The symbols $x$, $y$ and $z$ all refer to the same physical variable (e.g.
length or angle), but are denoted differently because the variable appears in the
analysis in three different roles.

Figure 13.6 The convolution of two functions $f(x)$ and $g(y)$.

The probability that a true reading lying between $x$ and $x + dx$, and so having
probability $f(x)\,dx$ of being selected by the experiment, will be moved by the
instrumental resolution by an amount $z - x$ into a small interval of width $dz$ is
$g(z - x)\,dz$. Hence the combined probability that the interval $dx$ will give rise to
an observation appearing in the interval $dz$ is $f(x)\,dx\,g(z - x)\,dz$. Adding together
the contributions from all values of $x$ that can lead to an observation in the range
$z$ to $z + dz$, we find that the observed distribution is given by

$$h(z) = \int_{-\infty}^{\infty} f(x)\,g(z - x)\,dx. \tag{13.37}$$

The integral in (13.37) is called the convolution of the functions $f$ and $g$ and is
often written $f * g$. The convolution defined above is commutative ($f * g = g * f$),
associative and distributive. The observed distribution is thus the convolution of
the true distribution and the experimental resolution function. The result will be
that the observed distribution is broader and smoother than the true one and, if
$g(y)$ has a bias, the maxima will normally be displaced from their true positions.
It is also obvious from (13.37) that if the resolution is the ideal $\delta$-function,
$g(y) = \delta(y)$, then $h(z) = f(z)$ and the observed distribution is the true one.

It is interesting to note, and a very important property, that the convolution of
any function $g(y)$ with a number of delta functions leaves a copy of $g(y)$ at the
position of each of the delta functions.


►Find the convolution of the function $f(x) = \delta(x + a) + \delta(x - a)$ with the function $g(y)$
plotted in figure 13.6.

Using the convolution integral (13.37)

$$h(z) = \int_{-\infty}^{\infty} f(x)\,g(z - x)\,dx = \int_{-\infty}^{\infty} [\delta(x + a) + \delta(x - a)]\,g(z - x)\,dx$$

$$= g(z + a) + g(z - a).$$

This convolution $h(z)$ is plotted in figure 13.6. ◄

Let us now consider the Fourier transform of the convolution (13.37); this is
given by

$$\tilde{h}(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} dz\, e^{-ikz} \left\{ \int_{-\infty}^{\infty} f(x)\,g(z - x)\,dx \right\}$$

$$= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} dx\, f(x) \left\{ \int_{-\infty}^{\infty} g(z - x)\, e^{-ikz}\, dz \right\}.$$

If we let $u = z - x$ in the second integral we have

$$\tilde{h}(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} dx\, f(x) \left\{ \int_{-\infty}^{\infty} g(u)\, e^{-ik(u+x)}\, du \right\}$$

$$= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, e^{-ikx}\, dx \int_{-\infty}^{\infty} g(u)\, e^{-iku}\, du$$

$$= \frac{1}{\sqrt{2\pi}} \times \sqrt{2\pi}\,\tilde{f}(k) \times \sqrt{2\pi}\,\tilde{g}(k) = \sqrt{2\pi}\,\tilde{f}(k)\,\tilde{g}(k). \tag{13.38}$$

Hence the Fourier transform of a convolution $f * g$ is equal to the product of the
separate Fourier transforms multiplied by $\sqrt{2\pi}$; this result is called the convolution
theorem.

It may be proved similarly that the converse is also true, namely that the
Fourier transform of the product $f(x)g(x)$ is given by

$$\mathcal{F}[f(x)g(x)] = \frac{1}{\sqrt{2\pi}}\,\tilde{f}(k) * \tilde{g}(k). \tag{13.39}$$
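
The convolution theorem (13.38) lends itself to a direct numerical check. The
following sketch is an illustration under stated assumptions: the two Gaussians
$f(x) = e^{-x^2/2}$ and $g(y) = e^{-y^2}$ are chosen here only because their transforms decay
quickly, and `convolve` and `fourier_transform` are hypothetical helpers built on a
simple midpoint rule.

```python
import cmath
import math

LIM, N = 10.0, 400
H = 2.0 * LIM / N
GRID = [-LIM + (i + 0.5) * H for i in range(N)]   # midpoints of the quadrature cells

def f(x):
    return math.exp(-0.5 * x * x)

def g(y):
    return math.exp(-y * y)

def convolve(z):
    """Midpoint estimate of h(z) = integral f(x) g(z - x) dx, as in (13.37)."""
    return H * sum(f(x) * g(z - x) for x in GRID)

def fourier_transform(func, k):
    """Midpoint estimate of (1/sqrt(2 pi)) * integral func(x) exp(-i k x) dx."""
    total = sum(func(x) * cmath.exp(-1j * k * x) for x in GRID)
    return H * total / math.sqrt(2.0 * math.pi)

k = 0.9
lhs = fourier_transform(convolve, k)
rhs = math.sqrt(2.0 * math.pi) * fourier_transform(f, k) * fourier_transform(g, k)
print(lhs, rhs)
```

The transform of the convolution and $\sqrt{2\pi}$ times the product of the separate
transforms should agree to within the quadrature error.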


►Find the Fourier transform of the function in figure 13.3 representing two wide slits by
considering the Fourier transforms of (i) two $\delta$-functions, at $x = \pm a$, (ii) a rectangular
function of height 1 and width $2b$ centred on $x = 0$.


(i) The Fourier transform of the two $\delta$-functions is given by

$$\tilde{f}(q) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \delta(x - a)\, e^{-iqx}\, dx + \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \delta(x + a)\, e^{-iqx}\, dx$$

$$= \frac{1}{\sqrt{2\pi}} \left( e^{-iqa} + e^{iqa} \right) = \frac{2 \cos qa}{\sqrt{2\pi}}.$$

(ii) The Fourier transform of the broad slit is

$$\tilde{g}(q) = \frac{1}{\sqrt{2\pi}} \int_{-b}^{b} e^{-iqx}\, dx = \frac{1}{\sqrt{2\pi}} \left[ \frac{e^{-iqx}}{-iq} \right]_{-b}^{b}$$

$$= \frac{-1}{iq\sqrt{2\pi}} \left( e^{-iqb} - e^{iqb} \right) = \frac{2 \sin qb}{q\sqrt{2\pi}}.$$

We have already seen that the convolution of these functions is the required function
representing two wide slits (see figure 13.6). So, using the convolution theorem, the Fourier
transform of the convolution is $\sqrt{2\pi}$ times the product of the individual transforms, i.e.
$4 \cos qa \sin qb/(q\sqrt{2\pi})$. This is, of course, the same result as that obtained in the example
in subsection 13.1.2. ◄

The inverse of convolution, called deconvolution, allows us to find a true
distribution $f(x)$ given an observed distribution $h(z)$ and a resolution function
$g(y)$.


►An experimental quantity $f(x)$ is measured using apparatus with a known resolution
function $g(y)$ to give an observed distribution $h(z)$. How may $f(x)$ be extracted from the
measured distribution?


From the convolution theorem (13.38), the Fourier transform of the measured distribution
is

$$\tilde{h}(k) = \sqrt{2\pi}\,\tilde{f}(k)\,\tilde{g}(k),$$

from which we obtain

$$\tilde{f}(k) = \frac{1}{\sqrt{2\pi}}\, \frac{\tilde{h}(k)}{\tilde{g}(k)}.$$

Then on inverse Fourier transforming we find

$$f(x) = \frac{1}{\sqrt{2\pi}}\, \mathcal{F}^{-1}\!\left[ \frac{\tilde{h}(k)}{\tilde{g}(k)} \right].$$

In words, to extract the true distribution, we divide the Fourier transform of the observed
distribution by that of the resolution function for each value of $k$ and then take the inverse
Fourier transform of the function so generated. ◄


This explicit method of extracting true distributions is straightforward for exact 
functions but, in practice, because of experimental and statistical uncertainties in 
the experimental data or because data over only a limited range are available, it 
is often not very precise, involving as it does three (numerical) transforms each 
requiring in principle an integral over an infinite range. 
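
A discrete, periodic analogue of this deconvolution recipe can be sketched in a few
lines. The example below is an illustration only, not the method of any particular
experiment: it assumes sampled data on eight points, an exactly known and noise-free
resolution function, and an unnormalised discrete Fourier transform, for which the
convolution theorem carries no factor of $\sqrt{2\pi}$. The helper names `dft`,
`circular_convolve` and `deconvolve` are hypothetical.

```python
import cmath

def dft(x, sign=-1):
    """Naive discrete Fourier transform; sign=+1 gives the unnormalised inverse."""
    n = len(x)
    return [sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def circular_convolve(f, g):
    """Periodic discrete analogue of the convolution integral (13.37)."""
    n = len(f)
    return [sum(f[j] * g[(i - j) % n] for j in range(n)) for i in range(n)]

def deconvolve(h, g):
    """Divide the transforms term by term, then invert.

    This fails, or amplifies errors strongly, wherever the transform of g is
    zero or very small.
    """
    n = len(h)
    ratio = [hk / gk for hk, gk in zip(dft(h), dft(g))]
    return [v.real / n for v in dft(ratio, sign=+1)]

true_f = [0.0, 0.0, 1.0, 3.0, 2.0, 0.0, 0.0, 0.0]
resolution = [0.6, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2]   # symmetric smoothing kernel
observed = circular_convolve(true_f, resolution)
recovered = deconvolve(observed, resolution)
print([round(v, 6) for v in recovered])
```

With exact data the true values are recovered to rounding error. With noisy data the
division by small values of the transformed resolution function magnifies the noise,
which is one way of seeing why the method is often imprecise in practice.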


13.1.8 Correlation functions and energy spectra 

The cross-correlation of two functions $f$ and $g$ is defined by

$$C(z) = \int_{-\infty}^{\infty} f^*(x)\,g(x + z)\,dx. \tag{13.40}$$

Despite the formal similarity between (13.40) and the definition of the convolution
in (13.37), the use and interpretation of the cross-correlation and of the
convolution are very different; the cross-correlation provides a quantitative measure of
the similarity of two functions $f$ and $g$ as one is displaced through a distance $z$
relative to the other. The cross-correlation is often notated as $C = f \otimes g$, and, like
convolution, it is both associative and distributive. Unlike convolution, however,
it is not commutative, in fact

$$[f \otimes g](z) = [g \otimes f]^*(-z). \tag{13.41}$$

►Prove the Wiener-Kinchin theorem,

$$\tilde{C}(k) = \sqrt{2\pi}\,[\tilde{f}(k)]^*\,\tilde{g}(k). \tag{13.42}$$

Following a method similar to that for the convolution of $f$ and $g$, let us consider the
Fourier transform of (13.40):

$$\tilde{C}(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} dz\, e^{-ikz} \left\{ \int_{-\infty}^{\infty} f^*(x)\,g(x + z)\,dx \right\}$$

$$= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} dx\, f^*(x) \left\{ \int_{-\infty}^{\infty} g(x + z)\, e^{-ikz}\, dz \right\}.$$

Making the substitution $u = x + z$ in the second integral we obtain

$$\tilde{C}(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} dx\, f^*(x) \left\{ \int_{-\infty}^{\infty} g(u)\, e^{-ik(u - x)}\, du \right\}$$

$$= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f^*(x)\, e^{ikx}\, dx \int_{-\infty}^{\infty} g(u)\, e^{-iku}\, du$$

$$= \frac{1}{\sqrt{2\pi}} \times \sqrt{2\pi}\,[\tilde{f}(k)]^* \times \sqrt{2\pi}\,\tilde{g}(k) = \sqrt{2\pi}\,[\tilde{f}(k)]^*\,\tilde{g}(k). \text{ ◄}$$

Thus the Fourier transform of the cross-correlation of $f$ and $g$ is equal to
the product of $[\tilde{f}(k)]^*$ and $\tilde{g}(k)$ multiplied by $\sqrt{2\pi}$. This is a statement of the
Wiener-Kinchin theorem. Similarly we can derive the converse theorem

$$\mathcal{F}[f^*(x)\,g(x)] = \frac{1}{\sqrt{2\pi}}\,\tilde{f} \otimes \tilde{g}.$$

If we now consider the special case where $g$ is taken to be equal to $f$ in (13.40)
then, writing the LHS as $a(z)$, we have

$$a(z) = \int_{-\infty}^{\infty} f^*(x)\,f(x + z)\,dx; \tag{13.43}$$

this is called the auto-correlation function of $f(x)$. Using the Wiener-Kinchin
theorem (13.42) we see that

$$a(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \tilde{a}(k)\, e^{ikz}\, dk$$

$$= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \sqrt{2\pi}\,[\tilde{f}(k)]^*\,\tilde{f}(k)\, e^{ikz}\, dk,$$

so that $a(z)$ is the inverse Fourier transform of $\sqrt{2\pi}\,|\tilde{f}(k)|^2$, which is in turn called
the energy spectrum of $f$.
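
The interpretation of the cross-correlation as a measure of similarity under
displacement can be illustrated with sampled data. The sketch below is a discrete
analogue of (13.40) for real sequences, taken to be zero outside their range; the
sequences `pulse` and `delayed` and the helper `cross_correlation` are hypothetical
choices made for this illustration.

```python
def cross_correlation(f, g, z):
    """Discrete analogue of (13.40) for real sequences that vanish outside their range."""
    n = len(g)
    return sum(f[x] * g[x + z] for x in range(len(f)) if 0 <= x + z < n)

pulse = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
delayed = [0.0] * 4 + pulse[:-4]       # the same pulse displaced by 4 samples

lags = range(-9, 10)
c_fg = {z: cross_correlation(pulse, delayed, z) for z in lags}
best = max(c_fg, key=c_fg.get)
print(best)                            # lag at which the two sequences match best: 4

# Real-sequence form of (13.41): [f (x) g](z) equals [g (x) f](-z).
print(all(c_fg[z] == cross_correlation(delayed, pulse, -z) for z in lags))
```

The correlation peaks at the displacement separating the two pulses, and exchanging
the roles of the two sequences reverses the sign of the lag, as (13.41) requires for
real functions.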


13.1.9 Parseval’s theorem 

Using the results of the previous section we can immediately obtain Parseval's
theorem. The most general form of this (also called the multiplication theorem) is
obtained simply by noting from (13.42) that the cross-correlation (13.40) of two
functions $f$ and $g$ can be written as

$$C(z) = \int_{-\infty}^{\infty} f^*(x)\,g(x + z)\,dx = \int_{-\infty}^{\infty} [\tilde{f}(k)]^*\,\tilde{g}(k)\, e^{ikz}\, dk. \tag{13.44}$$

Then, setting $z = 0$ gives the multiplication theorem

$$\int_{-\infty}^{\infty} f^*(x)\,g(x)\,dx = \int_{-\infty}^{\infty} [\tilde{f}(k)]^*\,\tilde{g}(k)\,dk. \tag{13.45}$$

Specialising further, by letting $g = f$, we derive the most common form of
Parseval's theorem,

$$\int_{-\infty}^{\infty} |f(x)|^2\,dx = \int_{-\infty}^{\infty} |\tilde{f}(k)|^2\,dk. \tag{13.46}$$

When $f$ is a physical amplitude these integrals relate to the total intensity involved
in some physical process. We have already met a form of Parseval's theorem for
Fourier series in chapter 12; it is in fact a special case of (13.46).
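
Parseval's theorem (13.46) can be checked numerically for a simple case. The sketch
below is a hedged illustration: it assumes the function $f(t) = e^{-|t|}$, whose transform
under the symmetric convention is $\sqrt{2/\pi}/(1 + \omega^2)$, and it maps the infinite frequency
range onto a finite one by the substitution $\omega = \tan\theta$. The helper `midpoint` is
hypothetical.

```python
import math

def midpoint(func, a, b, n=40_000):
    """Midpoint-rule estimate of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def f(t):
    return math.exp(-abs(t))

def f_tilde(omega):
    """Transform of exp(-|t|) in the symmetric convention: sqrt(2/pi) / (1 + omega^2)."""
    return math.sqrt(2.0 / math.pi) / (1.0 + omega * omega)

# Left-hand side of (13.46): integral of |f(t)|^2 over t.
lhs = midpoint(lambda t: f(t) ** 2, -20.0, 20.0)

# Right-hand side: with omega = tan(theta), d(omega) = d(theta) / cos^2(theta).
rhs = midpoint(lambda th: f_tilde(math.tan(th)) ** 2 / math.cos(th) ** 2,
               -math.pi / 2.0, math.pi / 2.0)
print(lhs, rhs)
```

Both integrals evaluate to a value close to unity.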


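►The displacement of a damped harmonic oscillator as a function of time is given by

$$f(t) = \begin{cases} 0 & \text{for } t < 0,\\ e^{-t/\tau} \sin \omega_0 t & \text{for } t \ge 0. \end{cases}$$

Find the Fourier transform of this function and so give a physical interpretation of Parseval's
theorem.


Using the usual definition for the Fourier transform we find

$$\tilde{f}(\omega) = \int_{-\infty}^{0} 0 \times e^{-i\omega t}\, dt + \int_0^{\infty} e^{-t/\tau} \sin \omega_0 t\; e^{-i\omega t}\, dt$$

(the constant factor $1/\sqrt{2\pi}$ of the definition is not written here; it does not affect
the interpretation below, which holds to within a constant).
Writing $\sin \omega_0 t$ as $(e^{i\omega_0 t} - e^{-i\omega_0 t})/2i$ we obtain

$$\tilde{f}(\omega) = 0 + \frac{1}{2i} \int_0^{\infty} \left[ e^{-it(\omega - \omega_0 - i/\tau)} - e^{-it(\omega + \omega_0 - i/\tau)} \right] dt$$

$$= \frac{1}{2} \left[ \frac{1}{\omega + \omega_0 - i/\tau} - \frac{1}{\omega - \omega_0 - i/\tau} \right],$$

which is the required Fourier transform. The physical interpretation of $|\tilde{f}(\omega)|^2$ is the energy
content per unit frequency interval (i.e. the energy spectrum) whilst $|f(t)|^2$ is proportional to
the sum of the kinetic and potential energies of the oscillator. Hence (to within a constant)
Parseval's theorem shows the equivalence of these two alternative specifications for the
total energy. ◄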


13.1.10 Fourier transforms in higher dimensions 

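The concept of the Fourier transform can be extended naturally to more than
one dimension. For instance we may wish to find the spatial Fourier transform of
two- or three-dimensional functions of position. For example, in three dimensions
we can define the Fourier transform of $f(x, y, z)$ as

$$\tilde{f}(k_x, k_y, k_z) = \frac{1}{(2\pi)^{3/2}} \iiint f(x, y, z)\, e^{-ik_x x}\, e^{-ik_y y}\, e^{-ik_z z}\, dx\, dy\, dz, \tag{13.47}$$

and its inverse as

$$f(x, y, z) = \frac{1}{(2\pi)^{3/2}} \iiint \tilde{f}(k_x, k_y, k_z)\, e^{ik_x x}\, e^{ik_y y}\, e^{ik_z z}\, dk_x\, dk_y\, dk_z. \tag{13.48}$$

Denoting the vector with components $k_x$, $k_y$, $k_z$ by $\mathbf{k}$ and that with components
$x$, $y$, $z$ by $\mathbf{r}$, we can write the Fourier transform pair (13.47), (13.48) as

$$\tilde{f}(\mathbf{k}) = \frac{1}{(2\pi)^{3/2}} \int f(\mathbf{r})\, e^{-i\mathbf{k}\cdot\mathbf{r}}\, d^3\mathbf{r}, \tag{13.49}$$

$$f(\mathbf{r}) = \frac{1}{(2\pi)^{3/2}} \int \tilde{f}(\mathbf{k})\, e^{i\mathbf{k}\cdot\mathbf{r}}\, d^3\mathbf{k}. \tag{13.50}$$

From these relations we may deduce that the three-dimensional Dirac $\delta$-function
can be written as

$$\delta(\mathbf{r}) = \frac{1}{(2\pi)^3} \int e^{i\mathbf{k}\cdot\mathbf{r}}\, d^3\mathbf{k}. \tag{13.51}$$

Similar relations to (13.49), (13.50) and (13.51) exist for spaces of other
dimensionalities.


►In three-dimensional space a function $f(\mathbf{r})$ possesses spherical symmetry, so that $f(\mathbf{r}) =
f(r)$. Find the Fourier transform of $f(\mathbf{r})$ as a one-dimensional integral.


Let us choose spherical polar coordinates in which the vector $\mathbf{k}$ of the Fourier transform
lies along the polar axis ($\theta = 0$). This we can do since $f(\mathbf{r})$ is spherically symmetric. We
then have

$$d^3\mathbf{r} = r^2 \sin\theta\, dr\, d\theta\, d\phi \quad\text{and}\quad \mathbf{k}\cdot\mathbf{r} = kr \cos\theta,$$

where $k = |\mathbf{k}|$. The Fourier transform is then given by

$$\tilde{f}(\mathbf{k}) = \frac{1}{(2\pi)^{3/2}} \int f(\mathbf{r})\, e^{-i\mathbf{k}\cdot\mathbf{r}}\, d^3\mathbf{r}$$

$$= \frac{1}{(2\pi)^{3/2}} \int_0^{\infty} dr \int_0^{\pi} d\theta \int_0^{2\pi} d\phi\; f(r)\, r^2 \sin\theta\, e^{-ikr\cos\theta}$$

$$= \frac{1}{(2\pi)^{3/2}} \int_0^{\infty} dr\, 2\pi f(r)\, r^2 \int_0^{\pi} d\theta\, \sin\theta\, e^{-ikr\cos\theta}.$$

The integral over $\theta$ may be straightforwardly evaluated by noting that

$$\frac{d}{d\theta}\left( e^{-ikr\cos\theta} \right) = ikr \sin\theta\, e^{-ikr\cos\theta}.$$

Therefore

$$\tilde{f}(\mathbf{k}) = \frac{1}{(2\pi)^{3/2}} \int_0^{\infty} dr\, 2\pi f(r)\, r^2 \left[ \frac{e^{-ikr\cos\theta}}{ikr} \right]_{\theta=0}^{\theta=\pi}$$

$$= \frac{1}{(2\pi)^{3/2}} \int_0^{\infty} 4\pi r^2 f(r) \left( \frac{\sin kr}{kr} \right) dr. \text{ ◄}$$

A similar result may be obtained for two-dimensional Fourier transforms in
which $f(\mathbf{r}) = f(\rho)$, i.e. $f(\mathbf{r})$ is independent of azimuthal angle $\phi$. In this case, using
the integral representation of the Bessel function $J_0(x)$ given at the very end of
subsection 16.7.3, we find

$$\tilde{f}(\mathbf{k}) = \frac{1}{2\pi} \int_0^{\infty} 2\pi\rho\, f(\rho)\, J_0(k\rho)\, d\rho. \tag{13.52}$$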
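
The one-dimensional radial integral obtained above for a spherically symmetric
function can be tested numerically. The sketch below assumes the Gaussian
$f(r) = e^{-r^2/2}$, which is used here because its three-dimensional transform under the
$(2\pi)^{-3/2}$ convention is known to be $e^{-k^2/2}$; the helper name `radial_transform` is
hypothetical.

```python
import math

def radial_transform(f, k, lim=12.0, n=4000):
    """Midpoint estimate of (2 pi)^(-3/2) * integral_0^lim 4 pi r^2 f(r) sin(kr)/(kr) dr."""
    h = lim / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        sinc = math.sin(k * r) / (k * r) if k != 0.0 else 1.0   # limit as k -> 0
        total += 4.0 * math.pi * r * r * f(r) * sinc
    return total * h / (2.0 * math.pi) ** 1.5

def gaussian(r):
    return math.exp(-0.5 * r * r)

for k in (0.0, 0.7, 2.0):
    print(k, radial_transform(gaussian, k), math.exp(-0.5 * k * k))
```

A single radial quadrature thus replaces the full three-dimensional integral.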


13.2 Laplace transforms 

Often we are interested in functions $f(t)$ for which the Fourier transform does
not exist because $f \not\to 0$ as $t \to \infty$, and so the integral defining $\tilde{f}$ does not
converge. For example, the function $f(t) = t$ does not possess a Fourier transform.
Furthermore, often we are interested in a given function only for $t > 0$, for example
when we are given the value at $t = 0$ in an initial-value problem. This leads us to
consider the Laplace transform, $\bar{f}(s)$ or $\mathcal{L}[f(t)]$, of $f(t)$, which is defined by

$$\bar{f}(s) \equiv \int_0^{\infty} f(t)\, e^{-st}\, dt, \tag{13.53}$$

provided that the integral exists. We assume here that $s$ is real, but complex values
would have to be considered in a more detailed study. In practice, for a given
function $f(t)$ there will be some real number $s_0$ such that the integral in (13.53)
exists for $s > s_0$ but diverges for $s \le s_0$.

Through (13.53) we define a linear transformation $\mathcal{L}[\;]$ that converts functions
of the variable $t$ to functions of a new variable $s$:

$$\mathcal{L}[a f_1(t) + b f_2(t)] = a\,\mathcal{L}[f_1(t)] + b\,\mathcal{L}[f_2(t)] = a \bar{f}_1(s) + b \bar{f}_2(s). \tag{13.54}$$


►Find the Laplace transforms of the functions (i) $f(t) = 1$, (ii) $f(t) = e^{at}$, (iii) $f(t) = t^n$,
for $n = 0, 1, 2, \ldots$.


(i) By direct application of the definition of a Laplace transform (13.53), we find

$$\mathcal{L}[1] = \int_0^{\infty} e^{-st}\, dt = \left[ \frac{-1}{s}\, e^{-st} \right]_0^{\infty} = \frac{1}{s}, \quad\text{if } s > 0,$$

where the restriction $s > 0$ is required for the integral to exist.

(ii) Again using (13.53) directly, we find

$$\bar{f}(s) = \int_0^{\infty} e^{at}\, e^{-st}\, dt = \int_0^{\infty} e^{(a-s)t}\, dt = \left[ \frac{e^{(a-s)t}}{a - s} \right]_0^{\infty} = \frac{1}{s - a}, \quad\text{if } s > a.$$

(iii) Once again using the definition (13.53) we have

$$\bar{f}_n(s) = \int_0^{\infty} t^n e^{-st}\, dt.$$

Integrating by parts we find

$$\bar{f}_n(s) = \left[ \frac{-t^n e^{-st}}{s} \right]_0^{\infty} + \frac{n}{s} \int_0^{\infty} t^{n-1} e^{-st}\, dt = 0 + \frac{n}{s}\, \bar{f}_{n-1}(s), \quad\text{if } s > 0.$$

We now have a recursion relation between successive transforms and by calculating
$\bar{f}_0$ we can infer $\bar{f}_1$, $\bar{f}_2$, etc. Since $t^0 = 1$, (i) above gives

$$\bar{f}_0 = \frac{1}{s}, \quad\text{if } s > 0,$$

and

$$\bar{f}_1(s) = \frac{1}{s^2}, \quad \bar{f}_2(s) = \frac{2!}{s^3}, \quad \ldots, \quad \bar{f}_n(s) = \frac{n!}{s^{n+1}}, \quad\text{if } s > 0. \tag{13.55}$$

Thus, in each case (i)-(iii), direct application of the definition of the Laplace transform
(13.53) yields the required result. ◄
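
These three results can be confirmed by evaluating the defining integral (13.53)
numerically. The sketch below is illustrative: the helper `laplace`, the truncation
point `t_max` and the use of Simpson's rule are assumptions made here, and the check
is meaningful only for values of $s$ large enough that the integrand is negligible
beyond `t_max`.

```python
import math

def laplace(f, s, t_max=40.0, n=40_000):
    """Composite Simpson estimate of integral_0^t_max f(t) exp(-s t) dt (n even)."""
    h = t_max / n
    total = f(0.0) + f(t_max) * math.exp(-s * t_max)
    for i in range(1, n):
        t = i * h
        total += (4.0 if i % 2 else 2.0) * f(t) * math.exp(-s * t)
    return total * h / 3.0

s = 2.0
print(laplace(lambda t: 1.0, s), 1.0 / s)                          # (i)   1/s
print(laplace(lambda t: math.exp(0.5 * t), s), 1.0 / (s - 0.5))    # (ii)  1/(s - a), a = 0.5
print(laplace(lambda t: t ** 3, s), math.factorial(3) / s ** 4)    # (iii) n!/s^(n+1), n = 3
```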


Unlike that for the Fourier transform, the inversion of the Laplace transform 
is not an easy operation to perform, since an explicit formula for /(t), given f(s), 
is not straightforwardly obtained from (13.53). The general method for obtaining 
an inverse Laplace transform makes use of complex variable theory and is not 
discussed until chapter 20. However, progress can be made without having to find 
an explicit inverse, since we can prepare from (13.53) a ‘dictionary’ of the Laplace 
transforms of common functions and, when faced with an inversion to carry out, 
hope to find the given transform (together with its parent function) in the listing. 
Such a list is given in table 13.1. 

When finding inverse Laplace transforms using table 13.1, it is useful to note
that for all practical purposes the inverse Laplace transform is unique† and linear
so that

$$\mathcal{L}^{-1}[a \bar{f}_1(s) + b \bar{f}_2(s)] = a f_1(t) + b f_2(t). \tag{13.56}$$

In many practical problems the method of partial fractions can be useful in
producing an expression from which the inverse Laplace transform can be found.


►Using table 13.1 find $f(t)$ if

$$\bar{f}(s) = \frac{s + 3}{s(s + 1)}.$$

Using partial fractions $\bar{f}(s)$ may be written

$$\bar{f}(s) = \frac{3}{s} - \frac{2}{s + 1}.$$

Comparing this with the standard Laplace transforms in table 13.1, we find that the inverse
transform of $3/s$ is $3$ for $s > 0$ and the inverse transform of $2/(s + 1)$ is $2e^{-t}$ for $s > -1$,
and so

$$f(t) = 3 - 2e^{-t}, \quad\text{if } s > 0. \text{ ◄}$$

† This is not strictly true, since two functions can differ from one another at a finite number of
isolated points but have the same Laplace transform.


$f(t)$                       $\bar{f}(s)$                             $s_0$
---------------------------------------------------------------------------
$c$                          $c/s$                                    $0$
$ct^n$                       $cn!/s^{n+1}$                            $0$
$\sin bt$                    $b/(s^2 + b^2)$                          $0$
$\cos bt$                    $s/(s^2 + b^2)$                          $0$
$e^{at}$                     $1/(s - a)$                              $a$
$t^n e^{at}$                 $n!/(s - a)^{n+1}$                       $a$
$\sinh at$                   $a/(s^2 - a^2)$                          $|a|$
$\cosh at$                   $s/(s^2 - a^2)$                          $|a|$
$e^{at} \sin bt$             $b/[(s - a)^2 + b^2]$                    $a$
$e^{at} \cos bt$             $(s - a)/[(s - a)^2 + b^2]$              $a$
$t^{1/2}$                    $\tfrac{1}{2}(\pi/s^3)^{1/2}$            $0$
$t^{-1/2}$                   $(\pi/s)^{1/2}$                          $0$
$\delta(t - t_0)$            $e^{-st_0}$                              $0$
$H(t - t_0)$                 $e^{-st_0}/s$                            $0$

Here $H(t - t_0) = 1$ for $t \ge t_0$ and $H(t - t_0) = 0$ for $t < t_0$.

Table 13.1 Standard Laplace transforms. The transforms are valid for $s > s_0$.
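
The inversion obtained by partial fractions can be verified by transforming the
answer forwards again. The sketch below is a numerical check under stated
assumptions: `laplace` is a hypothetical Simpson-rule helper that truncates the
integral (13.53) at `t_max`, so the comparison is meaningful only for $s$ comfortably
greater than zero.

```python
import math

def laplace(f, s, t_max=60.0, n=60_000):
    """Composite Simpson estimate of integral_0^t_max f(t) exp(-s t) dt (n even)."""
    h = t_max / n
    total = f(0.0) + f(t_max) * math.exp(-s * t_max)
    for i in range(1, n):
        t = i * h
        total += (4.0 if i % 2 else 2.0) * f(t) * math.exp(-s * t)
    return total * h / 3.0

def f(t):
    return 3.0 - 2.0 * math.exp(-t)        # candidate inverse transform

def f_bar(s):
    return (s + 3.0) / (s * (s + 1.0))     # the given transform

for s in (0.5, 1.5, 4.0):
    print(s, laplace(f, s), f_bar(s))
```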


13.2.1 Laplace transforms of derivatives and integrals 


One of the main uses of Laplace transforms is in solving differential equations. 
Differential equations are the subject of the next six chapters and we will return 
to the application of Laplace transforms to their solution in chapter 15. In 
the meantime we will derive the required results, i.e. the Laplace transforms of 
derivatives. 

The Laplace transform of the first derivative of $f(t)$ is given by

$$\mathcal{L}\left[ \frac{df}{dt} \right] = \int_0^{\infty} \frac{df}{dt}\, e^{-st}\, dt = \Big[ f(t)\, e^{-st} \Big]_0^{\infty} + s \int_0^{\infty} f(t)\, e^{-st}\, dt$$

$$= -f(0) + s \bar{f}(s), \quad\text{for } s > 0. \tag{13.57}$$

The evaluation relies on integration by parts and higher-order derivatives may
be found in a similar manner.

►Find the Laplace transform of $d^2 f/dt^2$.


Using the definition of the Laplace transform and integrating by parts we obtain

$$\mathcal{L}\left[ \frac{d^2 f}{dt^2} \right] = \int_0^{\infty} \frac{d^2 f}{dt^2}\, e^{-st}\, dt = \left[ \frac{df}{dt}\, e^{-st} \right]_0^{\infty} + s \int_0^{\infty} \frac{df}{dt}\, e^{-st}\, dt$$

$$= -\frac{df}{dt}(0) + s\,[s \bar{f}(s) - f(0)], \quad\text{for } s > 0,$$

where (13.57) has been substituted for the integral. This can be written more neatly as

$$\mathcal{L}\left[ \frac{d^2 f}{dt^2} \right] = s^2 \bar{f}(s) - s f(0) - \frac{df}{dt}(0), \quad\text{for } s > 0. \text{ ◄}$$

In general the Laplace transform of the $n$th derivative is given by

$$\mathcal{L}\left[ \frac{d^n f}{dt^n} \right] = s^n \bar{f} - s^{n-1} f(0) - s^{n-2}\, \frac{df}{dt}(0) - \cdots - \frac{d^{n-1} f}{dt^{n-1}}(0), \quad\text{for } s > 0. \tag{13.58}$$
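
The derivative formulae can be spot-checked numerically. In the sketch below the
test function $f(t) = \cos bt$ with $b = 3$ is an assumption made for illustration, as is
the Simpson-rule helper `laplace`, which truncates the integral (13.53) at `t_max`.

```python
import math

def laplace(f, s, t_max=40.0, n=40_000):
    """Composite Simpson estimate of integral_0^t_max f(t) exp(-s t) dt (n even)."""
    h = t_max / n
    total = f(0.0) + f(t_max) * math.exp(-s * t_max)
    for i in range(1, n):
        t = i * h
        total += (4.0 if i % 2 else 2.0) * f(t) * math.exp(-s * t)
    return total * h / 3.0

b, s = 3.0, 1.5

def f(t):
    return math.cos(b * t)

def df(t):
    return -b * math.sin(b * t)

def d2f(t):
    return -b * b * math.cos(b * t)

f_bar = laplace(f, s)
# (13.57): transform of df/dt compared with s*f_bar - f(0)
print(laplace(df, s), s * f_bar - f(0.0))
# Second-derivative result: s^2 f_bar - s f(0) - (df/dt)(0)
print(laplace(d2f, s), s * s * f_bar - s * f(0.0) - df(0.0))
```

The initial values $f(0)$ and $(df/dt)(0)$ enter explicitly, which is what makes these
formulae useful for initial-value problems.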


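We now turn to integration, which is much more straightforward. From the
definition (13.53),

$$\mathcal{L}\left[ \int_0^t f(u)\, du \right] = \int_0^{\infty} dt\, e^{-st} \int_0^t f(u)\, du$$

$$= \left[ -\frac{1}{s}\, e^{-st} \int_0^t f(u)\, du \right]_0^{\infty} + \int_0^{\infty} \frac{1}{s}\, e^{-st} f(t)\, dt.$$

The first term on the RHS vanishes at both limits, and so

$$\mathcal{L}\left[ \int_0^t f(u)\, du \right] = \frac{1}{s}\, \mathcal{L}[f]. \tag{13.59}$$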


13.2.2 Other properties of Laplace transforms 

From table 13.1 it will be apparent that multiplying a function $f(t)$ by $e^{at}$ has the
effect on its transform that $s$ is replaced by $s - a$. This is easily proved generally:

$$\mathcal{L}[e^{at} f(t)] = \int_0^{\infty} f(t)\, e^{at}\, e^{-st}\, dt = \int_0^{\infty} f(t)\, e^{-(s-a)t}\, dt = \bar{f}(s - a). \tag{13.60}$$

As it were, multiplying $f(t)$ by $e^{at}$ moves the origin of $s$ by an amount $a$.

We may now consider the effect of multiplying the Laplace transform $\bar{f}(s)$ by
$e^{-bs}$ ($b > 0$). From the definition (13.53),

$$e^{-bs} \bar{f}(s) = \int_0^{\infty} e^{-s(t+b)} f(t)\, dt = \int_b^{\infty} e^{-sz} f(z - b)\, dz,$$

on putting $t + b = z$. Thus $e^{-bs} \bar{f}(s)$ is the Laplace transform of a function $g(t)$
defined by

$$g(t) = \begin{cases} 0 & \text{for } 0 < t \le b,\\ f(t - b) & \text{for } t > b. \end{cases}$$

In other words, the function $f$ has been translated to 'later' $t$ (larger values of $t$)
by an amount $b$.

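Further properties of Laplace transforms can be proved in similar ways and
are listed below.

(i) $$\mathcal{L}[f(at)] = \frac{1}{a}\, \bar{f}\!\left( \frac{s}{a} \right), \tag{13.61}$$

(ii) $$\mathcal{L}[t^n f(t)] = (-1)^n\, \frac{d^n \bar{f}(s)}{ds^n}, \quad\text{for } n = 1, 2, 3, \ldots, \tag{13.62}$$

(iii) $$\mathcal{L}\left[ \frac{f(t)}{t} \right] = \int_s^{\infty} \bar{f}(u)\, du, \tag{13.63}$$

provided $\lim_{t \to 0} [f(t)/t]$ exists.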


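Related results may be easily proved.


►Find an expression for the Laplace transform of $t\, d^2 f/dt^2$.


From the definition of the Laplace transform we have

$$\mathcal{L}\left[ t\, \frac{d^2 f}{dt^2} \right] = \int_0^{\infty} e^{-st}\, t\, \frac{d^2 f}{dt^2}\, dt = -\frac{d}{ds} \int_0^{\infty} e^{-st}\, \frac{d^2 f}{dt^2}\, dt$$

$$= -\frac{d}{ds} \left[ s^2 \bar{f}(s) - s f(0) - \frac{df}{dt}(0) \right] = -s^2\, \frac{d\bar{f}}{ds} - 2s \bar{f} + f(0). \text{ ◄}$$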


Finally we mention the convolution theorem for Laplace transforms (which is
analogous to that for Fourier transforms discussed in subsection 13.1.7). If the
functions $f$ and $g$ have Laplace transforms $\bar{f}(s)$ and $\bar{g}(s)$ then

$$\mathcal{L}\left[ \int_0^t f(u)\, g(t - u)\, du \right] = \bar{f}(s)\, \bar{g}(s), \tag{13.64}$$

where the integral in the brackets on the LHS is the convolution of $f$ and $g$,
denoted by $f * g$. As in the case of Fourier transforms, the convolution defined
above is commutative, i.e. $f * g = g * f$, and is associative and distributive. From
(13.64) we also see that

$$\mathcal{L}^{-1}\left[ \bar{f}(s)\, \bar{g}(s) \right] = \int_0^t f(u)\, g(t - u)\, du = f * g.$$

Figure 13.7 Two representations of the Laplace transform convolution (see
text).
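
The convolution theorem (13.64) can be illustrated numerically. The sketch below
assumes the pair $f(t) = t$ and $g(t) = e^{-t}$, chosen here because the product of their
transforms, $1/[s^2(s + 1)]$, is easy to write down; `simpson`, `convolution` and `laplace`
are hypothetical helpers, and the transform integral is truncated at `t_max`.

```python
import math

def simpson(func, a, b, n):
    """Composite Simpson rule with n (even) subintervals."""
    if b == a:
        return 0.0
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * h)
    return total * h / 3.0

def f(t):
    return t

def g(t):
    return math.exp(-t)

def convolution(t):
    """(f * g)(t) = integral_0^t f(u) g(t - u) du."""
    return simpson(lambda u: f(u) * g(t - u), 0.0, t, 100)

def laplace(func, s, t_max=30.0, n=3000):
    """Truncated numerical version of the definition (13.53)."""
    return simpson(lambda t: func(t) * math.exp(-s * t), 0.0, t_max, n)

s = 1.5
print(laplace(convolution, s), laplace(f, s) * laplace(g, s), 1.0 / (s * s * (s + 1.0)))
```

The three printed numbers, namely the transform of the convolution, the product of
the separate transforms and the closed form, should agree closely.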


►Prove the convolution theorem (13.64) for Laplace transforms.

From the definition (13.53),

$$\bar{f}(s)\, \bar{g}(s) = \int_0^{\infty} e^{-su} f(u)\, du \int_0^{\infty} e^{-sv} g(v)\, dv$$

$$= \int_0^{\infty} du \int_0^{\infty} dv\, e^{-s(u+v)} f(u)\, g(v).$$

Now letting $u + v = t$ changes the limits on the integrals, with the result that

$$\bar{f}(s)\, \bar{g}(s) = \int_0^{\infty} du\, f(u) \int_u^{\infty} dt\, g(t - u)\, e^{-st}.$$

As shown in figure 13.7(a) the shaded area of integration may be considered as the sum
of vertical strips. However, we may instead integrate over this area by summing over
horizontal strips as shown in figure 13.7(b). Then the integral can be written as

$$\bar{f}(s)\, \bar{g}(s) = \int_0^{\infty} dt\, e^{-st} \int_0^t du\, f(u)\, g(t - u)$$

$$= \int_0^{\infty} dt\, e^{-st} \left\{ \int_0^t f(u)\, g(t - u)\, du \right\}$$

$$= \mathcal{L}\left[ \int_0^t f(u)\, g(t - u)\, du \right]. \text{ ◄}$$

The properties of the Laplace transform derived in this section can sometimes
be useful in finding the Laplace transforms of particular functions.


►Find the Laplace transform of $f(t) = t \sin bt$.


Although we could calculate the Laplace transform directly, we can use (13.62) to give

$$\bar{f}(s) = (-1)\, \frac{d}{ds}\, \mathcal{L}[\sin bt] = -\frac{d}{ds} \left( \frac{b}{s^2 + b^2} \right) = \frac{2bs}{(s^2 + b^2)^2}, \quad\text{for } s > 0. \text{ ◄}$$
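
This result is easily confirmed by evaluating the defining integral (13.53)
numerically. In the sketch below the value $b = 2$ and the Simpson-rule helper `laplace`
(with the integral truncated at `t_max`) are assumptions made for illustration.

```python
import math

def laplace(f, s, t_max=80.0, n=80_000):
    """Composite Simpson estimate of integral_0^t_max f(t) exp(-s t) dt (n even)."""
    h = t_max / n
    total = f(0.0) + f(t_max) * math.exp(-s * t_max)
    for i in range(1, n):
        t = i * h
        total += (4.0 if i % 2 else 2.0) * f(t) * math.exp(-s * t)
    return total * h / 3.0

b = 2.0

def f(t):
    return t * math.sin(b * t)

def f_bar(s):
    return 2.0 * b * s / (s * s + b * b) ** 2   # result obtained from (13.62)

for s in (0.5, 1.5, 3.0):
    print(s, laplace(f, s), f_bar(s))
```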


13.3 Concluding remarks 

In this chapter we have discussed Fourier and Laplace transforms in some detail. 
Both are examples of integral transforms , which can be considered in a more 
general context. 

A general integral transform of a function $f(t)$ takes the form

$$F(\alpha) = \int_a^b K(\alpha, t)\, f(t)\, dt, \tag{13.65}$$

where $F(\alpha)$ is the transform of $f(t)$ with respect to the kernel $K(\alpha, t)$, and $\alpha$ is
the transform variable. For example, in the Laplace transform case $K(s, t) = e^{-st}$,
$a = 0$, $b = \infty$.

Very often the inverse transform can also be written straightforwardly and
we obtain a transform pair similar to that encountered in Fourier transforms.
Examples of such pairs are


(i) the Hankel transform

$$F(k) = \int_0^{\infty} f(x)\, J_n(kx)\, x\, dx,$$

$$f(x) = \int_0^{\infty} F(k)\, J_n(kx)\, k\, dk,$$

where the $J_n$ are Bessel functions of order $n$, and

(ii) the Mellin transform

$$F(z) = \int_0^{\infty} t^{z-1} f(t)\, dt,$$

$$f(t) = \frac{1}{2\pi i} \int_{-i\infty}^{i\infty} t^{-z} F(z)\, dz.$$

Although we do not have the space to discuss their general properties, the 
reader should at least be aware of this wider class of integral transforms. 
13.4 Exercises

13.1 Find the Fourier transform of the function $f(t) = \exp(-|t|)$.

(a) By applying Fourier's inversion theorem prove that

\[ \frac{\pi}{2} \exp(-|t|) = \int_0^\infty \frac{\cos \omega t}{1 + \omega^2}\, d\omega. \]

(b) By making the substitution $\omega = \tan\theta$, demonstrate the validity of Parseval's theorem for this function.

13.2 Use the general definition and properties of Fourier transforms to show the following.

(a) If $f(x)$ is periodic with period $a$ then $\tilde{f}(k) = 0$ unless $ka = 2\pi n$ for integer $n$.

(b) The Fourier transform of $t f(t)$ is $i\, d\tilde{f}(\omega)/d\omega$.

(c) The Fourier transform of $f(mt + c)$ is

\[ \frac{e^{i\omega c/m}}{m}\, \tilde{f}\!\left(\frac{\omega}{m}\right). \]

13.3 Find the Fourier transform of $H(x - a)e^{-bx}$, where $H(x)$ is the Heaviside function.

13.4 Prove that the Fourier transform of the function $f(t)$ defined in the tf-plane by straight-line segments joining $(-T, 0)$ to $(0, 1)$ to $(T, 0)$, with $f(t) = 0$ outside $|t| < T$, is

\[ \tilde{f}(\omega) = \frac{T}{\sqrt{2\pi}}\, \mathrm{sinc}^2\!\left(\frac{\omega T}{2}\right), \]

where sinc x is defined as (sin x)/x.

Use the general properties of Fourier transforms to determine the transforms of the following functions, graphically defined by straight-line segments and equal to zero outside the ranges specified:

(a) (0,0) to (0.5,1) to (1,0) to (2,2) to (3,0) to (4.5,3) to (6,0);

(b) (-2,0) to (-1,2) to (1,2) to (2,0);

(c) (0,0) to (0,1) to (1,2) to (1,0) to (2,-1) to (2,0).

13.5 By taking the Fourier transform of the equation

\[ \frac{d^2\phi}{dx^2} - K^2\phi = f(x) \]

show that its solution $\phi(x)$ can be written as

\[ \phi(x) = \frac{-1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{e^{ikx}\tilde{f}(k)}{k^2 + K^2}\, dk, \]

where $\tilde{f}(k)$ is the Fourier transform of $f(x)$.

13.6 By differentiating the definition of the Fourier sine transform $\tilde{f}_s(\omega)$ of the function $f(t) = t^{-1/2}$ with respect to $\omega$, and then integrating the resulting expression by parts, find an elementary differential equation satisfied by $\tilde{f}_s(\omega)$. Hence show that this function is its own Fourier sine transform, i.e. $\tilde{f}_s(\omega) = A f(\omega)$, where A is a constant. Show that it is also its own Fourier cosine transform. (Assume that the limit as $x \to \infty$ of $x^{1/2} \sin \alpha x$ can be taken as zero.)

13.7 (a) Find the Fourier transform of the unit rectangular distribution

\[ f(t) = \begin{cases} 1 & |t| < 1, \\ 0 & \text{otherwise.} \end{cases} \]
(b) Determine the convolution of $f$ with itself and, without further integration, deduce its transform.

(c) Deduce that

\[ \int_{-\infty}^{\infty} \frac{\sin^2\omega}{\omega^2}\, d\omega = \pi, \qquad \int_{-\infty}^{\infty} \frac{\sin^4\omega}{\omega^4}\, d\omega = \frac{2\pi}{3}. \]

13.8 Calculate the Fraunhofer spectrum produced by a diffraction grating, uniformly illuminated by light of wavelength $2\pi/k$, as follows. Consider a grating with 4N equal strips each of width a and alternately opaque and transparent. The aperture function is then

\[ f(y) = \begin{cases} A & \text{for } (2n+1)a \le y \le (2n+2)a, \quad -N \le n < N, \\ 0 & \text{otherwise.} \end{cases} \]

(a) Show, for diffraction at angle $\theta$ to the normal to the grating, that the required Fourier transform can be written

\[ \tilde{f}(q) = (2\pi)^{-1/2} \sum_{r=-N}^{N-1} \exp(-2iarq) \int_a^{2a} A \exp(-iqu)\, du, \]

where $q = k \sin\theta$.

(b) Evaluate the integral and sum to show that

\[ \tilde{f}(q) = (2\pi)^{-1/2} \exp(-iqa/2)\, \frac{A \sin(2qaN)}{q \cos(qa/2)}, \]

and hence that the intensity distribution $I(\theta)$ in the spectrum is proportional to

\[ \frac{\sin^2(2qaN)}{q^2 \cos^2(qa/2)}. \]

(c) For large values of N, the numerator in the above expression has very closely spaced maxima and minima as a function of $\theta$ and effectively takes its mean value, 1/2, giving a low-intensity background. Much more significant peaks in $I(\theta)$ occur when $\theta = 0$ or the cosine term in the denominator vanishes. Show that the corresponding values of $|\tilde{f}(q)|$ are

\[ \frac{2aNA}{(2\pi)^{1/2}} \quad \text{and} \quad \frac{4aNA}{(2\pi)^{1/2}(2m+1)\pi} \quad \text{with } m \text{ integral.} \]

Note that the constructive interference makes the maxima in $I(\theta) \propto N^2$, not N. Of course, observable maxima only occur for $0 \le \theta \le \pi/2$.

13.9 By finding the complex Fourier series for its LHS show that either side of the equation

\[ \sum_{n=-\infty}^{\infty} \delta(t + nT) = \frac{1}{T} \sum_{n=-\infty}^{\infty} e^{-2\pi n i t/T} \]

can represent a periodic train of impulses. By expressing the function $f(t + nX)$, in which X is a constant, in terms of the Fourier transform $\tilde{f}(\omega)$ of $f(t)$, show that

\[ \sum_{n=-\infty}^{\infty} f(t + nX) = \frac{\sqrt{2\pi}}{X} \sum_{n=-\infty}^{\infty} \tilde{f}\!\left(\frac{2n\pi}{X}\right) e^{2\pi n i t/X}. \]

This result is known as the Poisson summation formula.
13.10 In many applications in which the frequency spectrum of an analogue signal is required, the best that can be done is to sample the signal $f(t)$ a finite number of times at fixed intervals and then use a discrete Fourier transform $F_k$ to estimate discrete points on the (true) frequency spectrum $\tilde{f}(\omega)$.

(a) By an argument that is essentially the converse of that given in section 13.1, show that, if N samples $f_n$, beginning at $t = 0$ and spaced $\tau$ apart, are taken, then $\tilde{f}(2\pi k/(N\tau)) \approx F_k \tau$ where

\[ F_k = \frac{1}{\sqrt{2\pi}} \sum_{n=0}^{N-1} f_n e^{-2\pi n k i/N}. \]

(b) For the function $f(t)$ defined by

\[ f(t) = \begin{cases} 1 & \text{for } 0 \le t < 1, \\ 0 & \text{otherwise,} \end{cases} \]

from which eight samples are drawn at intervals of $\tau = 0.25$, find a formula for $|F_k|$ and evaluate it for k = 0, 1, ..., 7.

(c) Find the exact frequency spectrum of $f(t)$ and compare the actual and estimated values of $\sqrt{2\pi}\,|\tilde{f}(\omega)|$ at $\omega = k\pi$ for k = 0, 1, ..., 7. Note the relatively good agreement for k < 4 and the lack of agreement for larger values of k.
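The comparison asked for in part (c) can be explored numerically. The following Python sketch (standard library only; the function names are illustrative and not from the text) assumes the normalisation of $F_k$ given in part (a), so that the estimate of $\sqrt{2\pi}\,|\tilde{f}(\omega_k)|$ is $\tau$ times the modulus of the plain sample sum. It is intended only as a check on hand calculations, not as a substitute for them.

```python
import cmath
import math

def dft_estimates(samples, tau):
    """Return tau*|sum_n f_n exp(-2*pi*i*n*k/N)| for k = 0..N-1.

    With F_k normalised as in part (a), this equals sqrt(2*pi)*tau*|F_k|,
    the estimate of sqrt(2*pi)*|f~(omega)| at omega = 2*pi*k/(N*tau).
    """
    n_samples = len(samples)
    estimates = []
    for k in range(n_samples):
        total = sum(f_n * cmath.exp(-2j * math.pi * n * k / n_samples)
                    for n, f_n in enumerate(samples))
        estimates.append(tau * abs(total))
    return estimates

def exact_spectrum(omega):
    """sqrt(2*pi)*|f~(omega)| for the unit pulse on 0 <= t < 1."""
    if omega == 0.0:
        return 1.0
    return abs(math.sin(omega / 2.0) / (omega / 2.0))

tau = 0.25
# Eight samples f(0), f(0.25), ..., f(1.75) of the unit pulse.
samples = [1.0 if n * tau < 1.0 else 0.0 for n in range(8)]
estimated = dft_estimates(samples, tau)
actual = [exact_spectrum(k * math.pi) for k in range(8)]
for k in range(8):
    print(f"k={k}: actual {actual[k]:.3f}, estimated {estimated[k]:.3f}")
```

The estimates are symmetric about k = 4, which is why they cannot follow the steadily decreasing exact spectrum for the larger values of k.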


13.11 For a function $f(t)$ that is non-zero only in the range $|t| < T/2$, the full frequency spectrum $\tilde{f}(\omega)$ can be constructed, in principle exactly, from values at discrete sample points $\omega = n(2\pi/T)$. Prove this as follows.

(a) Show that the coefficients of a complex Fourier series representation of $f(t)$ with period T can be written as

\[ c_n = \frac{\sqrt{2\pi}}{T}\, \tilde{f}\!\left(\frac{2\pi n}{T}\right). \]

(b) Use this result to represent $f(t)$ as an infinite sum in the defining integral for $\tilde{f}(\omega)$, and hence show that

\[ \tilde{f}(\omega) = \sum_{n=-\infty}^{\infty} \tilde{f}\!\left(\frac{2\pi n}{T}\right) \mathrm{sinc}\!\left(n\pi - \frac{\omega T}{2}\right), \]

where sinc x is defined as (sin x)/x.

13.12 A signal obtained by sampling a function $x(t)$ at regular intervals T is passed through an electronic filter, whose response $g(t)$ to a unit $\delta$-function input is represented in a tg-plot by straight lines joining (0, 0) to (T, 1/T) to (2T, 0) and is zero for all other values of t. The output of the filter is the convolution of the input, $\sum_{n=-\infty}^{\infty} x(t)\,\delta(t - nT)$, with $g(t)$.

Using the convolution theorem, and the result given in exercise 13.4, show that the output of the filter can be written

\[ y(t) = \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} x(nT) \int_{-\infty}^{\infty} \mathrm{sinc}^2\!\left(\frac{\omega T}{2}\right) e^{-i\omega[(n+1)T - t]}\, d\omega. \]

13.13 (a) Find the Fourier transform of

\[ f(\gamma, p, t) = \begin{cases} e^{-\gamma t} \sin pt & t > 0, \\ 0 & t < 0, \end{cases} \]

where $\gamma$ (> 0) and p are constant parameters.
(b) The current $I(t)$ flowing through a certain system is related to the applied voltage $V(t)$ by the equation

\[ I(t) = \int_{-\infty}^{\infty} K(t - u) V(u)\, du, \]

where

\[ K(\tau) = a_1 f(\gamma_1, p_1, \tau) + a_2 f(\gamma_2, p_2, \tau). \]

The function $f(\gamma, p, t)$ is as given in (a) and all the $a_i$, $\gamma_i$ (> 0) and $p_i$ are fixed parameters. By considering the Fourier transform of $I(t)$, find the relationship that must hold between $a_1$ and $a_2$ if the total net charge Q passed through the system (over a very long time) is to be zero for an arbitrary applied voltage.

13.14 Prove the equality

\[ \int_0^\infty e^{-2at} \sin^2 at\, dt = \frac{1}{\pi} \int_0^\infty \frac{a^2}{4a^4 + \omega^4}\, d\omega. \]

13.15 A linear amplifier produces an output that is the convolution of its input and its response function. The Fourier transform of the response function for a particular amplifier is

\[ \tilde{K}(\omega) = \frac{i\omega}{\sqrt{2\pi}\,(\alpha + i\omega)^2}. \]

Determine the time variation of its output $g(t)$ when its input is the Heaviside step function. (Consider the Fourier transform of a decaying exponential function and the result of exercise 13.2(b).)

13.16 In quantum mechanics, two equal-mass particles having momenta $\mathbf{p}_j = \hbar\mathbf{k}_j$ and energies $E_j = \hbar\omega_j$ and represented by plane wavefunctions $\phi_j = \exp[i(\mathbf{k}_j \cdot \mathbf{r}_j - \omega_j t)]$, j = 1, 2, interact through a potential $V = V(|\mathbf{r}_1 - \mathbf{r}_2|)$. In first-order perturbation theory the probability of scattering to a state with momenta and energies $\mathbf{p}'_j$, $E'_j$ is determined by the modulus squared of the quantity

\[ M = \iiint \psi_f^{*} V \psi_i \, d\mathbf{r}_1\, d\mathbf{r}_2\, dt. \]

The initial state $\psi_i$ is $\phi_1\phi_2$ and the final state $\psi_f$ is $\phi'_1\phi'_2$.

(a) By writing $\mathbf{r}_1 + \mathbf{r}_2 = 2\mathbf{R}$ and $\mathbf{r}_1 - \mathbf{r}_2 = \mathbf{r}$ and assuming that $d\mathbf{r}_1\, d\mathbf{r}_2 = d\mathbf{R}\, d\mathbf{r}$, show that M can be written as the product of three one-dimensional integrals.

(b) From two of the integrals deduce energy and momentum conservation in the form of $\delta$-functions.

(c) Show that M is proportional to the Fourier transform of V, i.e. $\tilde{V}(\mathbf{k})$ where $2\hbar\mathbf{k} = (\mathbf{p}_2 - \mathbf{p}_1) - (\mathbf{p}'_2 - \mathbf{p}'_1)$.

13.17 For some ion-atom scattering processes, the potential V of the previous example may be approximated by $V = |\mathbf{r}_1 - \mathbf{r}_2|^{-1} \exp(-\mu|\mathbf{r}_1 - \mathbf{r}_2|)$. Show, using the result of the worked example in subsection 13.1.10, that the probability that the ion will scatter from, say, $\mathbf{p}_1$ to $\mathbf{p}'_1$ is proportional to $(\mu^2 + k^2)^{-2}$ where $k = |\mathbf{k}|$ and $\mathbf{k}$ is as given in part (c) of exercise 13.16.

13.18 The equivalent duration and bandwidth, $T_e$ and $B_e$, of a signal $x(t)$ are defined in terms of the latter and its Fourier transform $\tilde{x}(\omega)$ by

\[ T_e = \frac{1}{x(0)} \int_{-\infty}^{\infty} x(t)\, dt, \qquad B_e = \frac{1}{\tilde{x}(0)} \int_{-\infty}^{\infty} \tilde{x}(\omega)\, d\omega, \]
where neither $x(0)$ nor $\tilde{x}(0)$ is zero. Show that the product $T_e B_e = 2\pi$ (this is a form of uncertainty principle), and find the equivalent bandwidth of the signal $x(t) = \exp(-|t|/T)$.

For this signal, determine the fraction of the total energy that lies in the frequency range $|\omega| < B_e/4$. You will need the indefinite integral with respect to x of $(a^2 + x^2)^{-2}$, which is

\[ \frac{x}{2a^2(a^2 + x^2)} + \frac{1}{2a^3} \tan^{-1}\frac{x}{a}. \]

13.19 Calculate directly the auto-correlation function $a(z)$ for the product of the exponential decay distribution and the Heaviside step function

\[ f(t) = \frac{1}{\lambda} e^{-\lambda t} H(t). \]

Use the Fourier transform and energy spectrum of $f(t)$ to deduce that

\[ \int_{-\infty}^{\infty} \frac{e^{i\omega z}}{\lambda^2 + \omega^2}\, d\omega = \frac{\pi}{\lambda} e^{-\lambda |z|}. \]

13.20 Prove that the cross-correlation $C(z)$ of the Gaussian and Lorentzian distributions

\[ f(t) = \frac{1}{\tau\sqrt{2\pi}} \exp\left(-\frac{t^2}{2\tau^2}\right), \qquad g(t) = \left(\frac{a}{\pi}\right) \frac{1}{t^2 + a^2}, \]

has as its Fourier transform the function

\[ \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{\tau^2\omega^2}{2}\right) \exp(-a|\omega|). \]

Hence show that

\[ C(z) = \frac{1}{\tau\sqrt{2\pi}} \exp\left(\frac{a^2 - z^2}{2\tau^2}\right) \cos\left(\frac{az}{\tau^2}\right). \]

13.21 Prove the expressions given in table 13.1 for the Laplace transforms of $t^{-1/2}$ and $t^{1/2}$, by setting $x^2 = ts$ in the result

\[ \int_0^\infty \exp(-x^2)\, dx = \tfrac{1}{2}\sqrt{\pi}. \]

13.22 Find the functions $y(t)$ whose Laplace transforms are the following:

(a) $1/(s^2 - s - 2)$,

(b) $2s/[(s + 1)(s^2 + 4)]$,

(c) $e^{-(\gamma + s)t_0}/[(s + \gamma)^2 + b^2]$.

13.23 Use the properties of Laplace transforms to prove the following without evaluating any Laplace integrals explicitly:

(a) $\mathcal{L}[t^{5/2}] = \frac{15}{8}\sqrt{\pi}\, s^{-7/2}$.

(b) $\mathcal{L}[(\sinh at)/t] = \frac{1}{2} \ln[(s + a)/(s - a)]$, $s > |a|$.

(c) $\mathcal{L}[\sinh at \cos bt] = a(s^2 - a^2 - b^2)[(s - a)^2 + b^2]^{-1}[(s + a)^2 + b^2]^{-1}$.

13.24 Find the solution (the so-called impulse response or Green's function) of the equation

\[ T\frac{dx}{dt} + x = \delta(t), \]

by proceeding as follows.
(a) Show by substitution that

\[ x(t) = A(1 - e^{-t/T}) H(t) \]

is a solution, for which $x(0) = 0$, of

\[ T\frac{dx}{dt} + x = A H(t), \qquad (*) \]

where $H(t)$ is the Heaviside step function.

(b) Construct the solution when the RHS of (*) is replaced by $AH(t - \tau)$ with $dx/dt = x = 0$ for $t < \tau$, and hence find the solution when the RHS is a rectangular pulse of duration $\tau$.

(c) By setting $A = 1/\tau$ and taking the limit when $\tau \to 0$, show that the impulse response is $x(t) = T^{-1} e^{-t/T}$.

(d) Obtain the same result much more directly by taking the Laplace transform of each term in the original equation, solving the resulting algebraic equation and then using the entries in table 13.1.

13.25 (a) If $f(t) = A + g(t)$, where A is a constant and the indefinite integral of $g(t)$ is bounded as its upper limit tends to $\infty$, show that

\[ \lim_{s \to 0} s\bar{f}(s) = A. \]

(b) For t > 0 the function $y(t)$ obeys the differential equation

\[ \frac{d^2y}{dt^2} + a\frac{dy}{dt} + by = c \cos^2 \omega t, \]

where a, b and c are positive constants. Find $\bar{y}(s)$ and show that $s\bar{y}(s) \to c/2b$ as $s \to 0$. Interpret the result in the t-domain.

13.26 By writing $f(x)$ as an integral involving the $\delta$-function $\delta(\xi - x)$ and taking the Laplace transforms of both sides, show that the transform of the solution of the equation

\[ \frac{d^4y}{dx^4} - y = f(x) \]

for which y and its first three derivatives vanish at x = 0 can be written as

\[ \bar{y}(s) = \int_0^\infty f(\xi)\, \frac{e^{-s\xi}}{s^4 - 1}\, d\xi. \]

Use the properties of Laplace transforms and the entries in table 13.1 to show that

\[ y(x) = \frac{1}{2} \int_0^x f(\xi)\left[\sinh(x - \xi) - \sin(x - \xi)\right] d\xi. \]

13.27 The function $f_a(x)$ is defined as unity for 0 < x < a and zero otherwise. Find its Laplace transform $\bar{f}_a(s)$ and deduce that the transform of $x f_a(x)$ is

\[ \frac{1}{s^2}\left[1 - (1 + as)e^{-sa}\right]. \]

Write $f_a(x)$ in terms of Heaviside functions and hence obtain an explicit expression for

\[ g_a(x) = \int_0^x f_a(y) f_a(x - y)\, dy. \]

Use the expression to write $\bar{g}_a(s)$ in terms of the functions $\bar{f}_a(s)$ and $\bar{f}_{2a}(s)$ and their derivatives, and hence show that $\bar{g}_a(s)$ is equal to the square of $\bar{f}_a(s)$, in accordance with the convolution theorem.
13.28 (a) Show that the Laplace transform of $f(t - a)H(t - a)$, where $a \ge 0$, is $e^{-as}\bar{f}(s)$.

(b) If $g(t)$ is a periodic function of period T, show that $\bar{g}(s)$ can be written as

\[ \frac{1}{1 - e^{-sT}} \int_0^T e^{-st} g(t)\, dt. \]

(c) Sketch the periodic function defined in $0 \le t \le T$ by

\[ g(t) = \begin{cases} 2t/T & 0 \le t < T/2, \\ 2(1 - t/T) & T/2 \le t \le T, \end{cases} \]

and, using the result in (b), find its Laplace transform.

(d) Show, by sketching it, that

\[ \frac{2}{T}\left[ tH(t) + 2\sum_{n=1}^{\infty} (-1)^n \left(t - \tfrac{1}{2}nT\right) H\!\left(t - \tfrac{1}{2}nT\right) \right] \]

is another representation of $g(t)$ and hence derive the relationship

\[ \tanh x = 1 + 2\sum_{n=1}^{\infty} (-1)^n e^{-2nx}. \]

13.5 Hints and answers 

13.1 $(2/\pi)^{1/2}(1 + \omega^2)^{-1}$.

13.2 (a) Show $\tilde{f}(k)(1 - e^{\pm ika}) = 0$.

13.3 $(1/\sqrt{2\pi})[(b - ik)/(b^2 + k^2)]\, e^{-a(b + ik)}$.

13.4 (a) $[8/(\sqrt{2\pi}\,\omega^2)]\,[e^{-i\omega/2} \sin^2(\omega/4) + e^{-i2\omega} \sin^2(\omega/2) + e^{-i9\omega/2} \sin^2(3\omega/4)]$.

(b) Consider the superposition of a 'triangle' of height 2 with T = 2 and two triangles, each of unit height with T = 1, displaced by $\pm 1$; $[8 \sin^2(\omega/2)(1 + 2\cos\omega)]/(\sqrt{2\pi}\,\omega^2)$.

(c) Consider the superposition of a triangle and its derivative. $[(1 + i\omega)e^{-i\omega}/\sqrt{2\pi}]\, \mathrm{sinc}^2(\omega/2)$.

13.6 $d\tilde{f}_s(\omega)/d\omega = -\tilde{f}_s(\omega)/(2\omega)$.

13.7 (a) $(2/\sqrt{2\pi})(\sin\omega/\omega)$.

(b) $2 - |t|$ for $|t| < 2$, zero otherwise. Use convolution theorem; $(4/\sqrt{2\pi})(\sin^2\omega/\omega^2)$.

(c) Apply Parseval's theorem to $f$ and to $f * f$.

13.8 (c) Use l'Hôpital's rule to evaluate the expressions of the form 0/0.

13.10 (b) $|F_k| = \mathrm{cosec}(k\pi/8)/\sqrt{2\pi}$ for k odd; $|F_k| = 0$ for k even, except $|F_0| = 4/\sqrt{2\pi}$.

(c) $\sqrt{2\pi}\,\tilde{f}(\omega) = e^{-i\omega/2}[\sin(\omega/2)/(\omega/2)]$. Actual (estimated) values at $\omega = k\pi$ for k = 0, 1, ..., 7:

1 (1); 0.637 (0.653); 0 (0); 0.212 (0.271); 0 (0); 0.127 (0.271); 0 (0); 0.091 (0.653).

13.11 (b) Recall that the infinite integral involved in defining $\tilde{f}(\omega)$ only has a non-zero integrand in $|t| < T/2$.

13.12 The Fourier transform of $g(t)$ is found by moving the time origin by T and then applying (13.31). It is $(1/\sqrt{2\pi})\, \mathrm{sinc}^2(\omega T/2)\, e^{-i\omega T}$.

13.13 (a) $(1/\sqrt{2\pi})\{p/[(\gamma + i\omega)^2 + p^2]\}$.

(b) Show that $Q = \sqrt{2\pi}\,\tilde{I}(0)$ and use the convolution theorem. The required relationship is $a_1 p_1/(\gamma_1^2 + p_1^2) + a_2 p_2/(\gamma_2^2 + p_2^2) = 0$.

13.14 Set $p = \gamma = a$ in part (a) of exercise 13.13 and then apply Parseval's theorem.

13.15 $\tilde{g}(\omega) = 1/[\sqrt{2\pi}(\alpha + i\omega)^2]$, leading to $g(t) = te^{-\alpha t}$.

13.16 (b) The t-integral is $\int \exp[i(E'_1 + E'_2 - E_1 - E_2)t/\hbar]\, dt \propto \delta(E'_1 + E'_2 - (E_1 + E_2))$; similarly the R-integral yields $\delta(\mathbf{p}'_1 + \mathbf{p}'_2 - (\mathbf{p}_1 + \mathbf{p}_2))$.
13.17 $\tilde{V}(k) \propto [-2\pi/(ik)] \int \{\exp[-(\mu - ik)r] - \exp[-(\mu + ik)r]\}\, dr$.

13.18 By setting t = 0 and $\omega = 0$ in the Fourier definitions, obtain two equations connecting $x(0)$ and $\tilde{x}(0)$. $B_e = \pi/T$; $\tilde{x}(\omega)$, proportional to the Fourier cosine transform of $\exp(-t/T)$, is equal to $[2T/\sqrt{2\pi}](1 + \omega^2 T^2)^{-1}$. The energy spectrum is proportional to $|\tilde{x}(\omega)|^2$. Fraction = 0.733.

13.19 Note that the lower limit in the calculation of $a(z)$ is 0 for z > 0 and |z| for z < 0. Auto-correlation $a(z) = [1/(2\lambda^3)] \exp(-\lambda|z|)$.

13.20 Use the result of exercise 13.18 to deduce that $\tilde{g}(\omega) = (1/\sqrt{2\pi}) \exp(-a|\omega|)$. Apply the Wiener-Kinchin theorem. Note that, because of the presence of $|\omega|$, the inverse transform giving $C(z)$ is a cosine transform.

13.21 Prove the result for $t^{1/2}$ by integrating that for $t^{-1/2}$ by parts.

13.22 (a) $y(t) = \frac{1}{3}(e^{2t} - e^{-t})$.

(b) $y(t) = \frac{1}{5}(4 \sin 2t + 2 \cos 2t - 2e^{-t})$.

(c) Note the factor $e^{-st_0}$ and write $y(t)$ as a function of $(t - t_0)$; $y(t) = b^{-1} e^{-\gamma t} \sin[b(t - t_0)]\, H(t - t_0)$.

13.23 (a) Use (13.62) with n = 2 on $\mathcal{L}[\sqrt{t}]$; (b) use (13.63);

(c) consider $\mathcal{L}[\exp(\pm at) \cos bt]$ and use the translation property, subsection 13.2.2.

13.24 (b) Superimpose solutions with equal amplitudes but opposite signs. $x(t) = A(1 - e^{-t/T})H(t) - A(1 - e^{-(t - \tau)/T})H(t - \tau)$.

(c) Write $e^{-(t - \tau)/T}$ as $e^{-t/T}[1 + \tau/T + O(\tau^2)]$ and note that, with $0 < t < \tau$, $t(1 - e^{-t/T})/\tau \to 0$ as $\tau \to 0$.

(d) The algebraic equation is $\bar{x} = (1 + sT)^{-1}$.

13.25 (a) Note that $|\lim \int g(t)e^{-st}\, dt| \le |\lim \int g(t)\, dt|$.

(b) $(s^2 + as + b)\bar{y}(s) = \{c(s^2 + 2\omega^2)/[s(s^2 + 4\omega^2)]\} + (a + s)y(0) + y'(0)$.

For this damped system, at large t (corresponding to $s \to 0$) rates of change are negligible and the equation reduces to $by = c\cos^2\omega t$, with $\cos^2\omega t$ having an average value $\frac{1}{2}$.

13.26 Factorise $(s^4 - 1)^{-1}$ as $\frac{1}{2}[(s^2 - 1)^{-1} - (s^2 + 1)^{-1}]$.

13.27 $s^{-1}[1 - \exp(-sa)]$; $g_a(x) = x$ for $0 < x < a$, $g_a(x) = 2a - x$ for $a \le x \le 2a$, $g_a(x) = 0$ otherwise.

13.28 (a) Note that $\int_T^\infty H(t) \cdots = \int_0^\infty H(t - T) \cdots$ and that $H(t - T)g(t) = H(t - T)g(t - T)$.

(c) $\bar{g}(s) = [2/(Ts^2)] \tanh(sT/4)$.

(d) Use the result from (a) and $\mathcal{L}[tH(t)] = s^{-2}$; set $sT = 4x$.
14 First-order ordinary differential equations


Differential equations are the group of equations that contain derivatives. Chap- 
ters 14-19 discuss a variety of differential equations, starting in this chapter and 
the next with those ordinary differential equations (ODEs) that have closed-form 
solutions. As its name suggests, an ODE contains only ordinary derivatives (and 
not partial derivatives) and describes the relationship between these derivatives of 
the dependent variable, usually called y, with respect to the independent variable, 
usually called x. The solution to such an ODE is therefore a function of x and 
is written y(x). For an ODE to have a closed-form solution, it must be possible 
to express y(x) in terms of the standard elementary functions such as exp x, ln x,
sin x etc. The solutions of some differential equations cannot, however, be written 
in closed form, but only as an infinite series; these are discussed in chapter 16. 

Ordinary differential equations may be separated conveniently into differ- 
ent categories according to their general characteristics. The primary grouping 
adopted here is by the order of the equation. The order of an ODE is simply the 
order of the highest derivative it contains. Thus equations containing dy/dx, but 
no higher derivatives, are called first order, those containing d 2 y/dx 2 are called 
second order and so on. In this chapter we consider first-order equations, and in 
the next, second- and higher-order equations. 

Ordinary differential equations may be classified further according to degree. 
The degree of an ODE is the power to which the highest-order derivative is 
raised, after the equation has been rationalised to contain only integer powers of 
derivatives. Hence the ODE 


\[ \frac{d^3y}{dx^3} + x\left(\frac{dy}{dx}\right)^{3/2} + x^2 y = 0, \]

is of third order and second degree, since after rationalisation it contains the term $(d^3y/dx^3)^2$.

The general solution to an ODE is the most general function y(x) that satisfies 
the equation; it will contain constants of integration which may be determined by 
the application of some suitable boundary conditions. For example, we may be 
told that for a certain first-order differential equation, the solution y(x) is equal to 
zero when the parameter x is equal to unity; this allows us to determine the value 
of the constant of integration. The general solutions to nth-order ODEs, which 
are considered in detail in the next chapter, will contain n (essential) arbitrary 
constants of integration and therefore we will need n boundary conditions if these 
constants are to be determined (see section 14.1). When the boundary conditions 
have been applied, and the constants found, we are left with a particular solution 
to the ODE, which obeys the given boundary conditions. Some ODEs of degree 
greater than unity also possess singular solutions, which are solutions that contain 
no arbitrary constants and cannot be found from the general solution; singular 
solutions are discussed in more detail in section 14.3. When any solution to an 
ODE has been found, it is always possible to check its validity by substitution 
into the original equation and verification that any given boundary conditions 
are met. 

In this chapter, firstly we discuss various types of first-degree ODE, and then 
go on to examine those higher-degree equations that can be solved in closed form. 
At the outset, however, we discuss the general form of the solutions of ODEs; 
this discussion is relevant to both first- and higher-order ODEs. 


14.1 General form of solution 

It is helpful when considering the general form of the solution of an ODE to 
consider the inverse process, namely that of obtaining an ODE from a given 
group of functions, each one of which is a solution of the ODE. Suppose the 
members of the group can be written as 

\[ y = f(x, a_1, a_2, \ldots, a_n), \qquad (14.1) \]

each member being specified by a different set of values of the parameters $a_i$. For example, consider the group of functions

\[ y = a_1 \sin x + a_2 \cos x; \qquad (14.2) \]

here n = 2.

Since an ODE is required for which any of the group is a solution, it clearly must not contain any of the $a_i$. As there are n of the $a_i$ in expression (14.1), we must obtain n + 1 equations involving them in order that, by elimination, we can obtain one final equation without them.

Initially we have only (14.1), but if this is differentiated n times, a total of n + 1 equations is obtained from which (in principle) all the $a_i$ can be eliminated, to give one ODE satisfied by all the group. As a result of the n differentiations, $d^n y/dx^n$ will be present in one of the n + 1 equations and hence in the final equation, which will therefore be of nth order.
In the case of (14.2), we have

\[ \frac{dy}{dx} = a_1 \cos x - a_2 \sin x, \]

\[ \frac{d^2y}{dx^2} = -a_1 \sin x - a_2 \cos x. \]

Here the elimination of $a_1$ and $a_2$ is trivial (because of the similarity of the forms of y and $d^2y/dx^2$), resulting in

\[ \frac{d^2y}{dx^2} + y = 0, \]

a second-order equation.

Thus, to summarise, a group of functions (14.1) with n parameters satisfies an 
nth-order ODE in general (although in some degenerate cases an ODE of less 
than nth order is obtained). The intuitive converse of this is that the general 
solution of an nth-order ODE contains n arbitrary parameters (constants); for 
our purposes, this will be assumed to be valid although a totally general proof is 
difficult. 
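As an informal numerical check of the example above, the following Python sketch (standard library only; the function names are illustrative and not from the text) uses a central-difference estimate of the second derivative to confirm that several members of the family (14.2) satisfy $d^2y/dx^2 + y = 0$ to within discretisation and rounding error. It is a plausibility check, not a proof.

```python
import math

def second_derivative(func, x, h=1e-4):
    """Central-difference estimate of the second derivative of func at x."""
    return (func(x + h) - 2.0 * func(x) + func(x - h)) / (h * h)

def residual(a1, a2, x):
    """Value of d2y/dx2 + y for y = a1*sin(x) + a2*cos(x); should be near zero."""
    def y(t):
        return a1 * math.sin(t) + a2 * math.cos(t)
    return second_derivative(y, x) + y(x)

# Each choice of (a1, a2) picks out a different member of the family (14.2).
for a1, a2 in [(1.0, 0.0), (0.0, 1.0), (2.5, -1.3)]:
    worst = max(abs(residual(a1, a2, 0.1 * k)) for k in range(-30, 31))
    print(f"a1={a1}, a2={a2}: largest |residual| = {worst:.2e}")
```

The residual is small for every parameter pair tried, which is the point of the elimination: the ODE holds whatever values $a_1$ and $a_2$ take.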

As mentioned earlier, external factors affect a system described by an ODE, 
by fixing the values of the dependent variables for particular values of the 
independent ones. These externally imposed (or boundary) conditions on the 
solution are thus the means of determining the parameters and so of specifying 
precisely which function is the required solution. It is apparent that the number 
of boundary conditions should match the number of parameters and hence the 
order of the equation, if a unique solution is to be obtained. Fewer independent 
boundary conditions than this will lead to a number of undetermined parameters 
in the solution, whilst an excess will usually mean that no acceptable solution is 
possible. 

For an nth-order equation the required n boundary conditions can take many
forms, for example the value of y at n different values of x, or the value of any
n − 1 of the n derivatives $dy/dx$, $d^2y/dx^2$, ..., $d^n y/dx^n$ together with that of y, all
for the same value of x, or many intermediate combinations.


14.2 First-degree first-order equations 


First-degree first-order ODEs contain only dy/dx equated to some function of x 
and y, and can be written in either of two equivalent standard forms, 


\[ \frac{dy}{dx} = F(x, y), \qquad A(x, y)\, dx + B(x, y)\, dy = 0, \]


where F(x,y) = — A(x, y)/B(x,y), and F(x,y), A(x,y) and B(x,y) are in general 
functions of both x and y. Which of the two above forms is the more useful 
for finding a solution depends on the type of equation being considered. There 
are several different types of first-degree first-order ODEs that are of interest in 
the physical sciences. These equations and their respective solutions are discussed 
below. 


14.2.1 Separable-variable equations 

A separable-variable equation is one which may be written in the conventional 
form 

\[ \frac{dy}{dx} = f(x)g(y), \qquad (14.3) \]

where f(x) and g(y) are functions of x and y respectively, including cases in
which f(x) or g(y) is simply a constant. Rearranging this equation so that the
terms depending on x and on y appear on opposite sides (i.e. are separated), and
integrating, we obtain

\[ \int \frac{dy}{g(y)} = \int f(x)\, dx. \]

Finding the solution y(x) that satisfies (14.3) then depends only on the ease with 
which the integrals in the above equation can be evaluated. It is also worth 
noting that ODEs that at first sight do not appear to be of the form (14.3) can 
sometimes be made separable by an appropriate factorisation. 


►Solve

\[ \frac{dy}{dx} = x + xy. \]

Since the RHS of this equation can be factorised to give x(1 + y), the equation becomes
separable and we obtain

\[ \int \frac{dy}{1 + y} = \int x\, dx. \]

Now integrating both sides separately, we find

\[ \ln(1 + y) = \frac{x^2}{2} + c, \]

and so

\[ 1 + y = \exp\left(\frac{x^2}{2} + c\right) = A \exp\left(\frac{x^2}{2}\right), \]

where c and hence A is an arbitrary constant. ◄


Solution method. Factorise the equation so that it becomes separable. After rearranging
it so that the terms depending on x and those depending on y appear on
opposite sides, integrate directly. Remember the constant of integration, which can
be evaluated if further information is given.
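The worked example can be checked numerically. The sketch below (standard library only) integrates $dy/dx = x(1 + y)$ with a classical fourth-order Runge-Kutta scheme and compares the result with the closed form $1 + y = A \exp(x^2/2)$. The initial condition y(0) = 0.5, which fixes A = 1.5, and the helper names are assumptions made purely for illustration.

```python
import math

def rhs(x, y):
    """Right-hand side of dy/dx = x + x*y = x*(1 + y)."""
    return x * (1.0 + y)

def rk4(f, x0, y0, x1, steps=1000):
    """Integrate dy/dx = f(x, y) from x0 to x1 with the classical Runge-Kutta method."""
    h = (x1 - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h * k1 / 2)
        k3 = f(x + h / 2, y + h * k2 / 2)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y

def closed_form(x, y0):
    """1 + y = A exp(x^2/2), with A = 1 + y0 fixed by the condition y(0) = y0."""
    return (1.0 + y0) * math.exp(x * x / 2.0) - 1.0

y0 = 0.5  # illustrative boundary condition, not taken from the text
numeric = rk4(rhs, 0.0, y0, 2.0)
exact = closed_form(2.0, y0)
print(f"numerical y(2) = {numeric:.8f}, closed form y(2) = {exact:.8f}")
```

Supplying the boundary condition is what turns the general solution, with its arbitrary constant A, into a particular solution that can be compared with a numerical integration.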



14.2.2 Exact equations 

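An exact first-degree first-order ODE is one of the form

\[ A(x, y)\, dx + B(x, y)\, dy = 0 \quad \text{and for which} \quad \frac{\partial A}{\partial y} = \frac{\partial B}{\partial x}. \qquad (14.4) \]

In this case A dx + B dy is an exact differential, dU(x, y) say (see section 5.3). In other words

\[ A\, dx + B\, dy = dU = \frac{\partial U}{\partial x}\, dx + \frac{\partial U}{\partial y}\, dy, \]

from which we obtain

\[ A(x, y) = \frac{\partial U}{\partial x}, \qquad (14.5) \]

\[ B(x, y) = \frac{\partial U}{\partial y}. \qquad (14.6) \]

Since $\partial^2 U/\partial x \partial y = \partial^2 U/\partial y \partial x$ we therefore require

\[ \frac{\partial A}{\partial y} = \frac{\partial B}{\partial x}. \qquad (14.7) \]

If (14.7) holds then (14.4) can be written dU(x, y) = 0, which has the solution
U(x, y) = c, where c is a constant and from (14.5) U(x, y) is given by

\[ U(x, y) = \int A(x, y)\, dx + F(y). \qquad (14.8) \]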

The function F(y) can be found from (14.6) by differentiating (14.8) with respect 
to y and equating to B(x,y). 


►Solve

\[ x\frac{dy}{dx} + 3x + y = 0. \]

Rearranging into the form (14.4) we have

\[ (3x + y)\, dx + x\, dy = 0, \]

i.e. A(x, y) = 3x + y and B(x, y) = x. Since $\partial A/\partial y = 1 = \partial B/\partial x$, the equation is exact, and
by (14.8) the solution is given by

\[ U(x, y) = \int (3x + y)\, dx + F(y) = \frac{3x^2}{2} + yx + F(y) = c_1. \]

Differentiating U(x, y) with respect to y and equating it to B(x, y) = x we obtain dF/dy = 0,
which integrates immediately to give $F(y) = c_2$. Therefore, letting $c = c_1 - c_2$, the solution
to the original ODE is

\[ \frac{3x^2}{2} + xy = c. \]
◄
Solution method. Check that the equation is an exact differential using (14.7) then
solve using (14.8). Find the function F(y) by differentiating (14.8) with respect to
y and using (14.6).
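The exactness test (14.7) and the resulting solution can be checked numerically for the worked example above. In the following sketch (standard library only; the helper names and sample points are illustrative assumptions) the partial derivatives are estimated by central differences, and the function U(x, y) = 3x²/2 + xy is evaluated along a solution curve to confirm that it stays constant.

```python
def A(x, y):
    """Coefficient of dx in (3x + y) dx + x dy = 0."""
    return 3.0 * x + y

def B(x, y):
    """Coefficient of dy in (3x + y) dx + x dy = 0."""
    return x

def partial(func, x, y, wrt, h=1e-6):
    """Central-difference partial derivative of func(x, y) with respect to 'x' or 'y'."""
    if wrt == "x":
        return (func(x + h, y) - func(x - h, y)) / (2 * h)
    return (func(x, y + h) - func(x, y - h)) / (2 * h)

def is_exact(A, B, points, tol=1e-6):
    """Test condition (14.7), dA/dy = dB/dx, at each of the sample points."""
    return all(abs(partial(A, x, y, "y") - partial(B, x, y, "x")) < tol
               for x, y in points)

def U(x, y):
    """U(x, y) = 3x^2/2 + xy, whose level curves are the solutions."""
    return 1.5 * x * x + x * y

def solution(x, c):
    """Explicit form of 3x^2/2 + xy = c, valid for x != 0."""
    return (c - 1.5 * x * x) / x

points = [(0.5, -1.0), (1.0, 2.0), (3.0, 0.25)]
print("exact:", is_exact(A, B, points))
print("U along the curve c = 4:", [U(x, solution(x, 4.0)) for x in (0.5, 1.0, 2.0)])
```

A sampled test of this kind can only refute exactness, not establish it; the analytic check of (14.7) remains the real criterion.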


14.2.3 Inexact equations: integrating factors 
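Equations that may be written in the form

\[ A(x, y)\, dx + B(x, y)\, dy = 0 \quad \text{but for which} \quad \frac{\partial A}{\partial y} \neq \frac{\partial B}{\partial x} \qquad (14.9) \]

are known as inexact equations. However, the differential A dx + B dy can always
be made exact by multiplying by an integrating factor $\mu(x, y)$, which obeys

\[ \frac{\partial(\mu A)}{\partial y} = \frac{\partial(\mu B)}{\partial x}. \qquad (14.10) \]

For an integrating factor that is a function of both x and y, i.e. $\mu = \mu(x, y)$, there
exists no general method for finding it; in such cases it may sometimes be found
by inspection. If, however, an integrating factor exists that is a function of either
x or y alone then (14.10) can be solved to find it. For example, if we assume
that the integrating factor is a function of x alone, i.e. $\mu = \mu(x)$, then (14.10)
reads

\[ \mu\frac{\partial A}{\partial y} = \mu\frac{\partial B}{\partial x} + B\frac{d\mu}{dx}. \]

Rearranging this expression we find

\[ \frac{d\mu}{\mu} = \frac{1}{B}\left(\frac{\partial A}{\partial y} - \frac{\partial B}{\partial x}\right) dx = f(x)\, dx, \]

where we require f(x) also to be a function of x only; indeed this provides a
general method of determining whether the integrating factor $\mu$ is a function of
x alone. This integrating factor is then given by

\[ \mu(x) = \exp\left\{\int f(x)\, dx\right\} \quad \text{where} \quad f(x) = \frac{1}{B}\left(\frac{\partial A}{\partial y} - \frac{\partial B}{\partial x}\right). \qquad (14.11) \]

Similarly, if $\mu = \mu(y)$ then

\[ \mu(y) = \exp\left\{\int g(y)\, dy\right\} \quad \text{where} \quad g(y) = \frac{1}{A}\left(\frac{\partial B}{\partial x} - \frac{\partial A}{\partial y}\right). \qquad (14.12) \]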
►Solve

\[ \frac{dy}{dx} = -\frac{2}{y} - \frac{3y}{2x}. \]

Rearranging into the form (14.9), we have

\[ (4x + 3y^2)\, dx + 2xy\, dy = 0, \qquad (14.13) \]

i.e. $A(x, y) = 4x + 3y^2$ and $B(x, y) = 2xy$. Now

\[ \frac{\partial A}{\partial y} = 6y, \qquad \frac{\partial B}{\partial x} = 2y, \]

so the ODE is not exact in its present form. However, we see that

\[ \frac{1}{B}\left(\frac{\partial A}{\partial y} - \frac{\partial B}{\partial x}\right) = \frac{2}{x}, \]

a function of x alone. Therefore an integrating factor exists that is also a function of x
alone and, ignoring the arbitrary constant of integration, is given by

\[ \mu(x) = \exp\left\{2\int \frac{dx}{x}\right\} = \exp(2 \ln x) = x^2. \]

Multiplying (14.13) through by $\mu(x) = x^2$ we obtain

\[ (4x^3 + 3x^2y^2)\, dx + 2x^3y\, dy = 4x^3\, dx + (3x^2y^2\, dx + 2x^3y\, dy) = 0. \]

By inspection this integrates immediately to give the solution $x^4 + y^2x^3 = c$, where c is a
constant. ◄


Solution method. Examine whether f(x) and g(y) are functions of only x or y
respectively. If so, then the required integrating factor is a function of either x or
y only, and is given by (14.11) or (14.12) respectively. If the integrating factor is
a function of both x and y, then sometimes it may be found by inspection or by
trial and error. In any case, the integrating factor $\mu$ must satisfy (14.10). Once the
equation has been made exact, solve by the method of subsection 14.2.2.
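The steps of this method can be illustrated numerically for equation (14.13). The sketch below (standard library only; the helper names and sample points are illustrative assumptions) estimates $\partial A/\partial y - \partial B/\partial x$ by central differences, confirms that dividing by B gives 2/x independently of y, and then checks that multiplying by $\mu = x^2$ removes the defect.

```python
def A(x, y):
    """Coefficient of dx in (14.13)."""
    return 4.0 * x + 3.0 * y * y

def B(x, y):
    """Coefficient of dy in (14.13)."""
    return 2.0 * x * y

def exactness_defect(P, Q, x, y, h=1e-5):
    """dP/dy - dQ/dx by central differences; zero when P dx + Q dy is exact."""
    dP_dy = (P(x, y + h) - P(x, y - h)) / (2 * h)
    dQ_dx = (Q(x + h, y) - Q(x - h, y)) / (2 * h)
    return dP_dy - dQ_dx

def f_of_x(x, y):
    """(1/B)(dA/dy - dB/dx); for (14.13) this should equal 2/x whatever y is."""
    return exactness_defect(A, B, x, y) / B(x, y)

def mu(x):
    """Integrating factor exp(2 ln x) = x^2 found in the worked example."""
    return x * x

def mu_A(x, y):
    return mu(x) * A(x, y)

def mu_B(x, y):
    return mu(x) * B(x, y)

for x, y in [(1.5, 2.0), (1.5, -0.7), (0.4, 3.0)]:
    print(f"x={x}, y={y}: f = {f_of_x(x, y):.6f} (2/x = {2.0 / x:.6f}), "
          f"defect before = {exactness_defect(A, B, x, y):.4f}, "
          f"after = {exactness_defect(mu_A, mu_B, x, y):.2e}")
```

If the computed f had varied with y at fixed x, that would have signalled that no integrating factor depending on x alone exists.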


14.2.4 Linear equations 

Linear first-order ODEs are a special case of inexact ODEs (discussed in the 
previous subsection) and can be written in the conventional form 

\[ \frac{dy}{dx} + P(x)y = Q(x). \qquad (14.14) \]

Such equations can be made exact by multiplying through by an appropriate
integrating factor in a similar manner to that discussed above. In this case,
however, the integrating factor is always a function of x alone and may be
expressed in a particularly simple form. An integrating factor $\mu(x)$ must be such
that

\[ \mu(x)\frac{dy}{dx} + \mu(x)P(x)y = \frac{d}{dx}\left[\mu(x)y\right] = \mu(x)Q(x), \qquad (14.15) \]

which may then be integrated directly to give

\[ \mu(x)y = \int \mu(x)Q(x)\, dx. \qquad (14.16) \]

The required integrating factor $\mu(x)$ is determined by the first equality in (14.15),
i.e.

\[ \frac{d}{dx}(\mu y) = \mu\frac{dy}{dx} + \frac{d\mu}{dx}y = \mu\frac{dy}{dx} + \mu P y, \]

which immediately gives the simple relation

\[ \frac{d\mu}{dx} = \mu(x)P(x) \quad \Rightarrow \quad \mu(x) = \exp\left\{\int P(x)\, dx\right\}. \qquad (14.17) \]


►Solve

\[ \frac{dy}{dx} + 2xy = 4x. \]

The integrating factor is given immediately by

\[ \mu(x) = \exp\left\{\int 2x\, dx\right\} = \exp x^2. \]

Multiplying through the ODE by $\mu(x) = \exp x^2$ and integrating, we have

\[ y \exp x^2 = 4\int x \exp x^2\, dx = 2 \exp x^2 + c. \]

The solution to the ODE is therefore given by $y = 2 + c\exp(-x^2)$. ◄

Solution method. Rearrange the equation into the form (14.14) and multiply by the
integrating factor $\mu(x)$ given by (14.17). The left- and right-hand sides can then be
integrated directly, giving y from (14.16).
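When the integrals in (14.16) and (14.17) cannot be done in closed form, they can still be evaluated numerically. The following sketch (standard library only) does this with the trapezium rule and tests the procedure on the worked example, for which the closed form $y = 2 + c\exp(-x^2)$ is known. The function name `solve_linear`, the normalisation $\mu(x_0) = 1$ and the boundary condition y(0) = 5 are illustrative assumptions, not part of the text.

```python
import math

def solve_linear(P, Q, x0, y0, x1, steps=2000):
    """Evaluate y(x1) for dy/dx + P(x) y = Q(x), y(x0) = y0, via (14.16)-(14.17).

    Both integrals are approximated with the trapezium rule, and mu is
    normalised so that mu(x0) = 1, which fixes the constant of integration.
    """
    h = (x1 - x0) / steps
    log_mu = 0.0            # integral of P from x0 to the current x
    integral = 0.0          # integral of mu*Q from x0 to the current x
    x = x0
    prev_mu_q = Q(x0)       # mu(x0) * Q(x0) with mu(x0) = 1
    for _ in range(steps):
        x_next = x + h
        log_mu += 0.5 * h * (P(x) + P(x_next))
        mu_q = math.exp(log_mu) * Q(x_next)
        integral += 0.5 * h * (prev_mu_q + mu_q)
        prev_mu_q = mu_q
        x = x_next
    return (y0 + integral) / math.exp(log_mu)

def P(x):
    return 2.0 * x

def Q(x):
    return 4.0 * x

y0 = 5.0                    # y(0) = 5 fixes c = 3 in y = 2 + c exp(-x^2)
x1 = 1.5
numeric = solve_linear(P, Q, 0.0, y0, x1)
closed = 2.0 + 3.0 * math.exp(-x1 * x1)
print(f"numerical y({x1}) = {numeric:.6f}, closed form = {closed:.6f}")
```

Working with the logarithm of $\mu$ rather than $\mu$ itself keeps the running integral of P separate from the exponentiation, mirroring the two stages (14.17) and (14.16) of the method.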


14.2.5 Homogeneous equations 


Homogeneous equations are ODEs that may be written in the form

\[ \frac{dy}{dx} = \frac{A(x, y)}{B(x, y)} = F\left(\frac{y}{x}\right), \qquad (14.18) \]

where A(x, y) and B(x, y) are homogeneous functions of the same degree. A
function f(x, y) is homogeneous of degree n if, for any $\lambda$, it obeys

\[ f(\lambda x, \lambda y) = \lambda^n f(x, y). \]

For example, if $A = x^2y - xy^2$ and $B = x^3 + y^3$ then we see that A and B are
both homogeneous functions of degree 3. In general, for functions of the form of
A and B, we see that for both to be homogeneous, and of the same degree, we
require the sum of the powers in x and y in each term of A and B to be the same
(in this example equal to 3). The RHS of a homogeneous ODE can be written as
a function of y/x. The equation may then be solved by making the substitution
y = vx, so that

\[ \frac{dy}{dx} = v + x\frac{dv}{dx} = F(v). \]

This is now a separable equation and can be integrated directly to give

\[ \int \frac{dv}{F(v) - v} = \int \frac{dx}{x}. \qquad (14.19) \]



►Solve

\[ \frac{dy}{dx} = \frac{y}{x} + \tan\left(\frac{y}{x}\right). \]

Substituting y = vx we obtain

\[ v + x\frac{dv}{dx} = v + \tan v. \]

Cancelling v on both sides, rearranging and integrating gives

\[ \int \cot v\, dv = \int \frac{dx}{x} = \ln x + c_1. \]

But

\[ \int \cot v\, dv = \int \frac{\cos v}{\sin v}\, dv = \ln(\sin v) + c_2, \]

so the solution to the ODE is $y = x \sin^{-1} Ax$, where A is a constant. ◄


Solution method. Check to see whether the equation is homogeneous. If so, make
the substitution y = vx, separate variables as in (14.19) and then integrate directly.
Finally replace v by y/x to obtain the solution.


14.2.6 Isobaric equations 

An isobaric ODE is a generalisation of the homogeneous ODE discussed in the 
previous section, and is of the form 


\[ \frac{dy}{dx} = \frac{A(x,y)}{B(x,y)}, \tag{14.20} \]


where the equation is dimensionally consistent if y and dy are each given a weight
m relative to x and dx, i.e. if the substitution $y = vx^m$ makes it separable.

►Solve

\[ \frac{dy}{dx} = -\frac{1}{2yx}\left(y^2 + \frac{2}{x}\right). \]

Rearranging we have

\[ \left(y^2 + \frac{2}{x}\right)dx + 2yx\,dy = 0. \]

Giving y and dy the weight m and x and dx the weight 1, the sums of the powers in each
term on the LHS are $2m + 1$, 0 and $2m + 1$ respectively. These are equal if $2m + 1 = 0$, i.e.
if $m = -\tfrac12$. Substituting $y = vx^m = vx^{-1/2}$, with the result that $dy = x^{-1/2}\,dv - \tfrac12 vx^{-3/2}\,dx$,
we obtain

\[ v\,dv + \frac{dx}{x} = 0, \]

which is separable and may be integrated directly to give $\tfrac12 v^2 + \ln x = c$. Replacing v by
$y\sqrt{x}$ we obtain the solution $\tfrac12 y^2x + \ln x = c$. ◄

Solution method. Write the equation in the form $A\,dx + B\,dy = 0$. Giving y and
dy each a weight m and x and dx each a weight 1, write down the sum of powers
in each term. Then, if a value of m that makes all these sums equal can be found,
substitute $y = vx^m$ into the original equation to make it separable. Integrate the
separated equation directly, and then replace v by $yx^{-m}$ to obtain the solution.
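The weight-counting step of this method is purely mechanical and can be sketched in a few lines of Python. In the sketch below each term of $A\,dx + B\,dy$ is described by a hypothetical triple (a, b, d), meaning $x^a y^b\,d$ with d either "dx" or "dy"; this representation and the function name are assumptions made for illustration, and numerical coefficients are ignored since they do not affect the weights.

```python
from fractions import Fraction


def isobaric_weight(terms):
    """Return the weight m that equalises the power sums of all terms, or None.

    terms: list of (a, b, d) standing for x**a * y**b * d, with d "dx" or "dy".
    With x, dx of weight 1 and y, dy of weight m, a dx-term has total weight
    (a + 1) + b*m and a dy-term has total weight a + (b + 1)*m.
    """
    linear = []  # each total weight written as alpha + beta*m
    for a, b, d in terms:
        if d == "dx":
            linear.append((Fraction(a) + 1, Fraction(b)))
        elif d == "dy":
            linear.append((Fraction(a), Fraction(b) + 1))
        else:
            raise ValueError("d must be 'dx' or 'dy'")

    alpha0, beta0 = linear[0]
    m = None
    for alpha, beta in linear[1:]:
        if beta != beta0:
            # alpha0 + beta0*m = alpha + beta*m fixes m uniquely
            m = (alpha - alpha0) / (beta0 - beta)
            break
    if m is None:
        return None  # no pair of terms determines a unique m

    total = alpha0 + beta0 * m
    if all(alpha + beta * m == total for alpha, beta in linear):
        return m
    return None


if __name__ == "__main__":
    # (y^2 + 2/x) dx + 2yx dy = 0, the worked example of this subsection
    print(isobaric_weight([(0, 2, "dx"), (-1, 0, "dx"), (1, 1, "dy")]))
```

For the worked example the function returns $-\tfrac12$, and for a homogeneous equation such as $(x^2y - xy^2)\,dx - (x^3 + y^3)\,dy = 0$ it returns 1, in line with the statement that homogeneous equations are the special case $m = 1$.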


14.2.7 Bernoulli’s equation 
Bernoulli’s equation has the form

\[ \frac{dy}{dx} + P(x)y = Q(x)y^n \quad\text{where } n \neq 0 \text{ or } 1. \tag{14.21} \]

This equation is very similar in form to the linear equation (14.14), but is in fact
non-linear due to the extra $y^n$ factor on the RHS. However, the equation can be
made linear by substituting $v = y^{1-n}$, and correspondingly

\[ \frac{dy}{dx} = \left(\frac{y^n}{1-n}\right)\frac{dv}{dx}. \]

Substituting this into (14.21) and dividing through by $y^n$, we find

\[ \frac{dv}{dx} + (1-n)P(x)v = (1-n)Q(x), \]

which is a linear equation, and may be solved by the method described in
subsection 14.2.4.

►Solve

\[ \frac{dy}{dx} + \frac{y}{x} = 2x^3y^4. \]

If we let $v = y^{1-4} = y^{-3}$ then

\[ \frac{dy}{dx} = -\frac{y^4}{3}\frac{dv}{dx}. \]

Substituting this into the ODE and rearranging, we obtain

\[ \frac{dv}{dx} - \frac{3v}{x} = -6x^3, \]

which is linear and may be solved by multiplying through by the integrating factor (see
subsection 14.2.4)

\[ \exp\left\{-3\int\frac{dx}{x}\right\} = \exp(-3\ln x) = \frac{1}{x^3}. \]

This yields the solution

\[ \frac{v}{x^3} = -6x + c. \]

Remembering that $v = y^{-3}$, we obtain $y^{-3} = -6x^4 + cx^3$. ◄


Solution method. Rearrange the equation into the form (14.21) and make the
substitution $v = y^{1-n}$. This leads to a linear equation in v, which can be solved by the
method of subsection 14.2.4. Then replace v by $y^{1-n}$ to obtain the solution.
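As an independent check on a solution found in this way, we may integrate the original non-linear equation numerically and compare with the closed form. The sketch below does this for the worked example $dy/dx + y/x = 2x^3y^4$ using a classical fourth-order Runge-Kutta scheme. The initial condition $y(1) = \tfrac12$, the step count and the function names are assumptions chosen purely for illustration; for this initial condition the closed-form solution remains finite for $x < 7/3$.

```python
def rk4(f, x0, y0, x1, n=200):
    """Integrate dy/dx = f(x, y) from x0 to x1 in n classical Runge-Kutta steps."""
    h = (x1 - x0) / n
    x, y = x0, y0
    for _ in range(n):
        k1 = f(x, y)
        k2 = f(x + h / 2.0, y + h * k1 / 2.0)
        k3 = f(x + h / 2.0, y + h * k2 / 2.0)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        x += h
    return y


def bernoulli_rhs(x, y):
    """Right-hand side of dy/dx = -y/x + 2 x^3 y^4 (the worked example)."""
    return -y / x + 2.0 * x**3 * y**4


def bernoulli_exact(x, x0, y0):
    """Closed form y^(-3) = -6x^4 + c x^3, with c fixed by y(x0) = y0."""
    c = (y0**-3 + 6.0 * x0**4) / x0**3
    return (-6.0 * x**4 + c * x**3) ** (-1.0 / 3.0)


if __name__ == "__main__":
    print(rk4(bernoulli_rhs, 1.0, 0.5, 1.5), bernoulli_exact(1.5, 1.0, 0.5))
```

The two printed values should agree to many significant figures, which confirms both the substitution $v = y^{-3}$ and the subsequent integration.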


14.2.8 Miscellaneous equations 

There are two further types of first-degree first-order equation that occur fairly 
regularly but do not fall into any of the above categories. They may be reduced 
to one of the above equations, however, by a suitable change of variable. 

Firstly, we consider 

\[ \frac{dy}{dx} = F(ax + by + c), \tag{14.22} \]

where a, b and c are constants, i.e. x and y only appear on the RHS in the particular
combination $ax + by + c$ and not in any other combination or by themselves. This
equation can be solved by making the substitution $v = ax + by + c$, in which case

\[ \frac{dv}{dx} = a + b\frac{dy}{dx} = a + bF(v), \tag{14.23} \]

which is separable and may be integrated directly.

►Solve

\[ \frac{dy}{dx} = (x + y + 1)^2. \]

Making the substitution $v = x + y + 1$, we obtain, as in (14.23),

\[ \frac{dv}{dx} = v^2 + 1, \]

which is separable and integrates directly to give

\[ \int\frac{dv}{1 + v^2} = \int dx \quad\Rightarrow\quad \tan^{-1}v = x + c_1. \]

So the solution to the original ODE is $\tan^{-1}(x + y + 1) = x + c_1$, where $c_1$ is a constant of
integration. ◄


Solution method. In an equation such as (14.22), substitute $v = ax + by + c$ to obtain
a separable equation that can be integrated directly. Then replace v by $ax + by + c$
to obtain the solution.


Secondly, we discuss

\[ \frac{dy}{dx} = \frac{ax + by + c}{ex + fy + g}, \tag{14.24} \]

where a, b, c, e, f and g are all constants. This equation may be solved by letting
$x = X + \alpha$ and $y = Y + \beta$, where $\alpha$ and $\beta$ are constants found from

\[ a\alpha + b\beta + c = 0, \tag{14.25} \]
\[ e\alpha + f\beta + g = 0. \tag{14.26} \]

Then (14.24) can be written as

\[ \frac{dY}{dX} = \frac{aX + bY}{eX + fY}, \]

which is homogeneous and can be solved by the method of subsection 14.2.5.
Note, however, that if $a/e = b/f$ then (14.25) and (14.26) are not independent
and so cannot be solved uniquely for $\alpha$ and $\beta$. However, in this case, (14.24)
reduces to an equation of the form (14.22), which was discussed above.


►Solve

\[ \frac{dy}{dx} = \frac{2x - 5y + 3}{2x + 4y - 6}. \]

Let $x = X + \alpha$ and $y = Y + \beta$, where $\alpha$ and $\beta$ obey the relations

\[ 2\alpha - 5\beta + 3 = 0, \]
\[ 2\alpha + 4\beta - 6 = 0, \]

which solve to give $\alpha = \beta = 1$. Making these substitutions we find

\[ \frac{dY}{dX} = \frac{2X - 5Y}{2X + 4Y}, \]

which is a homogeneous ODE and can be solved by substituting $Y = vX$ (see
subsection 14.2.5) to obtain

\[ \frac{dv}{dX} = \frac{2 - 7v - 4v^2}{X(2 + 4v)}. \]

This equation is separable, and using partial fractions we find

\[ \int\frac{2 + 4v}{2 - 7v - 4v^2}\,dv = -\frac43\int\frac{dv}{4v - 1} - \frac23\int\frac{dv}{v + 2} = \int\frac{dX}{X}, \]

which integrates to give

\[ \ln X + \tfrac13\ln(4v - 1) + \tfrac23\ln(v + 2) = c_1, \]

or

\[ X^3(4v - 1)(v + 2)^2 = \exp(3c_1). \]

Remembering that $Y = vX$, $x = X + 1$ and $y = Y + 1$, the solution to the original ODE
is given by $(4y - x - 3)(y + 2x - 3)^2 = c_2$, where $c_2 = \exp(3c_1)$. ◄


Solution method. If in (14.24) $a/e \neq b/f$ then make the substitution $x = X + \alpha$,
$y = Y + \beta$, where $\alpha$ and $\beta$ are given by (14.25) and (14.26); the resulting equation
is homogeneous and can be solved as in subsection 14.2.5. Substitute $v = Y/X$,
$X = x - \alpha$ and $Y = y - \beta$ to obtain the solution. If $a/e = b/f$ then (14.24) is of
the same form as (14.22) and may be solved accordingly.
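Solutions obtained by this route are usually implicit, of the form $G(x,y) = \text{constant}$, and it is worth checking them by implicit differentiation, since $dy/dx = -(\partial G/\partial x)/(\partial G/\partial y)$ must reproduce the RHS of the ODE. The Python sketch below performs this check for the worked example above, with the partial derivatives estimated by central differences. The sample points are arbitrary choices that avoid the zeros of both denominators, and the function names are illustrative.

```python
def G(x, y):
    """Implicit solution of the worked example: G(x, y) = constant along a solution."""
    return (4.0 * y - x - 3.0) * (y + 2.0 * x - 3.0) ** 2


def implicit_slope(x, y, h=1e-5):
    """dy/dx = -G_x / G_y, with both partial derivatives from central differences."""
    g_x = (G(x + h, y) - G(x - h, y)) / (2.0 * h)
    g_y = (G(x, y + h) - G(x, y - h)) / (2.0 * h)
    return -g_x / g_y


def ode_slope(x, y):
    """Right-hand side of dy/dx = (2x - 5y + 3) / (2x + 4y - 6)."""
    return (2.0 * x - 5.0 * y + 3.0) / (2.0 * x + 4.0 * y - 6.0)


if __name__ == "__main__":
    for point in [(2.0, 2.0), (3.0, 1.5), (0.5, 4.0), (-1.0, 3.0)]:
        print(point, implicit_slope(*point), ode_slope(*point))
```

At each sample point the two slopes should coincide to within the finite-difference error.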


14.3 Higher-degree first-order equations 

First-order equations of degree higher than the first do not occur often in 
the description of physical systems, since squared and higher powers of first- 
order derivatives usually arise from resistive or driving mechanisms, when an 
acceleration or other higher-order derivative is also present. They do sometimes 
appear in connection with geometrical problems, however. 

Higher-degree first-order equations can be written as F(x,y,dy/dx) = 0. The 
most general standard form is 

\[ p^n + a_{n-1}(x,y)p^{n-1} + \cdots + a_1(x,y)p + a_0(x,y) = 0, \tag{14.27} \]

where for ease of notation we write p = dy/dx. If the equation can be solved for 
one of x, y or p then either an explicit or a parametric solution can sometimes be 
obtained. We discuss the main types of such equations below, including Clairaut’s 
equation, which is a special case of an equation explicitly soluble for y. 


14.3.1 Equations soluble for p 

Sometimes the LHS of (14.27) can be factorised into the form 


\[ (p - F_1)(p - F_2)\cdots(p - F_n) = 0, \tag{14.28} \]

where $F_i = F_i(x,y)$. We are then left with solving the n first-degree equations
$p = F_i(x,y)$. Writing the solutions to these first-degree equations as $G_i(x,y) = 0$,
the general solution to (14.28) is given by the product

\[ G_1(x,y)G_2(x,y)\cdots G_n(x,y) = 0. \tag{14.29} \]

►Solve

\[ (x^3 + x^2 + x + 1)p^2 - (3x^2 + 2x + 1)yp + 2xy^2 = 0. \tag{14.30} \]

This equation may be factorised to give

\[ [(x + 1)p - y]\,[(x^2 + 1)p - 2xy] = 0. \]

Taking each bracket in turn we have

\[ (x + 1)\frac{dy}{dx} - y = 0, \]
\[ (x^2 + 1)\frac{dy}{dx} - 2xy = 0, \]

which have the solutions $y - c(x + 1) = 0$ and $y - c(x^2 + 1) = 0$ respectively (see
section 14.2 on first-degree first-order equations). Note that the arbitrary constants in
these two solutions can be taken to be the same, since only one is required for a first-order
equation. The general solution to (14.30) is then given by

\[ [y - c(x + 1)]\,[y - c(x^2 + 1)] = 0. \]
◄


Solution method. If the equation can be factorised into the form (14.28) then solve
the first-order ODE $p - F_i = 0$ in each factor and write the solution in the form
$G_i(x,y) = 0$. The solution to the original equation is then given by the product
(14.29).


14.3.2 Equations soluble for x 

Equations that can be solved for x, i.e. such that they may be written in the form 

x = F(y,p), (14.31) 

can be reduced to first-degree first-order equations in p by differentiating both 
sides with respect to y, so that 

\[ \frac{dx}{dy} = \frac{1}{p} = \frac{\partial F}{\partial y} + \frac{\partial F}{\partial p}\frac{dp}{dy}. \]

This results in an equation of the form $G(y,p) = 0$, which can be used together
with (14.31) to eliminate p and give the general solution. Note that often a singular 
solution to the equation will be found at the same time (see the introduction to 
this chapter). 




►Solve

\[ 6y^2p^2 + 3xp - y = 0. \tag{14.32} \]

This equation can be solved for x explicitly to give $3x = (y/p) - 6y^2p$. Differentiating both
sides with respect to y, we find

\[ 3\frac{dx}{dy} = \frac{3}{p} = \frac{1}{p} - \frac{y}{p^2}\frac{dp}{dy} - 6y^2\frac{dp}{dy} - 12yp, \]

which factorises to give

\[ (1 + 6yp^2)\left(2p + y\frac{dp}{dy}\right) = 0. \tag{14.33} \]

Setting the factor containing $dp/dy$ equal to zero gives a first-degree first-order equation
in p, which may be solved to give $py^2 = c$. Substituting for p in (14.32) then yields the
general solution of (14.32):

\[ y^3 = 3cx + 6c^2. \tag{14.34} \]

If we now consider the first factor in (14.33), we find $6p^2y = -1$ as a possible solution.
Substituting for p in (14.32) we find the singular solution

\[ 8y^3 + 3x^2 = 0. \]

Note that the singular solution contains no arbitrary constants and cannot be found from
the general solution (14.34) by any choice of the constant c. ◄


Solution method. Write the equation in the form (14.31) and differentiate both 
sides with respect to y. Rearrange the resulting equation into the form G(y,p) = 0, 
which can be used together with the original ODE to eliminate p and so give the 
general solution. If G(y,p) can be factorised then the factor containing dp/dy should 
be used to eliminate p and give the general solution. Using the other factors in this 
fashion will instead lead to singular solutions. 


14.3.3 Equations soluble for y 

Equations that can be solved for y, i.e. are such that they may be written in the 
form 


y = F(x,p), (14.35) 

can be reduced to first-degree first-order equations in p by differentiating both 
sides with respect to x, so that 

\[ \frac{dy}{dx} = p = \frac{\partial F}{\partial x} + \frac{\partial F}{\partial p}\frac{dp}{dx}. \]

This results in an equation of the form $G(x,p) = 0$, which can be used together
with (14.35) to eliminate p and give the general solution. An additional (singular)
solution to the equation is also often found.

►Solve

\[ xp^2 + 2xp - y = 0. \tag{14.36} \]

This equation can be solved for y explicitly to give $y = xp^2 + 2xp$. Differentiating both
sides with respect to x, we find

\[ \frac{dy}{dx} = p = 2xp\frac{dp}{dx} + p^2 + 2x\frac{dp}{dx} + 2p, \]

which after factorising gives

\[ (p + 1)\left(p + 2x\frac{dp}{dx}\right) = 0. \tag{14.37} \]

To obtain the general solution of (14.36), we consider the factor containing $dp/dx$. This
first-degree first-order equation in p has the solution $xp^2 = c$ (see subsection 14.3.1), which
we then use to eliminate p from (14.36). Thus we find that the general solution to (14.36)
is

\[ (y - c)^2 = 4cx. \tag{14.38} \]

If instead, we set the other factor in (14.37) equal to zero, we obtain the very simple
solution $p = -1$. Substituting this into (14.36) then gives

\[ x + y = 0, \]

which is a singular solution to (14.36). ◄


Solution method. Write the equation in the form (14.35) and differentiate both 
sides with respect to x. Rearrange the resulting equation into the form $G(x,p) = 0$,
which can be used together with the original ODE to eliminate p and so give the 
general solution. If G(x,p) can be factorised then the factor containing dp/dx should 
be used to eliminate p and give the general solution. Using the other factors in this 
fashion will instead lead to singular solutions. 


14.3.4 Clairaut’s equation 

Finally, we consider Clairaut’s equation, which has the form 


\[ y = px + F(p), \tag{14.39} \]

and is therefore a special case of equations soluble for y, as in (14.35). It may be
solved by a similar method to that given in subsection 14.3.3, but for Clairaut’s
equation the form of the general solution is particularly simple. Differentiating
(14.39) with respect to x, we find

\[ \frac{dy}{dx} = p = p + x\frac{dp}{dx} + \frac{dF}{dp}\frac{dp}{dx} \quad\Rightarrow\quad \frac{dp}{dx}\left(\frac{dF}{dp} + x\right) = 0. \tag{14.40} \]

Considering first the factor containing $dp/dx$, we find

\[ \frac{dp}{dx} = \frac{d^2y}{dx^2} = 0 \quad\Rightarrow\quad y = c_1x + c_2. \tag{14.41} \]

Since $p = dy/dx = c_1$, if we substitute (14.41) into (14.39) we find $c_1x + c_2 =
c_1x + F(c_1)$. Therefore the constant $c_2$ is given by $F(c_1)$, and the general solution
to (14.39) is

\[ y = c_1x + F(c_1), \tag{14.42} \]

i.e. the general solution to Clairaut’s equation can be obtained by replacing p
in the ODE by the arbitrary constant $c_1$. Now, considering the second factor in
(14.40), we also have

\[ \frac{dF}{dp} + x = 0, \tag{14.43} \]

which has the form $G(x,p) = 0$. This relation may be used to eliminate p from
(14.39) to give a singular solution.

►Solve

\[ y = px + p^2. \tag{14.44} \]

From (14.42) the general solution is $y = cx + c^2$. But from (14.43) we also have $2p + x = 0$,
so that $p = -x/2$. Substituting this into (14.44) we find the singular solution $x^2 + 4y = 0$. ◄


Solution method. Write the equation in the form (14.39), then the general solution
is given by replacing p by some constant c, as shown in (14.42). Using the relation
$dF/dp + x = 0$ to eliminate p from the original equation yields the singular solution.
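The relationship between the general and singular solutions of Clairaut's equation can be illustrated numerically. For the worked example $y = px + p^2$, the sketch below checks that each straight line $y = cx + c^2$ and the curve $x^2 + 4y = 0$ satisfy the ODE, and that the curve touches every line of the family (at $x = -2c$, with common slope c). The tangency point follows from equating the two expressions for y; it is not stated in the text above, and the function names are illustrative.

```python
def clairaut_residual(x, y, p):
    """Value of y - (p x + p^2); zero when (x, y, p) satisfies y = px + p^2."""
    return y - (p * x + p * p)


def line(c, x):
    """General solution y = cx + c^2, a straight line of slope c."""
    return c * x + c * c


def envelope(x):
    """Singular solution x^2 + 4y = 0, i.e. y = -x^2 / 4."""
    return -x * x / 4.0


def envelope_slope(x):
    """Slope dy/dx = -x/2 of the singular solution."""
    return -x / 2.0


if __name__ == "__main__":
    for c in (-2.0, 0.5, 3.0):
        x_touch = -2.0 * c  # where the line meets the singular curve
        print(c, line(c, x_touch), envelope(x_touch), envelope_slope(x_touch))
```

Each printed row should show equal ordinates for the line and the curve at $x = -2c$, with the curve's slope there equal to c.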


14.4 Exercises 

14.1 A radioactive isotope decays in such a way that the number of atoms present at
a given time, N(t), obeys the equation

\[ \frac{dN}{dt} = -\lambda N. \]

If there are initially $N_0$ atoms present, find N(t) at later times.

14.2 Solve the following equations by separation of the variables:

(a) $y' - xy^3 = 0$;

(b) $y'\tan^{-1}x - y(1 + x^2)^{-1} = 0$;

(c) $x^2y' + xy^2 = 4y^2$.

14.3 Show that the following equations are either exact or can be made exact, and
solve them:

(a) $y(2x^2y^2 + 1)y' + x(y^4 + 1) = 0$;

(b) $2xy' + 3x + y = 0$;

(c) $(\cos^2x + y\sin 2x)y' + y^2 = 0$.

14.4 Find the values of $\alpha$ and $\beta$ that make

\[ dF(x,y) = \left(\frac{1}{x^2 + 2} + \frac{\alpha}{y}\right)dx + (xy^\beta + 1)\,dy \]

an exact differential. For these values solve $dF(x,y) = 0$.

14.5 By finding a suitable integrating factor, solve the following equations:

(a) $(1 - x^2)y' + 2xy = (1 - x^2)^{3/2}$;

(b) $y' - y\cot x + \mathrm{cosec}\,x = 0$;

(c) $(x + y^3)y' = y$ (treat y as the independent variable).

14.6 By finding an appropriate integrating factor, solve

\[ \frac{dy}{dx} = -\frac{2x^2 + y^2 + x}{xy}. \]

14.7 Find, in the form of an integral, the solution of the equation

\[ \alpha\frac{dy}{dt} + y = f(t) \]

for a general function f(t). Find the specific solutions for

(a) $f(t) = H(t)$,

(b) $f(t) = \delta(t)$,

(c) $f(t) = \beta^{-1}e^{-t/\beta}H(t)$ with $\beta < \alpha$.

For case (c), what happens if $\beta \to 0$?

14.8 An electric circuit contains a resistance R and a capacitor C in series, and a
battery supplying a time-varying electromotive force V(t). The charge q on the
capacitor therefore obeys the equation

\[ R\frac{dq}{dt} + \frac{q}{C} = V(t). \]

Assuming that initially there is no charge on the capacitor, and given that
$V(t) = V_0\sin\omega t$, find the charge on the capacitor as a function of time.

14.9 Using tangential-polar coordinates (see exercise 2.20), consider a particle of mass
m moving under the influence of a force f directed towards the origin O. By
resolving forces along the instantaneous tangent and normal and making use of
the result of exercise 2.20 for the instantaneous radius of curvature, prove that

\[ f = -mv\frac{dv}{dr} \quad\text{and}\quad mv^2 = fp\frac{dr}{dp}. \]

Show further that $h = mpv$ is a constant of the motion and that the law of force
can be deduced from

\[ f = \frac{h^2}{mp^3}\frac{dp}{dr}. \]

14.10 Use the result of the previous exercise to find the law of force, acting towards
the origin, under which a particle must move so as to describe the following
trajectories:

(a) A circle of radius a which passes through the origin;

(b) An equiangular spiral, which is defined by the property that the angle $\alpha$
between the tangent and the radius vector is constant along the curve.

14.11 Solve

\[ (y - x)\frac{dy}{dx} + 2x + 3y = 0. \]

14.12 A mass m is accelerated by a time-varying force $\alpha\exp(-\beta t)v^3$, where v is its velocity.
It also experiences a resistive force $\eta v$, where $\eta$ is a constant, owing to its motion
through the air. The equation of motion of the mass is therefore

\[ m\frac{dv}{dt} = \alpha\exp(-\beta t)v^3 - \eta v. \]



Find an expression for the velocity v of the mass as a function of time, given
that it has an initial velocity $v_0$.

14.13 Using the results about Laplace transforms given in chapter 13 for $df/dt$ and
$tf(t)$, show, for a function y(t) that satisfies

\[ t\frac{dy}{dt} + (t - 1)y = 0 \tag{$*$} \]

with y(0) finite, that $\bar{y}(s) = C(1 + s)^{-2}$ for some constant C. Given that

\[ y(t) = t + \sum_{n=2}^{\infty} a_nt^n, \]

determine C and show that $a_n = (-1)^{n-1}/(n-1)!$. Compare this result with that obtained
by integrating ($*$) directly.

14.14 Solve

\[ \frac{dy}{dx} = \frac{1}{x + 2y + 1}. \]

14.15 Solve

\[ \frac{dy}{dx} = -\frac{x + y}{3x + 3y - 4}. \]

14.16 If $u = 1 + \tan y$, calculate $d(\ln u)/dy$; hence find the general solution of

\[ \frac{dy}{dx} = \tan x\cos y\,(\cos y + \sin y). \]

14.17 Solve

\[ x(1 - 2x^2y)\frac{dy}{dx} + y = 3x^2y^2, \]

given that $y(1) = 1/2$.

14.18 A reflecting mirror is made in the shape of the surface of revolution generated by
revolving the curve y(x) about the x-axis. In order that light rays emitted from a
point source at the origin are reflected back parallel to the x-axis, the curve y(x)
must obey

\[ \frac{y}{x} = \frac{2p}{1 - p^2}, \]

where $p = dy/dx$. By solving this equation for x find the curve y(x).

14.19 Find the curve such that at each point on it the sum of the intercepts on the x-
and y-axes of the tangent to the curve (taking account of sign) is equal to 1.

14.20 Find a parametric solution of

\[ x\left(\frac{dy}{dx}\right)^2 + \frac{dy}{dx} - y = 0 \]

as follows.

(a) Write an equation for y in terms of $p = dy/dx$ and show that

\[ p = p^2 + (2px + 1)\frac{dp}{dx}. \]

(b) Using p as the independent variable, arrange this as a linear first-order
equation for x.

(c) Find an appropriate integrating factor to obtain

\[ x = \frac{\ln p - p + c}{(1 - p)^2}, \]

which, together with the expression for y obtained in (a), gives a
parameterisation of the solution.

(d) Reverse the roles of x and y in steps (a) to (c), putting $dx/dy = p^{-1}$, and
show that essentially the same parameterisation is obtained.

14.21 Using the substitutions $u = x^2$ and $v = y^2$, reduce the equation

\[ xy\left(\frac{dy}{dx}\right)^2 - (x^2 + y^2 - 1)\frac{dy}{dx} + xy = 0 \]

to Clairaut’s form. Hence show that the equation represents a family of conics
and the four sides of a square.

14.22 The action of the control mechanism on a particular system for an input f(t) is
described, for $t > 0$, by the coupled first-order equations:

\[ \dot{y} + 4z = f(t), \]
\[ \dot{z} - 2z = \dot{y} + \tfrac12 y. \]

Use Laplace transforms to find the response y(t) of the system to a unit step
input $f(t) = H(t)$, given that $y(0) = 1$ and $z(0) = 0$.

Questions 23 to 31 are intended to give the reader practice in choosing an
appropriate method. The level of difficulty varies within the set; if necessary, the hints
may be consulted for an indication of the most appropriate approach.

14.23 Find the general solutions of the following:

(a) $\dfrac{dy}{dx} + \dfrac{xy}{a^2 + x^2} = x$;

(b) $\dfrac{dy}{dx} = \dfrac{4y^2}{x^2} - y^2$.

14.24 Solve the following first-order equations for the boundary conditions given:

(a) $y' - (y/x) = 1$, $y(1) = -1$;

(b) $y' - y\tan x = 1$, $y(\pi/4) = 3$;

(c) $y' - y^2/x^2 = 1/4$, $y(1) = 1$;

(d) $y' - y^2/x^2 = 1/4$, $y(1) = 1/2$.

14.25 An electronic system has two inputs, to each of which a constant unit signal is
applied, but starting at different times. The equations governing the system thus
take the form

\[ \dot{x} + 2y = H(t), \]
\[ \dot{y} - 2x = H(t - 3). \]

Initially (at $t = 0$), $x = 1$ and $y = 0$; find x(t) at later times.

14.26 Solve the differential equation

\[ \sin x\,\frac{dy}{dx} + 2y\cos x = 1 \]

subject to the boundary condition $y(\pi/2) = 1$.

14.27 Find the complete solution of

\[ \left(\frac{dy}{dx}\right)^2 - \frac{y}{x}\frac{dy}{dx} + \frac{A}{x} = 0, \]

where A is a positive constant.

14.28 Find the solution of

\[ (5x + y - 7)\frac{dy}{dx} = 3(x + y + 1). \]

14.29 Find the solution $y = y(x)$ of

\[ x\frac{dy}{dx} + y - \frac{y^2}{x^{3/2}} = 0 \]

subject to $y(1) = 1$.

14.30 Find the solution of

\[ (2\sin y - x)\frac{dy}{dx} = \tan y, \]

if (a) $y(0) = 0$, and (b) $y(0) = \pi/2$.

14.31 Find the family of solutions of

\[ \frac{d^2y}{dx^2} + \left(\frac{dy}{dx}\right)^2 + \frac{dy}{dx} = 0 \]

that satisfy $y(0) = 0$.


14.5 Hints and answers 

14.1 $N(t) = N_0\exp(-\lambda t)$.

14.2 (a) $y = \pm(c - x^2)^{-1/2}$; (b) $y = c\tan^{-1}x$; (c) $y = (\ln x + 4x^{-1} - c)^{-1}$.

14.3 (a) exact, $x^2y^4 + x^2 + y^2 = c$; (b) IF $= x^{-1/2}$, $x^{1/2}(x + y) = c$; (c) IF $= \sec^2x$,
$y^2\tan x + y = c$.

14.4 $\alpha = -1$, $\beta = -2$; $(1/\sqrt{2})\tan^{-1}(x/\sqrt{2}) - (x/y) + y = c$.

14.5 (a) IF $= (1 - x^2)^{-2}$, $y = (1 - x^2)(k + \sin^{-1}x)$; (b) IF $= \mathrm{cosec}\,x$, leading to
$y = k\sin x + \cos x$; (c) exact equation is $y^{-1}(dx/dy) - xy^{-2} = y$, leading to
$x = y(k + y^2/2)$.

14.6 Integrating factor is x; $3x^4 + 2x^3 + 3x^2y^2 = c$.

14.7 $y(t) = e^{-t/\alpha}\int^t \alpha^{-1}e^{t'/\alpha}f(t')\,dt'$; (a) $y(t) = 1 - e^{-t/\alpha}$; (b) $y(t) = \alpha^{-1}e^{-t/\alpha}$;
(c) $y(t) = (e^{-t/\alpha} - e^{-t/\beta})/(\alpha - \beta)$. It becomes case (b).

14.8 $q(t) = CV_0[1 + (\omega CR)^2]^{-1}\{\sin\omega t + CR\omega[\exp(-t/RC) - \cos\omega t]\}$.

14.9 If the angle between the tangent and the radius vector is $\alpha$, note that $\cos\alpha = dr/ds$
and $\sin\alpha = p/r$.

14.10 (a) $r^2 = 2ap$, $f \propto a^2r^{-5}$; (b) $p = r\sin\alpha$, $f \propto (\sin\alpha)^{-2}r^{-3}$.

14.11 Homogeneous equation, put $y = vx$ to obtain $(1 - v)(v^2 + 2v + 2)^{-1}\,dv = x^{-1}\,dx$;
write $1 - v$ as $2 - (1 + v)$, and $v^2 + 2v + 2$ as $1 + (1 + v)^2$;
$A[x^2 + (x + y)^2] = \exp\{4\tan^{-1}[(x + y)/x]\}$.

14.12 Bernoulli’s equation; set $v = u^{-1/2}$ to obtain $m\,du/dt - 2\eta u = -2\alpha\exp(-\beta t)$;
$v^{-2} = 2\alpha(m\beta + 2\eta)^{-1}[\exp(-\beta t) - \exp(2\eta t/m)] + v_0^{-2}\exp(2\eta t/m)$.

14.13 $(1 + s)(d\bar{y}/ds) + 2\bar{y} = 0$. $C = 1$; $y(t) = te^{-t}$.

14.14 Follow subsection 14.2.8; $k + y = \ln(x + 2y + 3)$.

14.15 Equation is of the form of (14.22), set $v = x + y$; $x + 3y + 2\ln(x + y - 2) = A$.

14.16 $y = \tan^{-1}(k\sec x - 1)$.

14.17 Equation is isobaric with weight $m = -2$; setting $y = vx^{-2}$ gives
$v^{-1}(1 - v)^{-1}(1 - 2v)\,dv = x^{-1}\,dx$; $4xy(1 - x^2y) = 1$.

14.18 Eliminate y to obtain, in turn, $p(p^2 - 1) = 2x(dp/dx)$; $p = \pm(1 - Ax)^{-1/2}$;
$A^2y^2 = 4(1 - Ax)$, i.e. a parabola.

14.19 The curve must satisfy $y = (1 - p^{-1})^{-1}(1 - x + px)$, which has solution $x = (p - 1)^{-2}$,
leading to $y = (1 \pm \sqrt{x})^2$ or $x = (1 \pm \sqrt{y})^2$; the singular solution $p' = 0$ gives
straight lines joining $(\theta, 0)$ and $(0, 1 - \theta)$ for any $\theta$.

14.20 (a) $y = p^2x + p$; (d) the constants of integration will differ in the two cases.

14.21 $v = qu + q/(q - 1)$, where $q = dv/du$. General solution $y^2 = cx^2 + c/(c - 1)$,
hyperbolae for $c > 0$ and ellipses for $c < 0$. Singular solution $y = \pm(x \pm 1)$.

14.22 $\bar{y}(s^2 + 2s + 2) = s(\bar{f} + 1) + (2 - 2\bar{f})$; $y(t) = -1 + e^{-t}(2\cos t + 3\sin t)$.

14.23 (a) Integrating factor is $(a^2 + x^2)^{1/2}$, $y = (a^2 + x^2)/3 + A(a^2 + x^2)^{-1/2}$; (b) separable,
$y = x(x^2 + Ax + 4)^{-1}$.

14.24 (a) $y = x\ln x - x$; (b) $y = \tan x + \sqrt{2}\sec x$; (c) homogeneous, $y = x(2 - \ln x)^{-1} + x/2$;
(d) singular solution $y = x/2$.

14.25 Use Laplace transforms; $\bar{x}s(s^2 + 4) = s + s^2 - 2e^{-3s}$;
$x(t) = \tfrac12\sin 2t + \cos 2t - \tfrac12 H(t - 3) + \tfrac12\cos(2t - 6)H(t - 3)$.

14.26 Integrating factor is $\sin x$; $y = (1 + \cos x)^{-1}$.

14.27 This is Clairaut’s equation with $F(p) = A/p$. General solution $y = cx + A/c$;
singular solution, $y = 2\sqrt{Ax}$.

14.28 Follow the second method demonstrated in subsection 14.2.8; $x = X + 2$, $y = Y - 3$;
$X(dv/dX) = (3 - 2v - v^2)/(5 + v)$; $(x - y - 5)^3 = A(3x + y - 3)$.

14.29 Either Bernoulli’s equation with $n = 2$ or an isobaric equation with $m = 3/2$;
$y(x) = 5x^{3/2}/(2 + 3x^{5/2})$.

14.30 Treat y as the independent variable, giving the general solution $x\sin y =
-(\cos 2y)/2 + k$. (a) $y = \sin^{-1}x$; (b) $x = -\cos y\cot y$.

14.31 Show that $p = (Ce^x - 1)^{-1}$, where $p = dy/dx$; $y = \ln[(C - e^{-x})/(C - 1)]$ or
$\ln[D - (D - 1)e^{-x}]$ or $\ln(e^{-K} + 1 - e^{-x}) + K$.

15

Higher-order ordinary differential equations


Following on from the discussion of first-order ordinary differential equations 
(ODEs) given in the previous chapter, we now examine equations of second and 
higher order. Since a brief outline of the general properties of ODEs and their 
solutions was given at the beginning of the previous chapter, we will not repeat 
it here. Instead, we will begin with a discussion of various types of higher-order 
equation. This chapter is divided into three main parts. We first discuss linear 
equations with constant coefficients and then investigate linear equations with 
variable coefficients. Finally, we discuss a few methods that may be of use in 
solving general linear or non-linear ODEs. Let us start by considering some 
general points relating to all linear ODEs. 

Linear equations are of paramount importance in the description of physical 
processes. Moreover, it is an empirical fact that, when put into mathematical 
form, many natural processes appear as higher-order linear ODEs, most often 
as second-order equations. Although we could restrict our attention to these 
second-order equations, the generalisation to nth-order equations requires little 
extra work, and so we will consider this more general case. 

A linear ODE of general order n has the form 

\[ a_n(x)\frac{d^ny}{dx^n} + a_{n-1}(x)\frac{d^{n-1}y}{dx^{n-1}} + \cdots + a_1(x)\frac{dy}{dx} + a_0(x)y = f(x). \tag{15.1} \]

If $f(x) = 0$ then the equation is called homogeneous; otherwise it is inhomogeneous.
The first-order linear equation studied in subsection 14.2.4 is a special case of 
(15.1). As discussed at the beginning of the previous chapter, the general solution 
to (15.1) will contain n arbitrary constants, which may be determined if n boundary 
conditions are also provided. 

In order to solve any equation of the form (15.1), we must first find the
general solution of the complementary equation, i.e. the equation formed by setting
$f(x) = 0$:

\[ a_n(x)\frac{d^ny}{dx^n} + a_{n-1}(x)\frac{d^{n-1}y}{dx^{n-1}} + \cdots + a_1(x)\frac{dy}{dx} + a_0(x)y = 0. \tag{15.2} \]

To determine the general solution of (15.2), we must find n linearly independent 
functions that satisfy it. Once we have found these solutions, the general solution 
is given by a linear superposition of these n functions. In other words, if the n 
solutions of (15.2) are $y_1(x), y_2(x), \ldots, y_n(x)$, then the general solution is given by
the linear superposition

\[ y_c(x) = c_1y_1(x) + c_2y_2(x) + \cdots + c_ny_n(x), \tag{15.3} \]

where the $c_m$ are arbitrary constants that may be determined if n boundary
conditions are provided. The linear combination $y_c(x)$ is called the complementary
function of (15.1).

The question naturally arises how we establish that any n individual solutions to 
(15.2) are indeed linearly independent. For n functions to be linearly independent 
over an interval, there must not exist any set of constants $c_1, c_2, \ldots, c_n$ such that

\[ c_1y_1(x) + c_2y_2(x) + \cdots + c_ny_n(x) = 0 \tag{15.4} \]

over the interval in question, except for the trivial case $c_1 = c_2 = \cdots = c_n = 0$.

A statement equivalent to (15.4), which is perhaps more useful for the practical
determination of linear independence, can be found by repeatedly differentiating
(15.4), $n - 1$ times in all, to obtain n simultaneous equations for $c_1, c_2, \ldots, c_n$:

\[
\begin{aligned}
c_1y_1(x) + c_2y_2(x) + \cdots + c_ny_n(x) &= 0 \\
c_1y_1'(x) + c_2y_2'(x) + \cdots + c_ny_n'(x) &= 0 \\
&\;\;\vdots \\
c_1y_1^{(n-1)}(x) + c_2y_2^{(n-1)}(x) + \cdots + c_ny_n^{(n-1)}(x) &= 0,
\end{aligned}
\tag{15.5}
\]

where the primes denote differentiation with respect to x. Referring to the 
discussion of simultaneous linear equations given in chapter 8, if the determinant 
of the coefficients of $c_1, c_2, \ldots, c_n$ is non-zero then the only solution to equations
(15.5) is the trivial solution $c_1 = c_2 = \cdots = c_n = 0$. In other words, the n functions
$y_1(x), y_2(x), \ldots, y_n(x)$ are linearly independent over an interval if

\[
W(y_1, y_2, \ldots, y_n) =
\begin{vmatrix}
y_1 & y_2 & \cdots & y_n \\
y_1' & y_2' & \cdots & y_n' \\
\vdots & \vdots & \ddots & \vdots \\
y_1^{(n-1)} & y_2^{(n-1)} & \cdots & y_n^{(n-1)}
\end{vmatrix}
\neq 0 \tag{15.6}
\]

over that interval; $W(y_1, y_2, \ldots, y_n)$ is called the Wronskian of the set of functions.
It should be noted, however, that the vanishing of the Wronskian does not
guarantee that the functions are linearly dependent.

If the original equation (15.1) has $f(x) = 0$ (i.e. it is homogeneous) then of
course the complementary function $y_c(x)$ in (15.3) is already the general solution.
If, however, the equation has $f(x) \neq 0$ (i.e. it is inhomogeneous) then $y_c(x)$ is only
one part of the solution. The general solution of (15.1) is then given by

\[ y(x) = y_c(x) + y_p(x), \tag{15.7} \]

where $y_p(x)$ is the particular integral, which can be any function that satisfies (15.1)
directly, provided it is linearly independent of $y_c(x)$. It should be emphasised for
practical purposes that any such function, no matter how simple (or complicated), 
is equally valid in forming the general solution (15.7). 

It is important to realise that the above method for finding the general solution 
to an ODE by superposing particular solutions assumes crucially that the ODE 
is linear. For non-linear equations, discussed in section 15.3, this method cannot 
be used, and indeed it is often impossible to find closed-form solutions to such 
equations. 
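The Wronskian test (15.6) for linear independence is easily tried out numerically. In the Python sketch below each function is supplied, by assumption, as a list of callables giving the function and its first $n - 1$ derivatives, and the determinant is evaluated by Gaussian elimination. The third example, $x^2$ and $x|x|$, is a standard illustration not taken from the text above: its Wronskian vanishes everywhere, yet the two functions are not proportional on any interval containing the origin, which is why a vanishing Wronskian does not guarantee linear dependence.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(a[r][i]))
        if a[pivot][i] == 0.0:
            return 0.0
        if pivot != i:
            a[i], a[pivot] = a[pivot], a[i]
            result = -result  # a row swap changes the sign
        result *= a[i][i]
        for r in range(i + 1, n):
            factor = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= factor * a[i][c]
    return result


def wronskian(derivs, x):
    """W at x; derivs[j][k] is a callable giving the k-th derivative of y_j."""
    n = len(derivs)
    return det([[derivs[j][k](x) for j in range(n)] for k in range(n)])


# y1 = x, y2 = x^2: independent, W = x^2
independent = [[lambda x: x, lambda x: 1.0],
               [lambda x: x * x, lambda x: 2.0 * x]]
# y1 = x, y2 = 3x: dependent, W = 0
dependent = [[lambda x: x, lambda x: 1.0],
             [lambda x: 3.0 * x, lambda x: 3.0]]
# y1 = x^2, y2 = x|x|: W = 0 everywhere, but independent on (-1, 1)
peano = [[lambda x: x * x, lambda x: 2.0 * x],
         [lambda x: x * abs(x), lambda x: 2.0 * abs(x)]]

if __name__ == "__main__":
    print(wronskian(independent, 2.0), wronskian(dependent, 2.0))
    print(wronskian(peano, -1.5), wronskian(peano, 1.5))
```

A non-zero value at any point of the interval is sufficient to establish independence; a zero value, as the last pair shows, settles nothing by itself.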


15.1 Linear equations with constant coefficients 

If the $a_m$ in (15.1) are constants rather than functions of x then we have

\[ a_n\frac{d^ny}{dx^n} + a_{n-1}\frac{d^{n-1}y}{dx^{n-1}} + \cdots + a_1\frac{dy}{dx} + a_0y = f(x). \tag{15.8} \]

Equations of this sort are very common throughout the physical sciences and
engineering, and the method for their solution falls into two parts as discussed
in the previous section, i.e. finding the complementary function $y_c(x)$ and finding
the particular integral $y_p(x)$. If $f(x) = 0$ in (15.8) then we do not have to find
a particular integral, and the complementary function is by itself the general
solution.


15.1.1 Finding the complementary function y c (x) 

The complementary function must satisfy 

d"y d n ~ l v dy 

+ an l lh(^ Cli dv a °y = ^ (15-9) 

and contain n arbitrary constants (see equation (15.3)). The standard method 
for finding y c (x) is to try a solution of the form y = Ae lx , substituting this into 
(15.9). After dividing the resulting equation through by Ae lx , we are left with a 
polynomial equation in X of order n; this is the auxiliary equation and reads 

a n X n T a n ^\/ 17 ' T ■ ■ ■ T a\X T gq — 0. (15.10) 
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In general the auxiliary equation has n roots, say X\,X 2 ,...,X n . In certain cases, 
some of these roots may be repeated and some may be complex. The three main 
cases are as follows. 


(i) All roots real and distinct. In this case the $n$ solutions to (15.9) are $\exp \lambda_m x$
    for $m = 1$ to $n$. It is easily shown by calculating the Wronskian (15.6)
    of these functions that if all the $\lambda_m$ are distinct then these solutions are
    linearly independent. We can therefore linearly superpose them, as in
    (15.3), to form the complementary function

    $$y_c(x) = c_1 e^{\lambda_1 x} + c_2 e^{\lambda_2 x} + \cdots + c_n e^{\lambda_n x}. \tag{15.11}$$

(ii) Some roots complex. For the special (but usual) case that all the coefficients
    $a_m$ in (15.9) are real, if one of the roots of the auxiliary equation (15.10)
    is complex, say $\alpha + i\beta$, then its complex conjugate $\alpha - i\beta$ is also a root. In
    this case we can write

    $$c_1 e^{(\alpha + i\beta)x} + c_2 e^{(\alpha - i\beta)x} = e^{\alpha x}(d_1 \cos \beta x + d_2 \sin \beta x)
      = A e^{\alpha x} \begin{Bmatrix} \sin \\ \cos \end{Bmatrix} (\beta x + \phi), \tag{15.12}$$

    where $A$ and $\phi$ are arbitrary constants.

(iii) Some roots repeated. If, for example, $\lambda_1$ occurs $k$ times ($k > 1$) as a root
    of the auxiliary equation, then we have not found $n$ linearly independent
    solutions of (15.9); formally the Wronskian (15.6) of these solutions, having
    two or more identical columns, is equal to zero. We must therefore find
    $k - 1$ further solutions that are linearly independent of those already found
    and also of each other. By direct substitution into (15.9) we find that

    $$x e^{\lambda_1 x}, \quad x^2 e^{\lambda_1 x}, \quad \ldots, \quad x^{k-1} e^{\lambda_1 x}$$

    are also solutions, and by calculating the Wronskian it is easily shown that
    they, together with the solutions already found, form a linearly independent
    set of $n$ functions. Therefore the complementary function is given by

    $$y_c(x) = (c_1 + c_2 x + \cdots + c_k x^{k-1}) e^{\lambda_1 x} + c_{k+1} e^{\lambda_{k+1} x} + c_{k+2} e^{\lambda_{k+2} x} + \cdots + c_n e^{\lambda_n x}. \tag{15.13}$$

    If more than one root is repeated the above argument is easily extended.
    For example, suppose as before that $\lambda_1$ is a $k$-fold root of the auxiliary
    equation and, further, that $\lambda_2$ is an $l$-fold root (of course, $k > 1$ and $l > 1$).
    Then, from the above argument, the complementary function reads

    $$\begin{aligned}
    y_c(x) ={}& (c_1 + c_2 x + \cdots + c_k x^{k-1}) e^{\lambda_1 x} \\
              &+ (c_{k+1} + c_{k+2} x + \cdots + c_{k+l} x^{l-1}) e^{\lambda_2 x} \\
              &+ c_{k+l+1} e^{\lambda_{k+l+1} x} + c_{k+l+2} e^{\lambda_{k+l+2} x} + \cdots + c_n e^{\lambda_n x}.
    \end{aligned} \tag{15.14}$$

►Find the complementary function of the equation

$$\frac{d^2 y}{dx^2} - 2\frac{dy}{dx} + y = e^x. \tag{15.15}$$

Setting the RHS to zero, substituting $y = Ae^{\lambda x}$ and dividing through by $Ae^{\lambda x}$ we obtain
the auxiliary equation

$$\lambda^2 - 2\lambda + 1 = 0.$$

The root $\lambda = 1$ occurs twice and so, although $e^x$ is a solution to (15.15), we must find
a further solution to the equation that is linearly independent of $e^x$. From the above
discussion, we deduce that $xe^x$ is such a solution, so that the full complementary function
is given by the linear superposition

$$y_c(x) = (c_1 + c_2 x)e^x.$$ ◄


Solution method. Set the RHS of the ODE to zero (if it is not already so), and
substitute $y = Ae^{\lambda x}$. After dividing through the resulting equation by $Ae^{\lambda x}$, obtain
an $n$th-order polynomial equation in $\lambda$ (the auxiliary equation, see (15.10)). Solve
the auxiliary equation to find the $n$ roots, $\lambda_1, \lambda_2, \ldots, \lambda_n$, say. If all these roots are
real and distinct then $y_c(x)$ is given by (15.11). If, however, some of the roots are
complex or repeated then $y_c(x)$ is given by (15.12) or (15.13), or the extension
(15.14) of the latter, respectively.
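
For a second-order equation the three cases can be told apart from the discriminant of the auxiliary equation. The following is a minimal sketch, not part of the original text; the function name `auxiliary_roots` and the tuple it returns are assumptions made purely for illustration, and real coefficients with a non-zero leading coefficient are assumed.

```python
import cmath


def auxiliary_roots(a2, a1, a0):
    """Classify the roots of a2*lam**2 + a1*lam + a0 = 0 (real coefficients, a2 != 0).

    Returns one of
      ("distinct", lam1, lam2)  -> y_c = c1*exp(lam1*x) + c2*exp(lam2*x)
      ("repeated", lam1)        -> y_c = (c1 + c2*x)*exp(lam1*x)
      ("complex", alpha, beta)  -> y_c = exp(alpha*x)*(d1*cos(beta*x) + d2*sin(beta*x))
    """
    disc = a1 * a1 - 4 * a2 * a0
    if disc > 0:                       # case (i): real and distinct
        r = disc ** 0.5
        return ("distinct", (-a1 + r) / (2 * a2), (-a1 - r) / (2 * a2))
    if disc == 0:                      # case (iii): one double root
        return ("repeated", -a1 / (2 * a2))
    root = (-a1 + cmath.sqrt(disc)) / (2 * a2)   # case (ii): conjugate pair
    return ("complex", root.real, abs(root.imag))


# The auxiliary equation lam**2 - 2*lam + 1 = 0 of the example above:
print(auxiliary_roots(1, -2, 1))
```

For the example above the call reports a repeated root, in agreement with the complementary function $(c_1 + c_2 x)e^x$.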


15.1.2 Finding the particular integral $y_p(x)$

There is no generally applicable method for finding the particular integral $y_p(x)$
but, for linear ODEs with constant coefficients and a simple RHS, $y_p(x)$ can often
be found by inspection or by assuming a parameterised form similar to $f(x)$. The
latter method is sometimes called the method of undetermined coefficients. If $f(x)$
contains only polynomial, exponential, or sine and cosine terms then, by assuming
a trial function for $y_p(x)$ of similar form but one which contains a number of
undetermined parameters and substituting this trial function into (15.8), the
parameters can be found and $y_p(x)$ deduced. Standard trial functions are as
follows.

(i) If $f(x) = ae^{rx}$ then try

    $$y_p(x) = be^{rx}.$$

(ii) If $f(x) = a_1 \sin rx + a_2 \cos rx$ ($a_1$ or $a_2$ may be zero) then try

    $$y_p(x) = b_1 \sin rx + b_2 \cos rx.$$

(iii) If $f(x) = a_0 + a_1 x + \cdots + a_N x^N$ (some $a_m$ may be zero) then try

    $$y_p(x) = b_0 + b_1 x + \cdots + b_N x^N.$$

(iv) If $f(x)$ is the sum or product of any of the above then try $y_p(x)$ as the
    sum or product of the corresponding individual trial functions.


It should be noted that this method fails if any term in the assumed trial
function is also contained within the complementary function $y_c(x)$. In such a
case the trial function should be multiplied by the smallest integer power of $x$
such that it will then contain no term that already appears in the complementary
function. The undetermined coefficients in the trial function can now be found
by substitution into (15.8).

Three further methods that are useful in finding the particular integral $y_p(x)$
are Green's functions, the variation of parameters, and making a change in the
dependent variable based on knowledge of the complementary function. However,
since these methods are also applicable to equations with variable coefficients, a
discussion of them is postponed until section 15.2.


►Find a particular integral of the equation

$$\frac{d^2 y}{dx^2} - 2\frac{dy}{dx} + y = e^x.$$

From the above discussion our first guess at a trial particular integral would be $y_p(x) = be^x$.
However, since the complementary function of this equation is $y_c(x) = (c_1 + c_2 x)e^x$ (as
in the previous subsection), we see that $e^x$ is already contained in it, as indeed is $xe^x$.
Multiplying our first guess by the lowest integer power of $x$ such that the result does not
appear in $y_c(x)$, we therefore try $y_p(x) = bx^2 e^x$. Substituting this into the ODE, we find
that $b = 1/2$, so the particular integral is given by $y_p(x) = x^2 e^x/2$. ◄
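
The value $b = 1/2$ can be checked by substituting the trial function and its derivatives directly. The sketch below is illustrative only; the function name `residual` is an assumption, and the derivatives of $bx^2e^x$ have been worked out by hand and typed in.

```python
import math


def residual(x, b=0.5):
    """Return y'' - 2y' + y - e^x for the trial function y = b*x**2*exp(x)."""
    ex = math.exp(x)
    y = b * x**2 * ex
    dy = b * (2 * x + x**2) * ex          # product rule
    d2y = b * (2 + 4 * x + x**2) * ex     # product rule applied again
    return d2y - 2 * dy + y - ex


# With b = 1/2 the residual vanishes (to rounding) for any x.
for x in (-1.0, 0.0, 0.5, 2.0):
    print(x, residual(x))
```

Algebraically the left-hand side collapses to $2be^x$, so any other choice of $b$ leaves a residual $(2b - 1)e^x$.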


Solution method. If the RHS of an ODE contains only the functions mentioned at
the start of this subsection then the appropriate trial function should be substituted
into it, thereby fixing the undetermined parameters. If, however, the RHS of the
equation is not of this form then one of the more general methods outlined in
subsections 15.2.3-15.2.5 should be used; perhaps the most straightforward of these is
the variation-of-parameters method.


15.1.3 Constructing the general solution $y_c(x) + y_p(x)$

As stated earlier, the full solution to the ODE (15.8) is found by adding together
the complementary function and any particular integral. In order to illustrate
further the material discussed in the last two subsections, let us find the general
solution to a new example, starting from the beginning.

►Solve

$$\frac{d^2 y}{dx^2} + 4y = x^2 \sin 2x. \tag{15.16}$$

First we set the RHS to zero and assume the trial solution $y = Ae^{\lambda x}$. Substituting this into
(15.16) leads to the auxiliary equation

$$\lambda^2 + 4 = 0 \quad\Rightarrow\quad \lambda = \pm 2i. \tag{15.17}$$

Therefore the complementary function is given by

$$y_c(x) = c_1 e^{2ix} + c_2 e^{-2ix} = d_1 \cos 2x + d_2 \sin 2x. \tag{15.18}$$

We must now turn our attention to the particular integral $y_p(x)$. Consulting the list of
standard trial functions in the previous subsection, we find that a first guess at a suitable
trial function for this case should be

$$(ax^2 + bx + c)(d \sin 2x + e \cos 2x). \tag{15.19}$$

However, we see that this trial function contains terms in $\sin 2x$ and $\cos 2x$, both of which
already appear in the complementary function (15.18). We must therefore multiply (15.19)
by the smallest integer power of $x$ that ensures that none of the resulting terms appears
in $y_c(x)$. Since multiplying by $x$ will suffice, we finally assume the trial function

$$(ax^3 + bx^2 + cx)(d \sin 2x + e \cos 2x). \tag{15.20}$$

Substituting this into (15.16) to fix the constants appearing in (15.20), we find the particular
integral to be

$$y_p(x) = -\frac{x^3}{12} \cos 2x + \frac{x^2}{16} \sin 2x + \frac{x}{32} \cos 2x. \tag{15.21}$$

The general solution to (15.16) then reads

$$y(x) = y_c(x) + y_p(x) = d_1 \cos 2x + d_2 \sin 2x - \frac{x^3}{12} \cos 2x + \frac{x^2}{16} \sin 2x + \frac{x}{32} \cos 2x.$$ ◄
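
A quick numerical check of the particular integral is possible without redoing the algebra. The sketch below is an illustration rather than part of the text: it approximates the second derivative by a central difference (the step `h` and the names `y_p` and `residual` are assumptions) and evaluates $y'' + 4y - x^2 \sin 2x$, which should vanish to within the accuracy of the difference formula.

```python
import math


def y_p(x):
    """Particular integral: -x^3/12 cos 2x + x^2/16 sin 2x + x/32 cos 2x."""
    return (-x**3 / 12 * math.cos(2 * x)
            + x**2 / 16 * math.sin(2 * x)
            + x / 32 * math.cos(2 * x))


def residual(x, h=1e-4):
    """Approximate y_p'' + 4*y_p - x^2*sin(2x) using a central difference for y_p''."""
    d2y = (y_p(x + h) - 2 * y_p(x) + y_p(x - h)) / h**2
    return d2y + 4 * y_p(x) - x**2 * math.sin(2 * x)


for x in (0.3, 1.0, 2.5):
    print(x, residual(x))   # small compared with the individual terms
```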


15.1.4 Linear recurrence relations

The discrete analogues of differential equations are called recurrence relations (or
sometimes difference equations). Whereas a differential equation gives a prescription,
in terms of current values, for the new value of a dependent variable at a
point only infinitesimally far away, a recurrence relation describes how the next
in a sequence of values $u_n$, defined only at (non-negative) integer values of the
'independent variable' $n$, is to be calculated.

In its most general form a recurrence relation expresses the way in which $u_{n+1}$
is to be calculated from all the preceding values $u_0, u_1, \ldots, u_n$. Just as the most
general differential equations are intractable, so are the most general recurrence
relations, and we will limit ourselves to analogues of the types of differential
equations studied earlier in this chapter, namely those that are linear, have
constant coefficients and possess simple functions on the RHS. Such equations
occur over a broad range of engineering and statistical physics as well as in the
realms of finance, business planning and gambling! They form the basis of many
numerical methods, particularly those concerned with the numerical solution of
ordinary and partial differential equations.

A general recurrence relation is exemplified by the formula

$$u_{n+1} = \sum_{r=0}^{N-1} a_r u_{n-r} + k, \tag{15.22}$$

where $N$ and the $a_r$ are fixed and $k$ is a constant or a simple function of $n$.
Such an equation, involving terms of the series whose indices differ by up to $N$
(ranging from $n - N + 1$ to $n$), is called an $N$th-order recurrence relation. It is clear
that, given values for $u_0, u_1, \ldots, u_{N-1}$, this is a definitive scheme for generating the
series and therefore has a unique solution.

Paralleling the nomenclature of differential equations, if the term not involving
any $u_n$ is absent, i.e. $k = 0$, then the recurrence relation is called homogeneous.
The parallel continues with the form of the general solution of (15.22). If $v_n$ is
the general solution of the homogeneous relation, and $w_n$ is any solution of the
full relation, then

$$u_n = v_n + w_n$$

is the most general solution of the complete recurrence relation. This is
straightforwardly verified as follows:

$$\begin{aligned}
u_{n+1} &= v_{n+1} + w_{n+1} \\
        &= \sum_{r=0}^{N-1} a_r v_{n-r} + \sum_{r=0}^{N-1} a_r w_{n-r} + k \\
        &= \sum_{r=0}^{N-1} a_r (v_{n-r} + w_{n-r}) + k \\
        &= \sum_{r=0}^{N-1} a_r u_{n-r} + k.
\end{aligned}$$

Of course, if $k = 0$ then $w_n = 0$ for all $n$ is a trivial particular solution and the
complementary solution, $v_n$, is itself the most general solution.

First-order recurrence relations
First-order relations, for which $N = 1$, are exemplified by

$$u_{n+1} = au_n + k, \tag{15.23}$$

with $u_0$ specified. The solution to the homogeneous relation is immediate,

$$u_n = Ca^n,$$

and, if $k$ is a constant, the particular solution is equally straightforward: $w_n = K$
for all $n$, provided $K$ is chosen to satisfy

$$K = aK + k,$$

i.e. $K = k(1 - a)^{-1}$. This will be sufficient unless $a = 1$, in which case $u_n = u_0 + nk$
is obvious by inspection.

Thus the general solution of (15.23) is

$$u_n = \begin{cases} Ca^n + k/(1 - a), & a \neq 1, \\ u_0 + nk, & a = 1. \end{cases} \tag{15.24}$$

If $u_0$ is specified for the case of $a \neq 1$ then $C$ must be chosen as $C = u_0 - k/(1 - a)$,
resulting in the equivalent form

$$u_n = u_0 a^n + k\,\frac{1 - a^n}{1 - a}. \tag{15.25}$$

We now illustrate this method with a worked example.


►A house-buyer borrows capital $B$ from a bank that charges a fixed annual rate of interest
$R\%$. If the loan is to be repaid over $Y$ years, at what value should the fixed annual payments
$P$, made at the end of each year, be set? For a loan over 25 years at 6%, what percentage
of the first year's payment goes towards paying off the capital?

Let $u_n$ denote the outstanding debt at the end of year $n$, and write $R/100 = r$. Then the
relevant recurrence relation is

$$u_{n+1} = u_n(1 + r) - P$$

with $u_0 = B$. From (15.25) we have

$$u_n = B(1 + r)^n - P\,\frac{1 - (1 + r)^n}{1 - (1 + r)}.$$

As the loan is to be repaid over $Y$ years, $u_Y = 0$ and thus

$$P = \frac{Br(1 + r)^Y}{(1 + r)^Y - 1}.$$

The first year's interest is $rB$ and so the fraction of the first year's payment going
towards capital repayment is $(P - rB)/P$, which, using the above expression for $P$, is equal
to $(1 + r)^{-Y}$. With the given figures, this is (only) 23%. ◄
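
The loan example lends itself to a direct numerical check. The sketch below is illustrative: the loan size `B = 100000` is an arbitrary assumed figure (the fraction going to capital does not depend on it), and the function names are hypothetical. It computes $P$ from the formula above, iterates the recurrence year by year, and evaluates the first-year capital fraction.

```python
def annual_payment(B, r, Y):
    """Fixed payment P = B r (1+r)^Y / ((1+r)^Y - 1) that clears the loan in Y years."""
    growth = (1 + r) ** Y
    return B * r * growth / (growth - 1)


def outstanding_debt(B, r, P, years):
    """Iterate u_{n+1} = u_n (1 + r) - P starting from u_0 = B."""
    u = B
    for _ in range(years):
        u = u * (1 + r) - P
    return u


B, r, Y = 100_000.0, 0.06, 25          # B is an assumed, illustrative loan size
P = annual_payment(B, r, Y)
capital_fraction = (P - r * B) / P     # equals (1 + r)**(-Y)

print(round(P, 2))
print(outstanding_debt(B, r, P, Y))    # essentially zero after Y years
print(round(capital_fraction, 2))
```

The iteration confirms that the debt is cleared after 25 payments, and the capital fraction agrees with $(1 + r)^{-Y}$.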


With only small modifications, the method just described can be adapted to
handle recurrence relations in which the constant $k$ in (15.23) is replaced by $k\alpha^n$,
i.e. the relation is

$$u_{n+1} = au_n + k\alpha^n. \tag{15.26}$$

As for an inhomogeneous linear differential equation (see subsection 15.1.2), we
may try as a potential particular solution a form which resembles the term that
makes the equation inhomogeneous. Here, the presence of the term $k\alpha^n$ indicates
that a particular solution of the form $u_n = A\alpha^n$ should be tried. Substituting this
into (15.26) gives

$$A\alpha^{n+1} = aA\alpha^n + k\alpha^n,$$

from which it follows that $A = k/(\alpha - a)$ and that there is a particular solution
having the form $u_n = k\alpha^n/(\alpha - a)$, provided $\alpha \neq a$. For the special case $\alpha = a$, the
reader can readily verify that a particular solution of the form $u_n = An\alpha^n$ is
appropriate. This mirrors the corresponding situation for linear differential equations
when the RHS of the differential equation is contained in the complementary
function of its LHS.

In summary, the general solution to (15.26) is

$$u_n = \begin{cases} C_1 a^n + k\alpha^n/(\alpha - a), & \alpha \neq a, \\ C_2 a^n + kn\alpha^{n-1}, & \alpha = a, \end{cases} \tag{15.27}$$

with $C_1 = u_0 - k/(\alpha - a)$ and $C_2 = u_0$.
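
Both branches of (15.27) can be compared with direct iteration of (15.26). The following sketch uses exact rational arithmetic so that the comparison is an equality rather than an approximation; the function names and the sample values of $a$, $k$, $\alpha$ and $u_0$ are assumptions chosen only for illustration.

```python
from fractions import Fraction


def closed_form(n, a, k, alpha, u0):
    """Evaluate (15.27), choosing the branch according to whether alpha equals a."""
    if alpha != a:
        c1 = u0 - k / (alpha - a)
        return c1 * a**n + k * alpha**n / (alpha - a)
    return u0 * a**n + k * n * alpha**(n - 1)


def iterate(n, a, k, alpha, u0):
    """Apply u_{m+1} = a*u_m + k*alpha**m repeatedly, starting from u_0."""
    u = u0
    for m in range(n):
        u = a * u + k * alpha**m
    return u


# Sample (assumed) parameters: one case with alpha != a, one with alpha == a.
cases = [
    (Fraction(1, 2), Fraction(3), Fraction(2), Fraction(1)),
    (Fraction(3), Fraction(5), Fraction(3), Fraction(2)),
]
for a, k, alpha, u0 in cases:
    print([closed_form(n, a, k, alpha, u0) == iterate(n, a, k, alpha, u0)
           for n in range(6)])
```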


Second-order recurrence relations

We consider next recurrence relations that involve $u_{n-1}$ in the prescription for
$u_{n+1}$ and treat the general case in which the intervening term, $u_n$, is also present.
A typical equation is thus

$$u_{n+1} = au_n + bu_{n-1} + k. \tag{15.28}$$

As previously, the general solution of this is $u_n = v_n + w_n$, where $v_n$ satisfies

$$v_{n+1} = av_n + bv_{n-1} \tag{15.29}$$

and $w_n$ is any particular solution of (15.28); the proof follows the same lines as
that given earlier.

We have already seen for a first-order recurrence relation that the solution to
the homogeneous equation is given by terms forming a geometric series, and we
consider a corresponding series of powers in the present case. Setting $v_n = A\lambda^n$ in
(15.29) for some $\lambda$, as yet undetermined, requires that $\lambda$ should satisfy

$$A\lambda^{n+1} = aA\lambda^n + bA\lambda^{n-1}.$$

Dividing through by $A\lambda^{n-1}$ (assumed non-zero) shows that $\lambda$ could be either of
the roots, $\lambda_1$ and $\lambda_2$, of

$$\lambda^2 - a\lambda - b = 0, \tag{15.30}$$

which is known as the characteristic equation of the recurrence relation.

That there are two possible series of terms of the form $A\lambda^n$ is consistent with the
fact that two initial values (boundary conditions) have to be provided before the
series can be calculated by repeated use of (15.28). These two values are sufficient
to determine the appropriate coefficient $A$ for each of the series. Since (15.29) is
both linear and homogeneous, and is satisfied by both $v_n = A\lambda_1^n$ and $v_n = B\lambda_2^n$, its
general solution is

$$v_n = A\lambda_1^n + B\lambda_2^n.$$

If the coefficients $a$ and $b$ are such that (15.30) has two equal roots, i.e. $a^2 = -4b$,
then, as in the analogous case of repeated roots for differential equations (see
subsection 15.1.1(iii)), the second term of the general solution is replaced by $Bn\lambda_1^n$
to give

$$v_n = (A + Bn)\lambda_1^n.$$

Finding a particular solution is straightforward if $k$ is a constant: a trivial but
adequate solution is $w_n = k(1 - a - b)^{-1}$ for all $n$. As with first-order equations,
particular solutions can be found for other simple forms of $k$ by trying functions
similar to $k$ itself. Thus particular solutions for the cases $k = Cn$ and $k = D\alpha^n$
can be found by trying $w_n = E + Fn$ and $w_n = G\alpha^n$ respectively.


►Find the value of $u_{16}$ if the series $u_n$ satisfies

$$u_{n+1} + 4u_n + 3u_{n-1} = n$$

for $n \geq 1$, with $u_0 = 1$ and $u_1 = -1$.

We first solve the characteristic equation,

$$\lambda^2 + 4\lambda + 3 = 0,$$

to obtain the roots $\lambda = -1$ and $\lambda = -3$. Thus the complementary function is

$$v_n = A(-1)^n + B(-3)^n.$$

In view of the form of the RHS of the original relation, we try

$$w_n = E + Fn$$

as a particular solution and obtain

$$E + F(n + 1) + 4(E + Fn) + 3[E + F(n - 1)] = n,$$

yielding $F = 1/8$ and $E = 1/32$.

Thus the complete general solution is

$$u_n = A(-1)^n + B(-3)^n + \frac{n}{8} + \frac{1}{32},$$

and now using the given values for $u_0$ and $u_1$ determines $A$ as $7/8$ and $B$ as $3/32$. Thus

$$u_n = \tfrac{1}{32}\left[28(-1)^n + 3(-3)^n + 4n + 1\right].$$

Finally, substituting $n = 16$ gives $u_{16} = 4\,035\,633$, a value the reader may (or may not)
wish to verify by repeated application of the initial recurrence relation. ◄

Higher-order recurrence relations

It will be apparent that linear recurrence relations of order $N > 2$ do not present
any additional difficulty in principle, though two obvious practical difficulties are
(i) that the characteristic equation is of order $N$ and in general will not have roots
that can be written in closed form and (ii) that a correspondingly large number
of given values is required to determine the $N$ otherwise arbitrary constants
in the solution. The algebraic labour needed to solve the set of simultaneous
linear equations which determines them increases rapidly with $N$. We do not give
specific examples here, but some are included in the exercises at the end of the
chapter.
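
Whatever the order, a recurrence of the form (15.22) can always be iterated directly, which gives an independent check on any closed-form solution. The sketch below is illustrative; the function name `iterate_recurrence` and its argument conventions are assumptions. It is applied to the second-order worked example above, rewritten as $u_{n+1} = -4u_n - 3u_{n-1} + n$, and compared with the closed form found there.

```python
def iterate_recurrence(a, k, initial, n):
    """Return u_n for u_{m+1} = sum_r a[r]*u_{m-r} + k(m), r = 0..N-1.

    a       -- coefficients [a_0, ..., a_{N-1}]
    k       -- function of m giving the inhomogeneous term
    initial -- the N starting values [u_0, ..., u_{N-1}]
    """
    N = len(a)
    if len(initial) != N:
        raise ValueError("an Nth-order relation needs exactly N initial values")
    u = list(initial)
    for m in range(N - 1, n):
        u.append(sum(a[r] * u[m - r] for r in range(N)) + k(m))
    return u[n]


def closed_form(n):
    """u_n = [28(-1)^n + 3(-3)^n + 4n + 1] / 32 from the worked example."""
    numerator = 28 * (-1)**n + 3 * (-3)**n + 4 * n + 1
    assert numerator % 32 == 0       # the sequence is integer-valued
    return numerator // 32


print(iterate_recurrence([-4, -3], lambda m: m, [1, -1], 16))
print(closed_form(16))
```

Both routes give $u_{16} = 4\,035\,633$, the value quoted in the example.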


15.1.5 Laplace transform method

The method of Laplace transforms is very useful for solving linear ODEs with
constant coefficients. Taking the Laplace transform of such an equation transforms
it into a purely algebraic equation in terms of the Laplace transform
of the required solution. Once the algebraic equation has been solved for this
Laplace transform, the general solution to the original ODE can be obtained
by performing an inverse Laplace transform. One advantage of this method is
that, for given boundary conditions, it provides the solution in just one step,
instead of having to find the complementary function and particular integral
separately.

In order to apply this method we need only two results from Laplace transform
theory (see section 13.2). First, the Laplace transform of a function $f(x)$ is defined
by

$$\bar{f}(s) = \int_0^\infty e^{-sx} f(x)\,dx, \tag{15.31}$$

from which we can derive a second useful relation. This concerns the Laplace
transform of derivatives of $f(x)$:

$$\overline{f^{(n)}}(s) = s^n \bar{f}(s) - s^{n-1} f(0) - s^{n-2} f'(0) - \cdots - s f^{(n-2)}(0) - f^{(n-1)}(0), \tag{15.32}$$

where the primes and superscripts in parentheses denote differentiation with
respect to $x$. Using these relations, along with table 13.1, on p. 461, which
gives Laplace transforms of standard functions, we are in a position to solve a
linear ODE with constant coefficients by this method.

►Solve

$$\frac{d^2 y}{dx^2} - 3\frac{dy}{dx} + 2y = 2e^{-x}, \tag{15.33}$$

subject to the boundary conditions $y(0) = 2$, $y'(0) = 1$.

Taking the Laplace transform of (15.33) and using the table of standard results we obtain

$$s^2 \bar{y}(s) - sy(0) - y'(0) - 3[s\bar{y}(s) - y(0)] + 2\bar{y}(s) = \frac{2}{s + 1},$$

which reduces to

$$(s^2 - 3s + 2)\bar{y}(s) - 2s + 5 = \frac{2}{s + 1}. \tag{15.34}$$

Solving this algebraic equation for $\bar{y}(s)$, the Laplace transform of the required solution to
(15.33), we obtain

$$\bar{y}(s) = \frac{2s^2 - 3s - 3}{(s + 1)(s - 1)(s - 2)} = \frac{1}{3(s + 1)} + \frac{2}{s - 1} - \frac{1}{3(s - 2)}, \tag{15.35}$$

where in the final step we have used partial fractions. Taking the inverse Laplace transform
of (15.35), again using table 13.1, we find the specific solution to (15.33) to be

$$y(x) = \tfrac{1}{3} e^{-x} + 2e^x - \tfrac{1}{3} e^{2x}.$$ ◄
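
The partial-fraction step can be reproduced mechanically when all the poles are simple and distinct, using the "cover-up" rule: the coefficient of $1/(s - p)$ is the numerator evaluated at $s = p$ divided by the product of $(p - q)$ over the other poles $q$. The sketch below is illustrative; the function name `cover_up` is an assumption, and the rule as coded applies only to distinct simple poles with the numerator of lower degree than the denominator.

```python
from fractions import Fraction
import math


def cover_up(numerator, poles):
    """Partial-fraction coefficients of numerator(s) / prod(s - p) for distinct poles."""
    coeffs = {}
    for p in poles:
        denom = Fraction(1)
        for q in poles:
            if q != p:
                denom *= (p - q)
        coeffs[p] = Fraction(numerator(p)) / denom
    return coeffs


# The transform found above: (2s^2 - 3s - 3) / ((s + 1)(s - 1)(s - 2)).
coeffs = cover_up(lambda s: 2 * s**2 - 3 * s - 3, [-1, 1, 2])


def y(x):
    """Inverse transform: each term c/(s - p) inverts to c*exp(p*x)."""
    return sum(float(c) * math.exp(p * x) for p, c in coeffs.items())


print(coeffs)
print(y(0.0))
```

The coefficients come out as $1/3$, $2$ and $-1/3$, matching (15.35), and the boundary conditions $y(0) = 2$, $y'(0) = 1$ follow from $\sum c_p = 2$ and $\sum p\,c_p = 1$.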


Note that if the boundary conditions in a problem are given as symbols, rather
than just numbers, then the step involving partial fractions can often involve
a considerable amount of algebra. The Laplace transform method is also very
convenient for solving sets of simultaneous linear ODEs with constant coefficients.


►Two electrical circuits, both of negligible resistance, each consist of a coil having
self-inductance $L$ and a capacitor having capacitance $C$. The mutual inductance of the two
circuits is $M$. There is no source of e.m.f. in either circuit. Initially the second capacitor
is given a charge $CV_0$, the first capacitor being uncharged, and at time $t = 0$ a switch in
the second circuit is closed to complete the circuit. Find the subsequent current in the first
circuit.

Subject to the initial conditions $q_1(0) = \dot{q}_1(0) = \dot{q}_2(0) = 0$ and $q_2(0) = CV_0 = V_0/G$, say,
we have to solve

$$\begin{aligned}
L\ddot{q}_1 + M\ddot{q}_2 + Gq_1 &= 0, \\
M\ddot{q}_1 + L\ddot{q}_2 + Gq_2 &= 0.
\end{aligned}$$

On taking the Laplace transform of the above equations, we obtain

$$\begin{aligned}
(Ls^2 + G)\bar{q}_1 + Ms^2 \bar{q}_2 &= sMV_0 C, \\
Ms^2 \bar{q}_1 + (Ls^2 + G)\bar{q}_2 &= sLV_0 C.
\end{aligned}$$

Eliminating $\bar{q}_2$ and rewriting as an equation for $\bar{q}_1$, we find

$$\begin{aligned}
\bar{q}_1(s) &= \frac{MV_0 s}{[(L + M)s^2 + G][(L - M)s^2 + G]} \\
             &= \frac{V_0}{2G}\left[\frac{(L + M)s}{(L + M)s^2 + G} - \frac{(L - M)s}{(L - M)s^2 + G}\right].
\end{aligned}$$

Using table 13.1,

$$q_1(t) = \tfrac{1}{2} V_0 C(\cos \omega_1 t - \cos \omega_2 t),$$

where $\omega_1^2 (L + M) = G$ and $\omega_2^2 (L - M) = G$. Thus the current is given by

$$i_1(t) = \tfrac{1}{2} V_0 C(\omega_2 \sin \omega_2 t - \omega_1 \sin \omega_1 t).$$ ◄

Solution method. Perform a Laplace transform, as defined in (15.31), on the entire
equation, using (15.32) to calculate the transform of the derivatives. Then solve the
resulting algebraic equation for $\bar{y}(s)$, the Laplace transform of the required solution
to the ODE. By using the method of partial fractions and consulting a table of
Laplace transforms of standard functions, calculate the inverse Laplace transform.
The resulting function $y(x)$ is the solution of the ODE that obeys the given boundary
conditions.


15.2 Linear equations with variable coefficients 

There is no generally applicable method of solving equations with coefficients 
that are functions of x. Nevertheless, there are certain cases in which a solution is 
possible. Some of the methods discussed in this section are also useful in finding 
the general solution or particular integral for equations with constant coefficients 
that have proved impenetrable by the techniques discussed above. 


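15.2.1 The Legendre and Euler linear equations

Legendre's linear equation has the form

$$a_n(\alpha x + \beta)^n \frac{d^n y}{dx^n} + \cdots + a_1(\alpha x + \beta)\frac{dy}{dx} + a_0 y = f(x), \tag{15.36}$$

where $\alpha$, $\beta$ and the $a_n$ are constants, and may be solved by making the substitution
$\alpha x + \beta = e^t$. We then have

$$\begin{aligned}
\frac{dy}{dx} &= \frac{dt}{dx}\frac{dy}{dt} = \frac{\alpha}{\alpha x + \beta}\frac{dy}{dt}, \\
\frac{d^2 y}{dx^2} &= \frac{d}{dx}\frac{dy}{dx} = \frac{\alpha^2}{(\alpha x + \beta)^2}\left(\frac{d^2 y}{dt^2} - \frac{dy}{dt}\right),
\end{aligned}$$

and so on for higher derivatives. Therefore we can write the terms of (15.36) as

$$\begin{aligned}
(\alpha x + \beta)\frac{dy}{dx} &= \alpha \frac{dy}{dt}, \\
(\alpha x + \beta)^2 \frac{d^2 y}{dx^2} &= \alpha^2 \frac{d}{dt}\left(\frac{d}{dt} - 1\right) y, \\
&\;\;\vdots \\
(\alpha x + \beta)^n \frac{d^n y}{dx^n} &= \alpha^n \frac{d}{dt}\left(\frac{d}{dt} - 1\right) \cdots \left(\frac{d}{dt} - n + 1\right) y.
\end{aligned} \tag{15.37}$$

Substituting equations (15.37) into the original equation (15.36), the latter becomes
a linear ODE with constant coefficients, i.e.

$$a_n \alpha^n \frac{d}{dt}\left(\frac{d}{dt} - 1\right) \cdots \left(\frac{d}{dt} - n + 1\right) y + \cdots + a_1 \alpha \frac{dy}{dt} + a_0 y = f\!\left(\frac{e^t - \beta}{\alpha}\right),$$

which can be solved by the methods of section 15.1.

A special case of Legendre's linear equation, for which $\alpha = 1$ and $\beta = 0$, is
Euler's equation,

$$a_n x^n \frac{d^n y}{dx^n} + \cdots + a_1 x \frac{dy}{dx} + a_0 y = f(x); \tag{15.38}$$

it may be solved in a similar manner to the above by substituting $x = e^t$. If
$f(x) = 0$ in (15.38), substituting $y = x^\lambda$ leads to a simple algebraic equation in
$\lambda$, which can be solved to yield the solution to (15.38). In the event that the
algebraic equation for $\lambda$ has repeated roots, extra care is needed. If $\lambda_1$ is a $k$-fold
root ($k > 1$) then the $k$ linearly independent solutions corresponding to this root
are $x^{\lambda_1}, x^{\lambda_1} \ln x, \ldots, x^{\lambda_1} (\ln x)^{k-1}$.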


►Solve

$$x^2 \frac{d^2 y}{dx^2} + x\frac{dy}{dx} - 4y = 0 \tag{15.39}$$

by both of the methods discussed above.

First we make the substitution $x = e^t$, which, after cancelling $e^t$, gives an equation with
constant coefficients, i.e.

$$\frac{d}{dt}\left(\frac{d}{dt} - 1\right) y + \frac{dy}{dt} - 4y = 0 \quad\Rightarrow\quad \frac{d^2 y}{dt^2} - 4y = 0. \tag{15.40}$$

Using the methods of section 15.1, the general solution of (15.40), and therefore of (15.39),
is given by

$$y = c_1 e^{2t} + c_2 e^{-2t} = c_1 x^2 + c_2 x^{-2}.$$

Since the RHS of (15.39) is zero, we can reach the same solution by substituting $y = x^\lambda$
into (15.39). This gives

$$\lambda(\lambda - 1)x^\lambda + \lambda x^\lambda - 4x^\lambda = 0,$$

which reduces to

$$(\lambda^2 - 4)x^\lambda = 0.$$

This has the solutions $\lambda = \pm 2$, so we obtain again the general solution

$$y = c_1 x^2 + c_2 x^{-2}.$$ ◄
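
For a homogeneous second-order Euler equation $a_2 x^2 y'' + a_1 x y' + a_0 y = 0$, the substitution $y = x^\lambda$ gives the quadratic $a_2 \lambda(\lambda - 1) + a_1 \lambda + a_0 = 0$. The sketch below is an illustration under the assumption that this quadratic has real roots; the function names are hypothetical. It finds the exponents for the example above and checks that $x^\lambda$ then satisfies the equation.

```python
def euler_exponents(a2, a1, a0):
    """Real roots of a2*lam*(lam - 1) + a1*lam + a0 = 0, largest first."""
    b = a1 - a2                       # the quadratic is a2*lam^2 + (a1 - a2)*lam + a0
    disc = b * b - 4 * a2 * a0
    if disc < 0:
        raise ValueError("complex exponents are not handled in this sketch")
    r = disc ** 0.5
    return ((-b + r) / (2 * a2), (-b - r) / (2 * a2))


def residual(x, lam):
    """x^2 y'' + x y' - 4y evaluated for y = x**lam."""
    y = x**lam
    dy = lam * x**(lam - 1)
    d2y = lam * (lam - 1) * x**(lam - 2)
    return x**2 * d2y + x * dy - 4 * y


print(euler_exponents(1, 1, -4))
print(residual(1.7, 2.0), residual(1.7, -2.0))
```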


Solution method. If the ODE is of the Legendre form (15.36) then substitute
$\alpha x + \beta = e^t$. This results in an equation of the same order but with constant coefficients,
which can be solved by the methods of section 15.1. If the ODE is of the Euler
form (15.38) with a non-zero RHS then substitute $x = e^t$; this again leads to an
equation of the same order but with constant coefficients. If, however, $f(x) = 0$ in
the Euler equation (15.38) then the equation may also be solved by substituting
$y = x^\lambda$. This leads to an algebraic equation whose solution gives the allowed values
of $\lambda$; the general solution is then the linear superposition of these functions.

15.2.2 Exact equations

Sometimes an ODE may be merely the derivative of another ODE of one order
lower. If this is the case then the ODE is called exact. The $n$th-order linear ODE

$$a_n(x)\frac{d^n y}{dx^n} + \cdots + a_1(x)\frac{dy}{dx} + a_0(x)y = f(x), \tag{15.41}$$

is exact if the LHS can be written as a simple derivative, i.e. if

$$a_n(x)\frac{d^n y}{dx^n} + \cdots + a_0(x)y = \frac{d}{dx}\left[b_{n-1}(x)\frac{d^{n-1} y}{dx^{n-1}} + \cdots + b_0(x)y\right]. \tag{15.42}$$

It may be shown that, for (15.42) to hold, we require

$$a_0(x) - a_1'(x) + a_2''(x) - \cdots + (-1)^n a_n^{(n)}(x) = 0, \tag{15.43}$$

where the prime again denotes differentiation with respect to $x$. If (15.43) is
satisfied then straightforward integration leads to a new equation of one order
lower. If this simpler equation can be solved then a solution to the original
equation is obtained. Of course, if the above process leads to an equation that is
itself exact then the analysis can be repeated to reduce the order still further.
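
When the coefficients $a_m(x)$ are polynomials, condition (15.43) can be tested mechanically. The following sketch is illustrative only; it represents a polynomial as a list of coefficients in ascending powers of $x$, and the helper names are assumptions. It is applied to two sets of coefficients: $a_2 = 1 - x^2$, $a_1 = -3x$, $a_0 = -1$, and the same set multiplied through by $x$.

```python
def deriv(p):
    """Derivative of a polynomial given as [c0, c1, c2, ...] (ascending powers)."""
    return [i * c for i, c in enumerate(p)][1:]


def add(p, q):
    """Sum of two polynomials, padding the shorter with zeros."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [x + y for x, y in zip(p, q)]


def exactness_sum(coeffs):
    """a_0 - a_1' + a_2'' - ... for coeffs = [a_0, a_1, ..., a_n]."""
    total = []
    for m, a in enumerate(coeffs):
        term = a
        for _ in range(m):            # differentiate a_m exactly m times
            term = deriv(term)
        sign = -1 if m % 2 else 1
        total = add(total, [sign * c for c in term])
    return total


def is_exact(coeffs):
    """True if the alternating sum vanishes identically."""
    return all(c == 0 for c in exactness_sum(coeffs))


print(is_exact([[-1], [0, -3], [1, 0, -1]]))               # a0=-1, a1=-3x, a2=1-x^2
print(exactness_sum([[0, -1], [0, 0, -3], [0, 1, 0, -1]])) # same, multiplied by x
```

The first set satisfies the condition; for the second the sum is $-x$, so that equation is not exact as it stands.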



►Solve

$$(1 - x^2)\frac{d^2 y}{dx^2} - 3x\frac{dy}{dx} - y = 1. \tag{15.44}$$

Comparing with (15.41), we have $a_2 = 1 - x^2$, $a_1 = -3x$ and $a_0 = -1$. It is easily shown
that $a_0 - a_1' + a_2'' = 0$, so (15.44) is exact and can therefore be written in the form

$$\frac{d}{dx}\left[b_1(x)\frac{dy}{dx} + b_0(x)y\right] = 1. \tag{15.45}$$

Expanding the LHS of (15.45) we find

$$\frac{d}{dx}\left(b_1 \frac{dy}{dx} + b_0 y\right) = b_1 \frac{d^2 y}{dx^2} + (b_1' + b_0)\frac{dy}{dx} + b_0' y. \tag{15.46}$$

Comparing (15.44) and (15.46) we find

$$b_1 = 1 - x^2, \qquad b_1' + b_0 = -3x, \qquad b_0' = -1.$$

These relations integrate consistently to give $b_1 = 1 - x^2$ and $b_0 = -x$, so (15.44) can be
written as

$$\frac{d}{dx}\left[(1 - x^2)\frac{dy}{dx} - xy\right] = 1. \tag{15.47}$$

Integrating (15.47) gives us directly the first-order linear ODE

$$\frac{dy}{dx} - \left(\frac{x}{1 - x^2}\right) y = \frac{x + c_1}{1 - x^2},$$

which can be solved by the method of subsection 14.2.4 and has the solution

$$y = \frac{c_1 \sin^{-1} x + c_2}{\sqrt{1 - x^2}} - 1.$$ ◄

It is worth noting that, even if a higher-order ODE is not exact in its given form,
it may sometimes be made exact by multiplying through by some suitable function,
an integrating factor, cf. subsection 14.2.3. Unfortunately, no straightforward
method for finding an integrating factor exists and one often has to rely on
inspection or experience.


►Solve

$$x(1 - x^2)\frac{d^2 y}{dx^2} - 3x^2 \frac{dy}{dx} - xy = x. \tag{15.48}$$

It is easily shown that (15.48) is not exact, but we also see immediately that by multiplying
it through by $1/x$ we recover (15.44), which is exact and is solved above. ◄

Another important point is that an ODE need not be linear to be exact, 
although no simple rule such as (15.43) exists if it is not linear. Nevertheless, it is 
often worth exploring the possibility that a non-linear equation is exact, since it 
could then be reduced in order by one and may lead to a soluble equation. This 
is discussed further in subsection 15.3.3. 

Solution method. For a linear ODE of the form (15.41) check whether it is exact
using equation (15.43). If it is not then attempt to find an integrating factor which
when multiplying the equation makes it exact. Once the equation is exact write the
LHS as a derivative as in (15.42) and, by expanding this derivative and comparing
with the LHS of the ODE, determine the functions $b_m(x)$ in (15.42). Integrate the
resulting equation to yield another ODE, of one order lower. This may be solved or
simplified further if the new ODE is itself exact or can be made so.


15.2.3 Partially known complementary function

Suppose we wish to solve the $n$th-order linear ODE

$$a_n(x)\frac{d^n y}{dx^n} + \cdots + a_1(x)\frac{dy}{dx} + a_0(x)y = f(x), \tag{15.49}$$

and we happen to know that $u(x)$ is a solution of (15.49) when the RHS is
set to zero, i.e. $u(x)$ is one part of the complementary function. By making the
substitution $y(x) = u(x)v(x)$, we can transform (15.49) into an equation of order
$n - 1$ in $dv/dx$. This simpler equation may prove soluble.

In particular, if the original equation is of second order then we obtain
a first-order equation in $dv/dx$, which may be soluble using the methods of
section 14.2. In this way both the remaining term in the complementary function
and the particular integral are found. This method therefore provides a useful
way of calculating particular integrals for second-order equations with variable
(or constant) coefficients.




►Solve

$$\frac{d^2 y}{dx^2} + y = \operatorname{cosec} x. \tag{15.50}$$

We see that the RHS does not fall into any of the categories listed in subsection 15.1.2,
and so we are at an initial loss as to how to find the particular integral. However, the
complementary function of (15.50) is

$$y_c(x) = c_1 \sin x + c_2 \cos x,$$

and so let us choose the solution $u(x) = \cos x$ (we could equally well choose $\sin x$) and
make the substitution $y(x) = v(x)u(x) = v(x)\cos x$ into (15.50). This gives

$$\cos x \frac{d^2 v}{dx^2} - 2\sin x \frac{dv}{dx} = \operatorname{cosec} x, \tag{15.51}$$

which is a first-order linear ODE in $dv/dx$ and may be solved by multiplying through by
a suitable integrating factor, as discussed in subsection 14.2.4. Writing (15.51) as

$$\frac{d^2 v}{dx^2} - 2\tan x \frac{dv}{dx} = \frac{\operatorname{cosec} x}{\cos x}, \tag{15.52}$$

we see that the required integrating factor is given by

$$\exp\left\{-2\int \tan x \,dx\right\} = \exp[2\ln(\cos x)] = \cos^2 x.$$

Multiplying both sides of (15.52) by the integrating factor $\cos^2 x$ we obtain

$$\frac{d}{dx}\left(\cos^2 x \frac{dv}{dx}\right) = \cot x,$$

which integrates to give

$$\cos^2 x \frac{dv}{dx} = \ln(\sin x) + c_1.$$

After rearranging and integrating again, this becomes

$$\begin{aligned}
v &= \int \sec^2 x \ln(\sin x)\,dx + c_1 \int \sec^2 x \,dx \\
  &= \tan x \ln(\sin x) - x + c_1 \tan x + c_2.
\end{aligned}$$

Therefore the general solution to (15.50) is given by $y = uv = v\cos x$, i.e.

$$y = c_1 \sin x + c_2 \cos x + \sin x \ln(\sin x) - x\cos x,$$

which contains the full complementary function and the particular integral. ◄


Solution method. If $u(x)$ is a known solution of the $n$th-order equation (15.49) with
$f(x) = 0$, then make the substitution $y(x) = u(x)v(x)$ in (15.49). This leads to an
equation of order $n - 1$ in $dv/dx$, which might be soluble.


15.2.4 Variation of parameters

The method of variation of parameters proves useful in finding particular integrals
for linear ODEs with variable (and constant) coefficients. However, it requires
knowledge of the entire complementary function, not just of one part of it as in
the previous subsection.

Suppose we wish to find a particular integral of the equation

$$a_n(x)\frac{d^n y}{dx^n} + \cdots + a_1(x)\frac{dy}{dx} + a_0(x)y = f(x), \tag{15.53}$$

and the complementary function $y_c(x)$ (the general solution of (15.53) with
$f(x) = 0$) is

$$y_c(x) = c_1 y_1(x) + c_2 y_2(x) + \cdots + c_n y_n(x),$$

where the functions $y_m(x)$ are known. We now assume that a particular integral of
(15.53) can be expressed in a form similar to that of the complementary function,
but with the constants $c_m$ replaced by functions of $x$, i.e. we assume a particular
integral of the form

$$y_p(x) = k_1(x)y_1(x) + k_2(x)y_2(x) + \cdots + k_n(x)y_n(x). \tag{15.54}$$

This will no longer satisfy the complementary equation (i.e. (15.53) with the RHS
set to zero) but might, with suitable choices of the functions $k_i(x)$, be made equal
to $f(x)$, thus producing not a complementary function but a particular integral.

Since we have $n$ arbitrary functions $k_1(x), k_2(x), \ldots, k_n(x)$, but only one restriction
on them (namely the ODE), we may impose a further $n - 1$ constraints. We
can choose these constraints to be as convenient as possible, and the simplest
choice is given by

$$\begin{aligned}
k_1'(x)y_1(x) + k_2'(x)y_2(x) + \cdots + k_n'(x)y_n(x) &= 0 \\
k_1'(x)y_1'(x) + k_2'(x)y_2'(x) + \cdots + k_n'(x)y_n'(x) &= 0 \\
&\;\;\vdots \\
k_1'(x)y_1^{(n-2)}(x) + k_2'(x)y_2^{(n-2)}(x) + \cdots + k_n'(x)y_n^{(n-2)}(x) &= 0 \\
k_1'(x)y_1^{(n-1)}(x) + k_2'(x)y_2^{(n-1)}(x) + \cdots + k_n'(x)y_n^{(n-1)}(x) &= \frac{f(x)}{a_n(x)},
\end{aligned} \tag{15.55}$$

where the primes denote differentiation with respect to $x$. The last of these
equations is not a freely chosen constraint but must be satisfied given the previous
$n - 1$ constraints and the original ODE.

This choice of constraints is easily justified (although the algebra is quite
messy). Differentiating (15.54) with respect to x, we obtain

\[ y_p' = k_1 y_1' + k_2 y_2' + \cdots + k_n y_n' + (k_1' y_1 + k_2' y_2 + \cdots + k_n' y_n), \]

where, for the moment, we drop the explicit x-dependence of these functions. Since
we are free to choose our constraints as we wish, let us define the expression in
parentheses to be zero, giving the first equation in (15.55). Differentiating again
we find

\[ y_p'' = k_1 y_1'' + k_2 y_2'' + \cdots + k_n y_n'' + (k_1' y_1' + k_2' y_2' + \cdots + k_n' y_n'). \]

Once more we can choose the expression in brackets to be zero, giving the second
equation in (15.55). We can repeat this procedure, choosing the corresponding
expression in each case to be zero. This yields the first n - 1 equations in (15.55).
The mth derivative of y_p for m < n is then given by

\[ y_p^{(m)} = k_1 y_1^{(m)} + k_2 y_2^{(m)} + \cdots + k_n y_n^{(m)}. \]

Differentiating y_p once more we find that its nth derivative is given by

\[ y_p^{(n)} = k_1 y_1^{(n)} + k_2 y_2^{(n)} + \cdots + k_n y_n^{(n)} + (k_1' y_1^{(n-1)} + k_2' y_2^{(n-1)} + \cdots + k_n' y_n^{(n-1)}). \]


Substituting the expressions for y_p^{(m)}, m = 0 to n, into the original ODE (15.53),
we obtain

\[ \sum_{m=0}^{n} a_m \left[ k_1 y_1^{(m)} + k_2 y_2^{(m)} + \cdots + k_n y_n^{(m)} \right] + a_n \left( k_1' y_1^{(n-1)} + k_2' y_2^{(n-1)} + \cdots + k_n' y_n^{(n-1)} \right) = f(x), \]

i.e.

\[ \sum_{m=0}^{n} a_m \sum_{j=1}^{n} k_j y_j^{(m)} + a_n \left( k_1' y_1^{(n-1)} + k_2' y_2^{(n-1)} + \cdots + k_n' y_n^{(n-1)} \right) = f(x). \]

Rearranging the order of summations on the LHS, we find

\[ \sum_{j=1}^{n} k_j \left[ a_n y_j^{(n)} + \cdots + a_1 y_j' + a_0 y_j \right] + a_n \left( k_1' y_1^{(n-1)} + k_2' y_2^{(n-1)} + \cdots + k_n' y_n^{(n-1)} \right) = f(x). \tag{15.56} \]

But since the functions y_j are solutions of the complementary equation of (15.53)
we have (for all j)

\[ a_n y_j^{(n)} + \cdots + a_1 y_j' + a_0 y_j = 0. \]

Therefore (15.56) becomes

\[ a_n \left( k_1' y_1^{(n-1)} + k_2' y_2^{(n-1)} + \cdots + k_n' y_n^{(n-1)} \right) = f(x), \]

which is the final equation given in (15.55).

Considering (15.55) to be a set of simultaneous equations in the set of unknowns
k_1'(x), k_2'(x), ..., k_n'(x), we see that the determinant of the coefficients of these functions
is equal to the Wronskian W(y_1, y_2, ..., y_n), which is non-zero since the solutions
y_m(x) are linearly independent; see equation (15.6). Therefore (15.55) can be solved
for the functions k_m'(x), which in turn can be integrated, setting all constants of
integration equal to zero, to give k_m(x). The general solution to (15.53) is then
given by

\[ y(x) = y_c(x) + y_p(x) = \sum_{m=1}^{n} [c_m + k_m(x)]\, y_m(x). \]

Note that if the constants of integration are included in the k_m(x) then, as well
as finding the particular integral, we introduce an addition to the complementary
function.


► Use the variation of parameters method to solve

\[ \frac{d^2 y}{dx^2} + y = \operatorname{cosec} x, \tag{15.57} \]

subject to the boundary conditions y(0) = y(π/2) = 0.


The complementary function of (15.57) is again

\[ y_c(x) = c_1 \sin x + c_2 \cos x. \]

We therefore assume a particular integral of the form

\[ y_p(x) = k_1(x) \sin x + k_2(x) \cos x, \]

and impose the additional constraints of (15.55), i.e.

\[
\begin{aligned}
k_1'(x) \sin x + k_2'(x) \cos x &= 0, \\
k_1'(x) \cos x - k_2'(x) \sin x &= \operatorname{cosec} x.
\end{aligned}
\]

Solving these equations for k_1'(x) and k_2'(x) gives

\[
\begin{aligned}
k_1'(x) &= \cos x \operatorname{cosec} x = \cot x, \\
k_2'(x) &= -\sin x \operatorname{cosec} x = -1.
\end{aligned}
\]

Hence, ignoring the constants of integration, k_1(x) and k_2(x) are given by

\[ k_1(x) = \ln(\sin x), \qquad k_2(x) = -x. \]

The general solution to the ODE (15.57) is therefore

\[ y(x) = [c_1 + \ln(\sin x)] \sin x + (c_2 - x) \cos x, \]

which is identical to the solution found in subsection 15.2.3. Applying the boundary
conditions y(0) = y(π/2) = 0 we find c_1 = c_2 = 0 and so

\[ y(x) = \ln(\sin x) \sin x - x \cos x. \] ◄
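The step of solving the constraint equations for the derivatives of the unknown functions can be sketched numerically for the second-order case. The following is a minimal illustration, not part of the original text: the function name `k_primes` and the evaluation point x = 0.7 are arbitrary choices, and the 2 x 2 system is solved by Cramer's rule under the assumption that the Wronskian is non-zero.

```python
import math

def k_primes(x, y1, dy1, y2, dy2, f, a2=lambda x: 1.0):
    """Solve the n = 2 case of (15.55) for k1'(x) and k2'(x) by Cramer's rule."""
    wronskian = y1(x) * dy2(x) - y2(x) * dy1(x)  # W(y1, y2), assumed non-zero
    rhs = f(x) / a2(x)                           # RHS of the last constraint
    k1_prime = -y2(x) * rhs / wronskian
    k2_prime = y1(x) * rhs / wronskian
    return k1_prime, k2_prime

# Example (15.57): y1 = sin x, y2 = cos x, f = cosec x, leading coefficient 1.
x = 0.7  # arbitrary point in (0, pi/2)
k1p, k2p = k_primes(
    x,
    math.sin, math.cos,                # y1 and its derivative
    math.cos, lambda t: -math.sin(t),  # y2 and its derivative
    lambda t: 1.0 / math.sin(t),       # f(x) = cosec x
)
print(k1p, 1.0 / math.tan(x))  # k1' should agree with cot x
print(k2p)                     # k2' should equal -1
```

The two printed values of k1' agree and k2' is -1, matching the expressions found analytically in the worked example.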


Solution method. If the complementary function of (15.53) is known then assume
a particular integral of the same form but with the constants replaced by functions
of x. Impose the constraints in (15.55) and solve the resulting system of equations
for the unknowns k_1'(x), k_2'(x), ..., k_n'(x). Integrate these functions, setting constants of
integration equal to zero, to obtain k_1(x), k_2(x), ..., k_n(x) and hence the particular
integral.

15.2.5 Green’s functions

The Green’s function method of solving linear ODEs bears a striking resemblance 
to the method of variation of parameters discussed in the previous subsection; 
it too requires knowledge of the entire complementary function in order to find 
the particular integral and therefore the general solution. The Green’s function 
approach differs, however, since once the Green’s function for a particular LHS of 
(15.1) and accompanying boundary conditions has been found, then the solution 
for any RHS (i.e. any f(x)) can be written down immediately, albeit in the form
of an integral. 

Although the Green’s function method can be approached by considering the 
superposition of eigenfunctions of the equation (see chapter 17) and is also 
applicable to the solution of partial differential equations (see chapter 19), this 
section adopts a more utilitarian approach based on the properties of the Dirac 
delta function (see subsection 13.1.3) and deals only with the use of Green’s 
functions in solving ODEs. 

Let us again consider the equation

\[ a_n(x)\frac{d^n y}{dx^n} + \cdots + a_1(x)\frac{dy}{dx} + a_0(x)y = f(x), \tag{15.58} \]

but for the sake of brevity we now denote the LHS by ℒy(x), i.e. as a linear
differential operator acting on y(x). Thus (15.58) now reads

\[ \mathcal{L}y(x) = f(x). \tag{15.59} \]

Let us suppose that a function G(x, z) exists (the Green’s function) such that the
general solution to (15.59), which obeys some set of imposed boundary conditions
in the range a < x < b, is given by

\[ y(x) = \int_a^b G(x,z) f(z)\,dz, \tag{15.60} \]


where z is the integration variable. If we apply the linear differential operator ℒ
to both sides of (15.60) and use (15.59) then we obtain

\[ \mathcal{L}y(x) = \int_a^b [\mathcal{L}G(x,z)] f(z)\,dz = f(x). \tag{15.61} \]

Comparison of (15.61) with a standard property of the Dirac delta function (see
subsection 13.1.3), namely

\[ f(x) = \int_a^b \delta(x - z) f(z)\,dz, \]

for a < x < b, shows that for (15.61) to hold for any arbitrary function f(x), we
require (for a < x < b) that

\[ \mathcal{L}G(x,z) = \delta(x - z), \tag{15.62} \]

i.e. the Green’s function G(x, z) must satisfy the original ODE with the RHS set
equal to a delta function. G(x, z) may be thought of physically as the response of
a system to a unit impulse at x = z.

In addition to (15.62), we must impose two further sets of restrictions on 
G(x,z). The first is the requirement that the general solution y(x) in (15.60) obeys 
the boundary conditions. For homogeneous boundary conditions, in which y(x) 
and/or its derivatives are required to be zero at specified points, this is most 
simply arranged by demanding that G(x,z) itself obeys the boundary conditions 
when it is considered as a function of x alone; if, for example, we require 
y(a) = y(b) = 0 then we should also demand G(a,z) = G(b,z ) = 0. Problems 
having inhomogeneous boundary conditions are discussed at the end of this 
subsection. 

The second set of restrictions concerns the continuity or discontinuity of G(x, z)
and its derivatives at x = z and can be found by integrating (15.62) with respect
to x over the small interval [z - ε, z + ε] and taking the limit as ε → 0. We then
obtain

\[ \lim_{\epsilon\to 0} \sum_{m=0}^{n} \int_{z-\epsilon}^{z+\epsilon} a_m(x) \frac{d^m G(x,z)}{dx^m}\,dx = \lim_{\epsilon\to 0} \int_{z-\epsilon}^{z+\epsilon} \delta(x - z)\,dx = 1. \tag{15.63} \]

Since d^nG/dx^n exists at x = z but with value infinity, the (n - 1)th-order derivative
must have a finite discontinuity there, whereas all the lower-order derivatives,
d^mG/dx^m for m < n - 1, must be continuous at this point. Therefore the terms
containing these derivatives cannot contribute to the value of the integral on
the LHS of (15.63). Noting that, apart from an arbitrary additive constant,
∫ (d^mG/dx^m) dx = d^{m-1}G/dx^{m-1}, and integrating the terms of (15.63) by parts we
find

\[ \lim_{\epsilon\to 0} \int_{z-\epsilon}^{z+\epsilon} a_m(x) \frac{d^m G(x,z)}{dx^m}\,dx = 0 \tag{15.64} \]

for m = 0 to n - 1. Thus, since only the term containing d^nG/dx^n contributes to
the integral in (15.63), we conclude, after performing an integration by parts, that

\[ \lim_{\epsilon\to 0} \left[ a_n(x) \frac{d^{n-1} G(x,z)}{dx^{n-1}} \right]_{z-\epsilon}^{z+\epsilon} = 1. \tag{15.65} \]

Thus we have the further n constraints that G(x, z) and its derivatives up to order
n - 2 are continuous at x = z but that d^{n-1}G/dx^{n-1} has a discontinuity of 1/a_n(z)
at x = z.

Thus the properties of the Green’s function G(x, z) for an nth-order linear ODE
may be summarised by the following.

(i) G(x, z) obeys the original ODE but with f(x) on the RHS set equal to a
delta function δ(x - z).

(ii) When considered as a function of x alone G(x, z) obeys the specified
(homogeneous) boundary conditions on y(x).

(iii) The derivatives of G(x, z) with respect to x up to order n - 2 are continuous
at x = z, but the (n - 1)th-order derivative has a discontinuity of 1/a_n(z)
at this point.


► Use Green’s functions to solve

\[ \frac{d^2 y}{dx^2} + y = \operatorname{cosec} x, \tag{15.66} \]

subject to the boundary conditions y(0) = y(π/2) = 0.

From (15.62) we see that the Green’s function G(x, z) must satisfy

\[ \frac{d^2 G(x,z)}{dx^2} + G(x,z) = \delta(x - z). \tag{15.67} \]

Now it is clear that for x ≠ z the RHS of (15.67) is zero, and we are left with the
task of finding the general solution to the homogeneous equation, i.e. the complementary
function. The complementary function of (15.67) consists of a linear superposition of sin x
and cos x and must consist of different superpositions on either side of x = z, since its
(n - 1)th derivative (i.e. the first derivative in this case) is required to have a discontinuity
there. Therefore we assume the form of the Green’s function to be

\[ G(x,z) = \begin{cases} A(z)\sin x + B(z)\cos x & \text{for } x < z, \\ C(z)\sin x + D(z)\cos x & \text{for } x > z. \end{cases} \]

Note that we have performed a similar (but not identical) operation to that used in the
variation of parameters method, i.e. we have replaced the constants in the complementary
function with functions (this time of z).

We must now impose the relevant restrictions on G(x, z) in order to determine the
functions A(z), ..., D(z). The first of these is that G(x, z) should itself obey the homogeneous
boundary conditions G(0, z) = G(π/2, z) = 0. This leads to the conclusion that B(z) =
C(z) = 0, so we now have

\[ G(x,z) = \begin{cases} A(z)\sin x & \text{for } x < z, \\ D(z)\cos x & \text{for } x > z. \end{cases} \]

The second restriction is the continuity conditions given in equations (15.64), (15.65),
namely that, for this second-order equation, G(x, z) is continuous at x = z and dG/dx has
a discontinuity of 1/a_2(z) = 1 at this point. Applying these two constraints we have

\[
\begin{aligned}
D(z)\cos z - A(z)\sin z &= 0, \\
-D(z)\sin z - A(z)\cos z &= 1.
\end{aligned}
\]

Solving these equations for A(z) and D(z), we find

\[ A(z) = -\cos z, \qquad D(z) = -\sin z. \]

Thus we have

\[ G(x,z) = \begin{cases} -\cos z \sin x & \text{for } x < z, \\ -\sin z \cos x & \text{for } x > z. \end{cases} \]

Therefore, from (15.60), the general solution to (15.66) that obeys the boundary conditions
y(0) = y(π/2) = 0 is given by

\[
\begin{aligned}
y(x) &= \int_0^{\pi/2} G(x,z) \operatorname{cosec} z\,dz \\
&= -\cos x \int_0^{x} \sin z \operatorname{cosec} z\,dz - \sin x \int_x^{\pi/2} \cos z \operatorname{cosec} z\,dz \\
&= -x \cos x + \sin x \ln(\sin x),
\end{aligned}
\]

which agrees with the result obtained in the previous subsections. ◄ 
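The three defining properties of the Green’s function found in this example can be checked numerically. The sketch below is illustrative only: the source point z = 0.9 and the offset `eps` are arbitrary choices, and the one-sided derivatives are written out by hand from the piecewise form of G(x, z).

```python
import math

def G(x, z):
    """Green's function of y'' + y = f with y(0) = y(pi/2) = 0, as found above."""
    return -math.cos(z) * math.sin(x) if x < z else -math.sin(z) * math.cos(x)

def dG_dx(x, z):
    """Derivative of G with respect to x on either side of x = z."""
    return -math.cos(z) * math.cos(x) if x < z else math.sin(z) * math.sin(x)

z = 0.9      # arbitrary source point in (0, pi/2)
eps = 1e-9   # small offset used to approach x = z from each side

boundary_values = (G(0.0, z), G(math.pi / 2, z))        # both should vanish
jump_in_G = G(z + eps, z) - G(z - eps, z)               # continuity: close to 0
jump_in_dG = dG_dx(z + eps, z) - dG_dx(z - eps, z)      # discontinuity: close to 1
print(boundary_values, jump_in_G, jump_in_dG)
```

The boundary values and the jump in G are zero to rounding error, while the jump in dG/dx is 1, in line with a leading coefficient of unity.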


As mentioned earlier, once a Green’s function has been obtained for a given
LHS and boundary conditions, it can be used to find a general solution for any
RHS; thus, the solution of d^2y/dx^2 + y = f(x), with y(0) = y(π/2) = 0, is given
immediately by

\[
\begin{aligned}
y(x) &= \int_0^{\pi/2} G(x,z) f(z)\,dz \\
&= -\cos x \int_0^{x} \sin z\, f(z)\,dz - \sin x \int_x^{\pi/2} \cos z\, f(z)\,dz.
\end{aligned}
\tag{15.68}
\]

As an example, the reader may wish to verify that if f(x) = sin 2x then (15.68)
gives y(x) = (-sin 2x)/3, a solution easily verified by direct substitution. In
general, analytic integration of (15.68) for arbitrary f(x) will prove intractable;
then the integrals must be evaluated numerically.
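A numerical evaluation of this kind might look as follows. This is a minimal sketch, assuming a simple composite midpoint rule (any standard quadrature would do); the helper names `midpoint` and `solve_by_green` are introduced here for illustration. It uses f(x) = sin 2x so that the result can be compared with the closed form quoted above.

```python
import math

def midpoint(func, lo, hi, n=2000):
    """Composite midpoint rule for the integral of func over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))

def solve_by_green(f, x):
    """Evaluate the two integrals of (15.68) at x, split at z = x where G has a kink."""
    lower = midpoint(lambda z: math.sin(z) * f(z), 0.0, x)
    upper = midpoint(lambda z: math.cos(z) * f(z), x, math.pi / 2)
    return -math.cos(x) * lower - math.sin(x) * upper

f = lambda z: math.sin(2 * z)
for x in (0.3, 0.8, 1.2):
    # numerical value next to the closed form -sin(2x)/3
    print(x, solve_by_green(f, x), -math.sin(2 * x) / 3)
```

Splitting the range at z = x keeps each integrand smooth, so the quadrature converges at its usual rate.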

Another important point is that although the Green’s function method above
has provided a general solution, it is also useful for finding a particular integral
if the complementary function is known. This is easily seen since in (15.68) the
constant integration limits 0 and π/2 lead merely to constant values by which
the factors sin x and cos x are multiplied; thus the complementary function is
reconstructed. The rest of the general solution, i.e. the particular integral, comes
from the variable integration limit x. Therefore by changing ∫_x^{π/2} to -∫^x, and so
dropping the constant integration limits, we can find just the particular integral.
For example, a particular integral of d^2y/dx^2 + y = f(x) that satisfies the above
boundary conditions is given by

\[ y_p(x) = -\cos x \int^{x} \sin z\, f(z)\,dz + \sin x \int^{x} \cos z\, f(z)\,dz. \]


A very important point to realise about the Green’s function method is that a
particular G(x, z) applies to a given LHS of an ODE and the imposed boundary
conditions, i.e. the same equation with different boundary conditions will have a
different Green’s function. To illustrate this point, let us consider again the same
ODE as solved above, but with different boundary conditions.

► Use Green’s functions to solve

\[ \frac{d^2 y}{dx^2} + y = f(x), \tag{15.69} \]

subject to the one-point boundary conditions y(0) = y'(0) = 0.

We again require (15.67) to hold and so again we assume a Green’s function of the form

\[ G(x,z) = \begin{cases} A(z)\sin x + B(z)\cos x & \text{for } x < z, \\ C(z)\sin x + D(z)\cos x & \text{for } x > z. \end{cases} \]

However, we now require G(x, z) to obey the boundary conditions G(0, z) = G'(0, z) = 0,
which imply A(z) = B(z) = 0. Therefore we have

\[ G(x,z) = \begin{cases} 0 & \text{for } x < z, \\ C(z)\sin x + D(z)\cos x & \text{for } x > z. \end{cases} \]

Applying the continuity conditions on G(x, z) as before now gives

\[
\begin{aligned}
C(z)\sin z + D(z)\cos z &= 0, \\
C(z)\cos z - D(z)\sin z &= 1,
\end{aligned}
\]

which are solved to give

\[ C(z) = \cos z, \qquad D(z) = -\sin z. \]

So finally the Green’s function is given by

\[ G(x,z) = \begin{cases} 0 & \text{for } x < z, \\ \sin(x - z) & \text{for } x > z, \end{cases} \]

and the general solution to (15.69) that obeys the boundary conditions y(0) = y'(0) = 0 is

\[ y(x) = \int_0^{\infty} G(x,z) f(z)\,dz = \int_0^{x} \sin(x - z) f(z)\,dz. \] ◄
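As a quick check of this result, consider the illustrative choice f(x) = 1 (an assumption made here for the sake of a concrete instance, not an example from the text). The integral can then be done in closed form:

```latex
\[
y(x) = \int_0^{x} \sin(x - z)\,dz = \Bigl[\cos(x - z)\Bigr]_{z=0}^{z=x} = 1 - \cos x .
\]
% Check: y'' + y = \cos x + (1 - \cos x) = 1, and y(0) = 0, y'(0) = \sin 0 = 0.
```

This function satisfies d^2y/dx^2 + y = 1 together with both one-point boundary conditions, as required.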


Finally, we consider how to deal with inhomogeneous boundary conditions
such as y(a) = α, y(b) = β or y(0) = y'(0) = γ etc., where α, β, γ are non-zero.
The simplest method of solution in this case is to make a change of
variable such that the boundary conditions in the new variable, u say, are
homogeneous, i.e. u(a) = u(b) = 0 or u(0) = u'(0) = 0 etc. For nth-order equations
we generally require n boundary conditions to fix the solution, but these n
boundary conditions can be of various types. For example we may have the n-point
boundary conditions y(x_m) = y_m for m = 1 to n, or the one-point boundary
conditions y(x_0) = y'(x_0) = ... = y^{(n-1)}(x_0) = y_0, or something in between. In all
cases a suitable change of variable is

\[ u = y - h(x), \]

where h(x) is an (n - 1)th-order polynomial that obeys the boundary conditions.
For example, if we consider the second-order case with boundary conditions
y(a) = α, y(b) = β then a suitable change of variable is

\[ u = y - (mx + c), \]

where y = mx + c is the straight line through the points (a, α) and (b, β); this
is given by m = (α - β)/(a - b) and c = (βa - αb)/(a - b). Alternatively, if the
boundary conditions for our second-order equation are y(0) = y'(0) = γ then we
would make the same change of variable, but this time y = mx + c would be the
straight line through (0, γ) with slope γ, i.e. m = c = γ.
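It may help to see what equation the new variable satisfies. The following short derivation is a sketch for the second-order case, assuming the general linear form with coefficients a_2(x), a_1(x), a_0(x) and using only the linearity of the operator; since the straight line has zero second derivative, only the lower-order terms modify the RHS.

```latex
\[
\mathcal{L}u = \mathcal{L}y - \mathcal{L}(mx + c)
             = f(x) - a_1(x)\,m - a_0(x)\,(mx + c),
\qquad
\mathcal{L} \equiv a_2(x)\frac{d^2}{dx^2} + a_1(x)\frac{d}{dx} + a_0(x).
\]
```

Thus u obeys an ODE with the same LHS but a modified RHS, together with homogeneous boundary conditions, and the Green’s function for that LHS can be applied directly.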

Solution method. Require that the Green’s function G(x, z) obeys the original ODE,
but with the RHS set to a delta function δ(x - z). This is equivalent to assuming
that G(x, z) is given by the complementary function of the original ODE, with the
constants replaced by functions of z; these functions are different for x < z and x > z.
Now require also that G(x, z) obeys the given homogeneous boundary conditions
and impose the continuity conditions given in (15.64) and (15.65). The general
solution to the original ODE is then given by (15.60). For inhomogeneous boundary
conditions, make the change of dependent variable u = y - h(x), where h(x) is a
polynomial obeying the given boundary conditions.


15.2.6 Canonical form for second-order equations 


In this section we specialise from nth-order linear ODEs with variable coefficients
to those of order 2. In particular we consider the equation

\[ \frac{d^2 y}{dx^2} + a_1(x)\frac{dy}{dx} + a_0(x)y = f(x), \tag{15.70} \]

which has been rearranged so that the coefficient of d^2y/dx^2 is unity. By making
the substitution y(x) = u(x)v(x) we obtain

\[ v'' + \left(\frac{2u'}{u} + a_1\right)v' + \left(\frac{u'' + a_1 u' + a_0 u}{u}\right)v = \frac{f}{u}, \tag{15.71} \]

where the prime denotes differentiation with respect to x. Since (15.71) would be
much simplified if there were no term in v', let us choose u(x) such that the first
factor in parentheses on the LHS of (15.71) is zero, i.e.

\[ \frac{2u'}{u} + a_1 = 0 \quad\Rightarrow\quad u(x) = \exp\left\{ -\tfrac{1}{2} \int^{x} a_1(z)\,dz \right\}. \tag{15.72} \]

We then obtain an equation of the form

\[ \frac{d^2 v}{dx^2} + g(x)v = h(x), \tag{15.73} \]

where

\[
\begin{aligned}
g(x) &= a_0(x) - \tfrac{1}{4}[a_1(x)]^2 - \tfrac{1}{2}a_1'(x), \\
h(x) &= f(x) \exp\left\{ \tfrac{1}{2} \int^{x} a_1(z)\,dz \right\}.
\end{aligned}
\]


Since (15.73) is of a simpler form than the original equation, (15.70), it may
prove easier to solve.

► Solve

\[ 4x^2\frac{d^2 y}{dx^2} + 4x\frac{dy}{dx} + (x^2 - 1)y = 0. \tag{15.74} \]

Dividing (15.74) through by 4x^2, we see that it is of the form (15.70) with a_1(x) = 1/x,
a_0(x) = (x^2 - 1)/4x^2 and f(x) = 0. Therefore, making the substitution

\[ y = vu = v \exp\left( -\int \frac{1}{2x}\,dx \right) = \frac{v}{\sqrt{x}}, \]

we obtain

\[ \frac{d^2 v}{dx^2} + \frac{v}{4} = 0. \tag{15.75} \]

Equation (15.75) is easily solved to give

\[ v = c_1 \sin \tfrac{1}{2}x + c_2 \cos \tfrac{1}{2}x, \]

so the solution of (15.74) is

\[ y = \frac{v}{\sqrt{x}} = \frac{c_1 \sin \tfrac{1}{2}x + c_2 \cos \tfrac{1}{2}x}{\sqrt{x}}. \] ◄
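The coefficient of v in the canonical form can be evaluated numerically for any given pair of coefficient functions. The sketch below is illustrative: the helper name `canonical_g` is an assumption of this example, and the derivative of a_1 is approximated by a central difference rather than computed exactly. For the worked example above it should return 1/4 at every x > 0.

```python
def canonical_g(a1, a0, x, h=1e-5):
    """g(x) = a0 - a1^2/4 - a1'/2, with a1' approximated by a central difference."""
    a1_prime = (a1(x + h) - a1(x - h)) / (2 * h)
    return a0(x) - 0.25 * a1(x) ** 2 - 0.5 * a1_prime

# Coefficients of (15.74) after division by 4x^2.
a1 = lambda x: 1.0 / x
a0 = lambda x: (x * x - 1.0) / (4.0 * x * x)

values = [canonical_g(a1, a0, x) for x in (0.5, 1.0, 3.0)]
print(values)  # each entry should be close to 0.25
```

A constant g(x) signals that the transformed equation has constant coefficients, which is why this example reduces to simple harmonic form.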


As an alternative to choosing u(x) such that the coefficient of v' in (15.71) is
zero, we could choose a different u(x) such that the coefficient of v vanishes. For
this to be the case, we see from (15.71) that we would require

\[ u'' + a_1 u' + a_0 u = 0, \]

so u(x) would have to be a solution of the original ODE with the RHS set to
zero, i.e. part of the complementary function. If such a solution were known then
the substitution y = uv would yield an equation with no term in v, which could
be solved by two straightforward integrations. This is a special (second-order)
case of the method discussed in subsection 15.2.3.


Solution method. Write the equation in the form (15.70), then substitute y = uv,
where u(x) is given by (15.72). This leads to an equation of the form (15.73), in
which there is no term in dv/dx and which may be easier to solve. Alternatively,
if part of the complementary function is known then follow the method of
subsection 15.2.3.


15.3 General ordinary differential equations

In this section, we discuss miscellaneous methods for simplifying general ODEs.
These methods are applicable to both linear and non-linear equations and in
some cases may lead to a solution. More often than not, however, finding a
closed-form solution to a general non-linear ODE proves impossible.


15.3.1 Dependent variable absent 

If an ODE does not contain the dependent variable y explicitly, but only its 
derivatives, then the change of variable p = dy/dx leads to an equation of one 
order lower. 


► Solve

\[ \frac{d^2 y}{dx^2} + 2\frac{dy}{dx} = 4x. \tag{15.76} \]

This is transformed by the substitution p = dy/dx to the first-order equation

\[ \frac{dp}{dx} + 2p = 4x. \tag{15.77} \]

The solution to (15.77) is then found by the method of subsection 14.2.4 and reads

\[ p = \frac{dy}{dx} = a e^{-2x} + 2x - 1, \]

where a is a constant. Thus by direct integration the solution to the original equation,
(15.76), is

\[ y(x) = c_1 e^{-2x} + x^2 - x + c_2. \] ◄

An extension to the above method is appropriate if an ODE contains only 
derivatives of y that are of order m and greater. Then the substitution p = d m y/dx m 
reduces the order of the ODE by m. 

Solution method. If the ODE contains only derivatives of y that are of order m and
greater then the substitution p = d^m y/dx^m reduces the order of the equation by m.


15.3.2 Independent variable absent 

If an ODE does not contain the independent variable x explicitly, except in d/dx,
d^2/dx^2 etc., then as in the previous subsection we make the substitution p = dy/dx
but also write

\[
\begin{aligned}
\frac{d^2 y}{dx^2} &= \frac{dp}{dx} = \frac{dy}{dx}\frac{dp}{dy} = p\frac{dp}{dy}, \\
\frac{d^3 y}{dx^3} &= \frac{d}{dx}\left(p\frac{dp}{dy}\right) = \frac{dy}{dx}\frac{d}{dy}\left(p\frac{dp}{dy}\right) = p^2\frac{d^2 p}{dy^2} + p\left(\frac{dp}{dy}\right)^2,
\end{aligned}
\tag{15.78}
\]

and so on for higher-order derivatives. This leads to an equation of one order
lower.


► Solve

\[ 1 + y\frac{d^2 y}{dx^2} + \left(\frac{dy}{dx}\right)^2 = 0. \tag{15.79} \]

Making the substitutions dy/dx = p and d^2y/dx^2 = p(dp/dy) we obtain the first-order
ODE

\[ 1 + yp\frac{dp}{dy} + p^2 = 0, \]

which is separable and may be solved as in subsection 14.2.1 to obtain

\[ (1 + p^2)y^2 = c_1^2. \]

Using p = dy/dx we therefore have

\[ p = \frac{dy}{dx} = \pm\sqrt{\frac{c_1^2 - y^2}{y^2}}, \]

which may be integrated to give the general solution of (15.79); after squaring this reads

\[ (x + c_2)^2 + y^2 = c_1^2. \] ◄
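The final result can be confirmed without retracing the integration. The short check below is an addition for illustration: it differentiates the implicit solution (the circle of radius c_1 centred at (-c_2, 0)) twice with respect to x.

```latex
\[
\frac{d}{dx}\Bigl[(x + c_2)^2 + y^2\Bigr] = 0
\;\Rightarrow\; (x + c_2) + y\frac{dy}{dx} = 0,
\qquad
\frac{d}{dx}\Bigl[(x + c_2) + y\frac{dy}{dx}\Bigr] = 0
\;\Rightarrow\; 1 + \left(\frac{dy}{dx}\right)^2 + y\frac{d^2 y}{dx^2} = 0 .
\]
```

The second relation is the original ODE, so every such circle is a solution, with the two arbitrary constants expected of a second-order equation.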


Solution method. If the ODE does not contain x explicitly then substitute p =
dy/dx, along with the relations for higher derivatives given in (15.78), to obtain an
equation of one order lower, which may prove easier to solve.


15.3.3 Non-linear exact equations 

As discussed in subsection 15.2.2, an exact ODE is one that can be obtained by 
straightforward differentiation of an equation of one order lower. Moreover, the 
notion of exact equations is useful for both linear and non-linear equations, since 
an exact equation can be immediately integrated. It is possible, of course, that 
the resulting equation may itself be exact, so that the process can be repeated. 
In the non-linear case, however, there is no simple relation (such as (15.43) for 
the linear case) by which an equation can be shown to be exact. Nevertheless, a 
general procedure does exist and is illustrated in the following example.

► Solve

\[ 2y\frac{d^3 y}{dx^3} + 6\frac{dy}{dx}\frac{d^2 y}{dx^2} = x. \tag{15.80} \]

Directing our attention to the term on the LHS of (15.80) that contains the highest-order
derivative, i.e. 2y d^3y/dx^3, we see that it can be obtained by differentiating 2y d^2y/dx^2 since

\[ \frac{d}{dx}\left(2y\frac{d^2 y}{dx^2}\right) = 2y\frac{d^3 y}{dx^3} + 2\frac{dy}{dx}\frac{d^2 y}{dx^2}. \tag{15.81} \]

Rewriting the LHS of (15.80) using (15.81), we are left with 4(dy/dx)(d^2y/dx^2), which may
itself be written as a derivative, i.e.

\[ 4\frac{dy}{dx}\frac{d^2 y}{dx^2} = \frac{d}{dx}\left[ 2\left(\frac{dy}{dx}\right)^2 \right]. \tag{15.82} \]

Since, therefore, we can write the LHS of (15.80) as a sum of simple derivatives of other
functions, (15.80) is exact. Integrating (15.80) with respect to x, and using (15.81) and
(15.82), now gives

\[ 2y\frac{d^2 y}{dx^2} + 2\left(\frac{dy}{dx}\right)^2 = \int x\,dx = \frac{x^2}{2} + c_1. \tag{15.83} \]

Now we can repeat the process to find whether (15.83) is itself exact. Considering the term
on the LHS of (15.83) that contains the highest-order derivative, i.e. 2y d^2y/dx^2, we note
that we obtain this by differentiating 2y dy/dx, as follows:

\[ \frac{d}{dx}\left(2y\frac{dy}{dx}\right) = 2y\frac{d^2 y}{dx^2} + 2\left(\frac{dy}{dx}\right)^2. \]

The above expression already contains all the terms on the LHS of (15.83), so we can
integrate (15.83) to give

\[ 2y\frac{dy}{dx} = \frac{x^3}{6} + c_1 x + c_2. \]

Integrating once more we obtain the solution

\[ y^2 = \frac{x^4}{24} + \frac{c_1 x^2}{2} + c_2 x + c_3. \] ◄


It is worth noting that both linear equations (as discussed in subsection 15.2.2) 
and non-linear equations may sometimes be made exact by multiplying through 
by an appropriate integrating factor. Although no general method exists for 
finding such a factor, one may sometimes be found by inspection or inspired 
guesswork. 

Solution method. Rearrange the equation so that all the terms containing y or its 
derivatives are on the LHS, then check to see whether the equation is exact by 
attempting to write the LHS as a simple derivative. If this is possible then the 
equation is exact and may be integrated directly to give an equation of one order 
lower. If the new equation is itself exact the process can be repeated.


15.3.4 Isobaric or homogeneous equations

It is straightforward to generalise the discussion of first-order isobaric equations
given in subsection 14.2.6 to equations of general order n. An nth-order isobaric
equation is one in which every term can be made dimensionally consistent upon
giving y and dy each a weight m, and x and dx each a weight 1. Then the nth
derivative of y with respect to x, for example, would have dimensions m in y and
-n in x. In the special case m = 1, the equation is called homogeneous (not to be
confused with linear equations with a zero RHS). If an equation is isobaric or
homogeneous then the change in dependent variable y = vx^m (y = vx in the
homogeneous case) followed by the change in independent variable x = e^t leads to
an equation in which the new independent variable t is absent except in the form d/dt.


► Solve

\[ x^3\frac{d^2 y}{dx^2} - (x^2 + xy)\frac{dy}{dx} + (y^2 + xy) = 0. \tag{15.84} \]

Assigning y and dy the weight m, and x and dx the weight 1, the weights of the five terms
on the LHS of (15.84) are, from left to right: m + 1, m + 1, 2m, 2m, m + 1. For these
weights all to be equal we require m = 1; thus (15.84) is a homogeneous equation. Since it
is homogeneous we now make the substitution y = vx, which, after dividing the resulting
equation through by x^3, gives

\[ x\frac{d^2 v}{dx^2} + (1 - v)\frac{dv}{dx} = 0. \tag{15.85} \]

Now substituting x = e^t into (15.85) we obtain (after some working)

\[ \frac{d^2 v}{dt^2} - v\frac{dv}{dt} = 0, \tag{15.86} \]

which can be integrated directly to give

\[ \frac{dv}{dt} = \tfrac{1}{2}v^2 + c_1. \tag{15.87} \]

Equation (15.87) is separable, and, writing c_1 = d_1^2/2, integrates to give

\[ \tfrac{1}{2}t + d_2 = \int \frac{dv}{v^2 + d_1^2} = \frac{1}{d_1}\tan^{-1}\left(\frac{v}{d_1}\right). \]

Rearranging and using x = e^t and y = vx we finally obtain the solution to (15.84) as

\[ y = d_1 x \tan\left( \tfrac{1}{2} d_1 \ln x + d_1 d_2 \right). \] ◄
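A solution of this form can be tested by substituting it back into (15.84) numerically. The sketch below is illustrative only: the constants `d1 = 1.0` and `d3 = 0.2` (the latter standing for the additive constant inside the tangent) are arbitrary choices, and the derivatives are approximated by central differences, so the residual is small rather than exactly zero.

```python
import math

def y(x, d1=1.0, d3=0.2):
    """Candidate solution y = d1 * x * tan(d1 * ln(x) / 2 + d3); d3 is arbitrary."""
    return d1 * x * math.tan(0.5 * d1 * math.log(x) + d3)

def residual(x, h=1e-4):
    """LHS of (15.84) with y' and y'' replaced by central differences."""
    y0 = y(x)
    dy = (y(x + h) - y(x - h)) / (2 * h)
    d2y = (y(x + h) - 2 * y0 + y(x - h)) / (h * h)
    return x**3 * d2y - (x**2 + x * y0) * dy + (y0**2 + x * y0)

residuals = [residual(x) for x in (1.0, 1.5, 2.0)]
print(residuals)  # each entry should be close to zero
```

The sample points are chosen so that the argument of the tangent stays well away from its singularities.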


Solution method. Assume that y and dy have weight m, and x and dx weight 1,
and write down the combined weights of each term in the ODE. If these weights can
be made equal by assuming a particular value for m then the equation is isobaric
(or homogeneous if m = 1). Making the substitution y = vx^m followed by x = e^t
leads to an equation in which the new independent variable t is absent except in the
form d/dt.


15.3.5 Equations homogeneous in x or y alone

It will be seen that the intermediate equation (15.85) in the example of the
previous subsection was simplified by the substitution x = e^t, in that this led to
an equation in which the new independent variable t occurred only in the form
d/dt, see (15.86). A closer examination of (15.85) reveals that it is dimensionally
consistent in the independent variable x taken alone; this is equivalent to giving
the dependent variable and its differential a weight m = 0. For any equation that
is homogeneous in x alone, the substitution x = e^t will lead to an equation that
does not contain the new independent variable t except as d/dt. Note that the
Euler equation of subsection 15.2.1 is a special, linear example of an equation
homogeneous in x alone. Similarly, if an equation is homogeneous in y alone, then
substituting y = e^v leads to an equation in which the new dependent variable, v,
occurs only in the form d/dv.



► Solve

\[ x^2\frac{d^2 y}{dx^2} + x\frac{dy}{dx} + \frac{2}{y^3} = 0. \]

This equation is homogeneous in x alone, and on substituting x = e^t we obtain

\[ \frac{d^2 y}{dt^2} + \frac{2}{y^3} = 0, \]

which does not contain the new independent variable t except as d/dt. Such equations
may often be solved by the method of subsection 15.3.2, but in this case we can integrate
directly to obtain

\[ \frac{dy}{dt} = \sqrt{2(c_1 + 1/y^2)}. \]

This equation is separable, and we find

\[ \int \frac{dy}{\sqrt{2(c_1 + 1/y^2)}} = t + c_2. \]

By multiplying the numerator and denominator of the integrand on the LHS by y, we find
the solution

\[ \frac{\sqrt{c_1 y^2 + 1}}{\sqrt{2}\,c_1} = t + c_2. \]

Remembering that t = ln x we finally obtain

\[ \frac{\sqrt{c_1 y^2 + 1}}{\sqrt{2}\,c_1} = \ln x + c_2. \] ◄


Solution method. If the weight of x taken alone is the same in every term in the
ODE then the substitution x = e^t leads to an equation in which the new independent
variable t is absent except in the form d/dt. If the weight of y taken alone is the
same in every term then the substitution y = e^v leads to an equation in which the
new dependent variable v is absent except in the form d/dv.


15.3.6 Equations having y = Ae^x as a solution


Finally, we note that if any general (linear or non-linear) nth-order ODE is
satisfied identically by assuming that

\[ y = \frac{dy}{dx} = \cdots = \frac{d^n y}{dx^n} \tag{15.88} \]

then y = Ae^x is a solution of that equation. This must be so because y = Ae^x is
a non-zero function that satisfies (15.88).


► Find a solution of

\[ (x^2 + x)\frac{dy}{dx}\frac{d^2 y}{dx^2} - x^2 y\frac{dy}{dx} - x\left(\frac{dy}{dx}\right)^2 = 0. \tag{15.89} \]

Setting y = dy/dx = d^2y/dx^2 in (15.89), we obtain

\[ (x^2 + x)y^2 - x^2 y^2 - x y^2 = 0, \]

which is satisfied identically. Therefore y = Ae^x is a solution of (15.89); this is easily
verified by directly substituting y = Ae^x into (15.89). ◄


Solution method. If the equation is satisfied identically by assuming that y =
dy/dx = ... = d^n y/dx^n then y = Ae^x is a solution.


15.4 Exercises 

15.1 A simple harmonic oscillator, with natural frequency ω_0, experiences an oscillating
driving force f(t) = cos ωt. Therefore, its equation of motion is

\[ \frac{d^2 x}{dt^2} + \omega_0^2 x = \cos \omega t, \]

where x is its position. Given that at t = 0 we have x = dx/dt = 0, find the
function x(t). Describe the solution if ω is approximately, but not exactly, equal
to ω_0.

15.2 Find the roots of the auxiliary equation for the following. Hence solve them for
the boundary conditions stated.

(a) \[ \frac{d^2 f}{dt^2} + 2\frac{df}{dt} + 5f = 0 \] with f(0) = 1, f'(0) = 0.

(b) \[ \frac{d^2 f}{dt^2} + 2\frac{df}{dt} + 5f = e^{-t}\cos 3t \] with f(0) = 0, f'(0) = 0.

15.3 The theory of bent beams shows that at any point in the beam the ‘bending
moment’ is given by K/ρ, where K is a constant (that depends upon the beam
material and cross-sectional shape) and ρ is the radius of curvature at that point.
Consider a light beam of length L whose ends, x = 0 and x = L, are supported
at the same vertical height and which has a weight W suspended from its centre.
Verify that at any point x (0 < x < L/2 for definiteness) the net magnitude of
the bending moment (bending moment = force × perpendicular distance) due
to the weight and support reactions, evaluated on either side of x, is Wx/2.
If the beam is only slightly bent, so that (dy/dx)^2 ≪ 1, where y = y(x) is the
downward displacement of the beam at x, show that the beam profile satisfies
the approximate equation

\[ \frac{d^2 y}{dx^2} = -\frac{Wx}{2K}. \]

By integrating this equation twice and using physically imposed conditions on
your solution at x = 0 and x = L/2, show that the downward displacement at
the centre of the beam is WL^3/(48K).

15.4 Solve the differential equation

\[ \frac{d^2 f}{dt^2} + 6\frac{df}{dt} + 9f = e^{-t}, \]

subject to the conditions f = 0 and df/dt = λ at t = 0.
Find the equation satisfied by the positions of the turning points of f(t) and
hence, by drawing suitable sketch graphs, determine the number of turning points
the solution has in the range t > 0 if (a) λ = 1/4, and (b) λ = -1/4.

15.5 The function f(t) satisfies the differential equation

\[ \frac{d^2 f}{dt^2} + 8\frac{df}{dt} + 12f = 12e^{-4t}. \]

For the following sets of boundary conditions determine whether it has solutions,
and, if so, find them:

(a) f(0) = 0, f'(0) = 0, f(ln √2) = 0;

(b) f(0) = 0, f'(0) = -2, f(ln √2) = 0.

15.6 Determine the values of α and β for which the following functions are linearly
dependent:

\[
\begin{aligned}
y_1(x) &= x\cosh x + \sinh x, \\
y_2(x) &= x\sinh x + \cosh x, \\
y_3(x) &= (x + \alpha)e^{x}, \\
y_4(x) &= (x + \beta)e^{-x}.
\end{aligned}
\]

You will find it convenient to work with those linear combinations of the y_i(x)
that can be written the most compactly.

15.7 A solution of the differential equation

\[ \frac{d^2 y}{dx^2} + 2\frac{dy}{dx} + y = 4e^{-x} \]

takes the value 1 when x = 0 and the value e^{-1} when x = 1. What is its value
when x = 2?

15.8 The two functions x(t) and y(t) satisfy the simultaneous equations

\[
\begin{aligned}
\frac{dx}{dt} - 2y &= -\sin t, \\
\frac{dy}{dt} + 2x &= 5\cos t.
\end{aligned}
\]

Find explicit expressions for x(t) and y(t), given that x(0) = 3 and y(0) = 2.
Sketch the solution trajectory in the xy-plane for 0 < t < 2π, showing that
the trajectory crosses itself at (0, 1/2) and passes through the points (0, -3) and
(0,-1) in the negative x-direction. 
15.9 Find the general solutions of

(a) $\dfrac{d^3y}{dx^3} - 12\dfrac{dy}{dx} + 16y = 32x - 8$,

(b) $\dfrac{d}{dx}\left(\dfrac{1}{y}\dfrac{dy}{dx}\right) + (2a\coth 2ax)\left(\dfrac{1}{y}\dfrac{dy}{dx}\right) = 2a^2$,

where $a$ is a constant.

15.10 Use the method of Laplace transforms to solve

(a) $\dfrac{d^2f}{dt^2} + 5\dfrac{df}{dt} + 6f = 0$, $f(0) = 1$, $f'(0) = -4$,

(b) $\dfrac{d^2f}{dt^2} + 2\dfrac{df}{dt} + 5f = 0$, $f(0) = 1$, $f'(0) = 0$.

15.11 The quantities $x(t)$, $y(t)$ satisfy the simultaneous equations

\[ \ddot{x} + 2n\dot{x} + n^2 x = 0, \]
\[ \ddot{y} + 2n\dot{y} + n^2 y = \mu\dot{x}, \]

where $x(0) = y(0) = \dot{y}(0) = 0$ and $\dot{x}(0) = \lambda$. Show that

\[ y(t) = \tfrac{1}{2}\mu\lambda t^2\left(1 - \tfrac{1}{3}nt\right)\exp(-nt). \]

15.12 Use Laplace transforms to solve, for $t > 0$, the differential equations

\[ \ddot{x} + 2x + y = \cos t, \]
\[ \ddot{y} + 2x + 3y = 2\cos t, \]

which describe a coupled system that starts from rest at the equilibrium position.
Show that the subsequent motion takes place along a straight line in the xy-plane.
Verify that the frequency at which the system is driven is equal to one of the
resonance frequencies of the system; explain why there is no resonant behaviour
in the solution you have obtained.

15.13 Two unstable isotopes A and B and a stable isotope C have the following decay
rates per atom present: $A \to B$, 3 s$^{-1}$; $A \to C$, 1 s$^{-1}$; $B \to C$, 2 s$^{-1}$. Initially
a quantity $x_0$ of A is present and none of the other two types. Using Laplace
transforms, find the amount of C present at a later time $t$.

15.14 For a lightly damped ($\gamma < \omega_0$) harmonic oscillator driven at its undamped
resonance frequency $\omega_0$, the displacement $x(t)$ at time $t$ satisfies the equation

\[ \frac{d^2x}{dt^2} + 2\gamma\frac{dx}{dt} + \omega_0^2 x = F\sin\omega_0 t. \]

Use Laplace transforms to find the displacement at a general time if the oscillator
starts from rest at its equilibrium position.

(a) Show that ultimately the oscillation has amplitude $F/(2\omega_0\gamma)$ with a phase
lag of $\pi/2$ relative to the driving force $F$.

(b) By differentiating the original equation, conclude that if $x(t)$ is expanded as
a power series in $t$ for small $t$ then the first non-vanishing term is $F\omega_0 t^3/6$.
Confirm this conclusion by expanding your explicit solution.

15.15 The ‘golden mean’, which is said to describe the most aesthetically pleasing
proportions for the sides of a rectangle (e.g. the ideal picture frame), is given
by the limiting value of the ratio of successive terms of the Fibonacci series $u_n$,
which is generated by

\[ u_{n+2} = u_{n+1} + u_n, \]

with $u_0 = 0$ and $u_1 = 1$. Find an expression for the general term of the series and
verify that the golden mean is equal to the larger root of the recurrence relation’s
characteristic equation.

15.16 In a particular scheme for modelling numerically one-dimensional fluid flow, the
successive values, $u_n$, of the solution are connected for $n \geq 1$ by the difference
equation

\[ c(u_{n+1} - u_{n-1}) = d(u_{n+1} - 2u_n + u_{n-1}), \]

where $c$ and $d$ are positive constants. The boundary conditions are $u_0 = 0$ and
$u_M = 1$. Find the solution to the equation and show that successive values of $u_n$
will have alternating signs if $c > d$.

15.17 The first few terms of a series $u_n$, starting with $u_0$, are 1, 2, 2, 1, 6, −3. The series
is generated by a recurrence relation of the form

\[ u_n = Pu_{n-2} + Qu_{n-4}, \]

where $P$ and $Q$ are constants. Find an expression for the general term of the
series and show that the series in fact consists of two other interleaved series
given by

\[ u_{2m} = \tfrac{2}{3} + \tfrac{1}{3}4^m, \]
\[ u_{2m+1} = \tfrac{7}{3} - \tfrac{1}{3}4^m, \]

for $m = 0, 1, 2, \ldots$.

15.18 Find an explicit expression for the $u_n$ satisfying

\[ u_{n+1} + 5u_n + 6u_{n-1} = 2^n, \]

given that $u_0 = u_1 = 1$. Deduce that $2^n - 26(-3)^n$ is divisible by 5 for all integer
$n$.

15.19 Find the general expression for the $u_n$ satisfying

\[ u_{n+1} = 2u_{n-2} - u_n \]

with $u_0 = u_1 = 0$ and $u_2 = 1$, and show that they can be written in the form

\[ u_n = \frac{1}{5} - \frac{2^{n/2}}{\sqrt{5}}\cos\left(\frac{3\pi n}{4} - \phi\right), \]

where $\tan\phi = 2$.

15.20 Consider the seventh-order recurrence relation

\[ u_{n+7} - u_{n+6} - u_{n+5} + u_{n+4} - u_{n+3} + u_{n+2} + u_{n+1} - u_n = 0. \]

Find the most general form of its solution, and show that:

(a) if only the four initial values $u_0 = 0$, $u_1 = 2$, $u_2 = 6$ and $u_3 = 12$ are specified,
the relation has one solution which cycles repeatedly through this set of four
numbers;

(b) but if, in addition, it is required that $u_4 = 20$, $u_5 = 30$ and $u_6 = 42$ then the
solution is unique, with $u_n = n(n + 1)$.

15.21 Find the general solution of

\[ x^2\frac{d^2y}{dx^2} - x\frac{dy}{dx} + y = x, \]

given that $y(1) = 1$ and $y(e) = 2e$.

15.22 Find the general solution of

\[ (x + 1)^2\frac{d^2y}{dx^2} + 3(x + 1)\frac{dy}{dx} + y = x^2. \]
15.23 Prove that the general solution of

\[ (x - 2)\frac{d^2y}{dx^2} + 3\frac{dy}{dx} + \frac{4y}{x^2} = 0 \]

is given by

\[ y(x) = \frac{1}{(x - 2)^2}\left[k\left(\frac{2}{3x} - \frac{1}{2}\right) + cx^2\right]. \]

15.24 Use the method of variation of parameters to find the general solutions of

(a) $\dfrac{d^2y}{dx^2} - y = x^n$,  (b) $\dfrac{d^2y}{dx^2} - 2\dfrac{dy}{dx} + y = 2xe^x$.

15.25 Use the intermediate result of exercise 15.24(a) to find the Green’s function which
satisfies

\[ \frac{d^2G(x,\xi)}{dx^2} - G(x,\xi) = \delta(x - \xi) \]

with

\[ G(0,\xi) = G(1,\xi) = 0. \]

15.26 (a) Given that $y_1(x) = 1/x$ is a solution of

\[ F(x,y) = x(x + 1)\frac{d^2y}{dx^2} + (2 - x^2)\frac{dy}{dx} - (2 + x)y = 0, \]

find a second linearly independent solution,

(i) by setting $y_2(x) = y_1(x)u(x)$,

(ii) by noting the sum of the coefficients in the equation.

(b) Hence, using the variation of parameters method, find the general solution
of

\[ F(x,y) = (x + 1)^2. \]

15.27 Show generally that if $y_1(x)$ and $y_2(x)$ are linearly independent solutions of

\[ \frac{d^2y}{dx^2} + p(x)\frac{dy}{dx} + q(x)y = 0, \]

with $y_1(0) = 0$ and $y_2(1) = 0$, then the Green’s function $G(x,\xi)$ for the interval
$0 < x, \xi < 1$ and with $G(0,\xi) = G(1,\xi) = 0$ can be written in the form

\[ G(x,\xi) = \begin{cases} y_1(x)y_2(\xi)/W(\xi) & 0 < x < \xi, \\ y_2(x)y_1(\xi)/W(\xi) & \xi < x < 1, \end{cases} \]

where $W(x) = W[y_1(x), y_2(x)]$ is the Wronskian of $y_1(x)$ and $y_2(x)$.

15.28 Use the result of the previous exercise to find the Green’s function $G(x,\xi)$ that
satisfies

\[ \frac{d^2G}{dx^2} + 3\frac{dG}{dx} + 2G = \delta(x - \xi), \]

in the interval $0 < x, \xi < 1$ with $G(0,\xi) = G(1,\xi) = 0$. Hence obtain integral
expressions for the solution of

\[ \frac{d^2y}{dx^2} + 3\frac{dy}{dx} + 2y = \begin{cases} 0 & 0 < x < x_0, \\ 1 & x_0 < x < 1, \end{cases} \]

distinguishing between the cases (a) $x < x_0$, and (b) $x > x_0$.

15.29 The equation of motion for a driven damped harmonic oscillator can be written

\[ \ddot{x} + 2\dot{x} + (1 + \kappa^2)x = f(t), \]

with $\kappa \neq 0$. If it starts from rest with $x(0) = 0$ and $\dot{x}(0) = 0$, find the corresponding
Green’s function $G(t,\tau)$ and verify that it can be written as a function of $t - \tau$
only. Find the explicit solution when the driving force is the unit step function,
i.e. $f(t) = H(t)$. Confirm your solution by taking the Laplace transforms of both
it and the original equation.

15.30 Show that the Green’s function for the equation

\[ \frac{d^2y}{dx^2} + \frac{y}{4} = f(x), \]

subject to the boundary conditions $y(0) = y(\pi) = 0$, is given by

\[ G(x,z) = \begin{cases} -2\cos\tfrac{1}{2}x\,\sin\tfrac{1}{2}z & 0 < z < x, \\ -2\sin\tfrac{1}{2}x\,\cos\tfrac{1}{2}z & x < z < \pi. \end{cases} \]

15.31 Find the Green’s function $x = G(t,t_0)$ that solves

\[ \frac{d^2x}{dt^2} + \alpha\frac{dx}{dt} = \delta(t - t_0) \]

under the initial conditions $x = dx/dt = 0$ at $t = 0$. Hence solve

\[ \frac{d^2x}{dt^2} + \alpha\frac{dx}{dt} = f(t), \]

where $f(t) = 0$ for $t < 0$.

Evaluate your answer explicitly for $f(t) = Ae^{-at}$ ($t > 0$).

15.32 (a) By multiplying through by $dy/dx$, write down the solution to the equation

\[ \frac{d^2y}{dx^2} + f(y) = 0, \]

where $f(y)$ can be any function.

(b) A mass $m$, initially at rest at the point $x = 0$, is accelerated by a force

\[ f(x) = A(x_0 - x)\left[1 + 2\ln\left(1 - \frac{x}{x_0}\right)\right]. \]

Its equation of motion is $m\,d^2x/dt^2 = f(x)$. Find $x$ as a function of time and
show that ultimately the particle has travelled a distance $x_0$.

15.33 Solve

\[ 2y\frac{d^3y}{dx^3} + 2\left(y + 3\frac{dy}{dx}\right)\frac{d^2y}{dx^2} + 2\left(\frac{dy}{dx}\right)^2 = \sin x. \]

15.34 Find the general solution of the equation

\[ x\frac{d^3y}{dx^3} + 2\frac{d^2y}{dx^2} = Ax. \]

15.35 Express the equation

\[ \frac{d^2y}{dx^2} + 4x\frac{dy}{dx} + (4x^2 + 6)y = e^{-x^2}\sin 2x \]

in canonical form and hence find its general solution.

15.36 Find the form of the solutions of the equation

\[ \frac{dy}{dx}\frac{d^3y}{dx^3} - 2\left(\frac{d^2y}{dx^2}\right)^2 + \left(\frac{dy}{dx}\right)^2 = 0 \]

which have $y(0) = \infty$.

(You will need the result $\int^z \operatorname{cosech} u\,du = -\ln(\operatorname{cosech} z + \coth z)$.)
15.37 Consider the equation

\[ x^p y'' + \frac{n + 3 - 2p}{n - 1}\,x^{p-1}y' + \left(\frac{p - 2}{n - 1}\right)^2 x^{p-2}y = y^n, \]

in which $p \neq 2$ and $n > -1$ but $n \neq 1$. For the boundary conditions $y(1) = 0$ and
$y'(1) = \lambda$, show that the solution is $y(x) = v(x)\,x^{(p-2)/(n-1)}$, where $v(x)$ is given by

\[ \int_0^{v(x)} \frac{dz}{\left[\lambda^2 + 2z^{n+1}/(n + 1)\right]^{1/2}} = \ln x. \]


15.5 Hints and answers 

15.1 The function is $(\omega_0^2 - \omega^2)^{-1}(\cos\omega t - \cos\omega_0 t)$; for moderate $t$, $x(t)$ is a sine wave
of linearly increasing amplitude $(t\sin\omega_0 t)/(2\omega_0)$; for large $t$ it shows beats of
maximum amplitude $2(\omega_0^2 - \omega^2)^{-1}$.

15.2 $m = -1 \pm 2i$; (a) $f(t) = e^{-t}(\cos 2t + \tfrac{1}{2}\sin 2t)$; (b) $f(t) = \tfrac{1}{5}e^{-t}(\cos 2t - \cos 3t)$.

15.3 $y = 0$ at $x = 0$. From symmetry, $dy/dx = 0$ at $x = L/2$.

15.4 $f(t) = \tfrac{1}{4}\{e^{-t} + [(4\lambda - 2)t - 1]e^{-3t}\}$. For turning points, $(4\lambda + 1) + (6 - 12\lambda)t = e^{2t}$.

(a) 1, (b) 2.

15.5 General solution $f(t) = Ae^{-6t} + Be^{-2t} - 3e^{-4t}$. (a) No solution, inconsistent
boundary conditions; (b) $f(t) = 2e^{-6t} + e^{-2t} - 3e^{-4t}$.

15.6 Set $y_5(x) = y_1(x) + y_2(x)$ and $y_6(x) = y_1(x) - y_2(x)$. Wronskian $W(y_3, y_4, y_5, y_6) =
-16(\alpha - 1)(\beta + 1)$. Thus linear dependence if $\alpha = 1$, or $\beta = -1$, or both.

15.7 The auxiliary equation has repeated roots and the RHS is contained in the
complementary function. The solution is $y(x) = (A + Bx)e^{-x} + 2x^2e^{-x}$. $y(2) = 5e^{-2}$.

15.8 $x = 2\sin 2t + 3\cos t$, $y = 2\cos 2t - \sin t$. The curve is symmetric about the y-axis
and crosses each axis four times. Its outer perimeter is heart-shaped.

15.9 (a) The auxiliary equation has roots 2, 2, −4; $(A + Bx)\exp 2x + C\exp(-4x) + 2x + 1$;

(b) multiply through by $\sinh 2ax$ and note that

$\int \operatorname{cosech} 2ax\,dx = (2a)^{-1}\ln(|\tanh ax|)$; $y = B(\sinh 2ax)^{1/2}(|\tanh ax|)^A$.

15.10 (a) $f(t) = 2e^{-3t} - e^{-2t}$, (b) $f(t) = e^{-t}(\cos 2t + \tfrac{1}{2}\sin 2t)$; compare with exercise
15.2(a).

15.11 Use Laplace transforms; write $s(s + n)^{-4}$ as $(s + n)^{-3} - n(s + n)^{-4}$.

15.12 $y = 2x = \tfrac{2}{3}(\cos t - \cos 2t)$, i.e. $y$ is always a fixed multiple of $x$. There is no
resonance because the driving forces form the components, $\cos t$, $2\cos t$, of a
vector that is a pure eigenvector corresponding to resonant frequency $\omega = 2$, and
contains no component of the eigenvector $(1, -1)$ corresponding to $\omega = 1$, the
frequency of the forces.

15.13 $\mathcal{L}[C(t)] = x_0(s + 8)/[s(s + 2)(s + 4)]$, yielding
$C(t) = x_0[1 + \tfrac{1}{2}\exp(-4t) - \tfrac{3}{2}\exp(-2t)]$.

15.14 Write the numerator of the partial fraction with denominator $(s + \gamma)^2 + k^2$, where
$k^2 = \omega_0^2 - \gamma^2$, in the form $A(s + \gamma) + B$.

General solution is $x(t) = (F/2\omega_0)\{\gamma^{-1}[e^{-\gamma t}\cos kt - \cos(\omega_0 t)] + k^{-1}e^{-\gamma t}\sin kt\}$.

(b) Since $x = dx/dt = \sin\omega_0 t = 0$ at $t = 0$, $d^2x/dt^2 = 0$ also. Differentiating and
then setting $t = 0$ shows that $d^3x/dt^3$ has the initial value $\omega_0 F$.

15.15 $u_n = [(1 + \sqrt{5})^n - (1 - \sqrt{5})^n]/(2^n\sqrt{5})$.

15.16 $u_n = (1 - r^n)/(1 - r^M)$ where $r = (d + c)/(d - c)$. If $c > d$, then $r < -1$.

15.17 $P = 5$, $Q = -4$. $u_n = 3/2 - 5(-1)^n/6 + (-2)^n/4 + 2^n/12$.

15.18 $u_n = [35(-2)^n - 26(-3)^n + 2^n]/10$. Note that, with this recurrence relation and
these initial values, all $u_n$ must be integers.

15.19 The general solution is $A + B2^{n/2}\exp(i3\pi n/4) + C2^{n/2}\exp(i5\pi n/4)$. The initial values
imply that $A = 1/5$, $B = (\sqrt{5}/10)\exp[i(\pi - \phi)]$ and $C = (\sqrt{5}/10)\exp[i(\pi + \phi)]$.
15.20 The general solution is $u_n = (A + Bn + Cn^2)1^n + (D + En)(-1)^n + Fi^n + G(-i)^n$.

(a) $B = C = E = 0$, $A = 5$, $D = -2$, $F = -\tfrac{3}{2} + \tfrac{5}{2}i$, $G = -\tfrac{3}{2} - \tfrac{5}{2}i$;

(b) $B = C = 1$ and all other coefficients $= 0$.

15.21 This is Euler’s equation; setting $x = \exp t$ produces $d^2z/dt^2 - 2\,dz/dt + z = \exp t$
with complementary function $(A + Bt)\exp t$ and particular integral $t^2(\exp t)/2$;
$y(x) = x + [x\ln x(1 + \ln x)]/2$.

15.22 This is Legendre’s linear equation with $\alpha = \beta = 1$. Its reduced form is $y'' + 2y' + y =
(e^t - 1)^2$, where $y = y(t)$ and $x + 1 = e^t$.

A particular integral is $y(t) = e^{2t}/9 - e^t/2 + 1$ and the general solution is
$y(x) = (x + 1)^{-1}[A + B\ln(x + 1)] + x^2/9 - 5x/18 + 11/18$.

15.23 After multiplication through by $x^2$ the coefficients are such that this is an
exact equation. The resulting first-order equation, in standard form, needs an
integrating factor $(x - 2)^2/x^2$.

15.24 (a) The complementary function is $Ae^x + Be^{-x}$; writing the particular integral in
the form $k_1(x)e^x + k_2(x)e^{-x}$ gives $k_1' = x^n e^{-x}/2$ and $k_2' = -x^n e^x/2$. These lead to
the particular integral $-(n!/2)\sum_{m=0}^{n}[1 + (-1)^{n+m}]x^m/m!$.

(b) Setting the particular integral equal to $k_1(x)e^x + k_2(x)xe^x$ gives the general
solution $y = (A + Bx + x^3/3)e^x$.

15.25 Given the boundary conditions, it is better to work with $\sinh x$ and $\sinh(1 - x)$
than with $e^{\pm x}$; $G(x,\xi) = -[\sinh(1 - \xi)\sinh x]/\sinh 1$ for $x < \xi$ and
$-[\sinh(1 - x)\sinh\xi]/\sinh 1$ for $x > \xi$.

15.26 (a) (i) $(1 + x)u'' = (2 + x)u'$, (ii) follow subsection 15.3.6. Both give $y_2(x) = e^x$.

(b) $y(x) = A/x + Be^x - x/2 - 1$.

15.27 Follow the method of subsection 15.2.5 but using general rather than specific
functions.

15.28 The relevant independent solutions are $y_1(x) = A(e^{-x} - e^{-2x})$ and $y_2(x) = B(e^{-x} -
e^{-2x+1})$ with Wronskian $AB(e - 1)e^{-3x}$. If $G_1(x,\xi) = (e - 1)^{-1}(e^{-x} - e^{-2x})(e^{2\xi} - e^{\xi+1})$
and $G_2(x,\xi) = (e - 1)^{-1}(e^{-x} - e^{-2x+1})(e^{2\xi} - e^{\xi})$ then (a) for $x < x_0$, $y(x) =
\int_{x_0}^{1} G_1(x,\xi)\,d\xi$, and (b) for $x > x_0$, $y(x) = \int_{x_0}^{x} G_2(x,\xi)\,d\xi + \int_{x}^{1} G_1(x,\xi)\,d\xi$.

15.29 $G(t,\tau) = 0$ for $t < \tau$, and $\kappa^{-1}e^{-(t-\tau)}\sin[\kappa(t - \tau)]$ for $t > \tau$. For a unit step input,
$x(t) = (1 + \kappa^2)^{-1}(1 - e^{-t}\cos\kappa t - \kappa^{-1}e^{-t}\sin\kappa t)$. Both transforms are equivalent to
$s[(s + 1)^2 + \kappa^2]\bar{x} = 1$.

15.30 With $y = A(x)\sin(x/2) + B(x)\cos(x/2)$, obtain $A'(z) = 2f(z)\cos(z/2)$ and $B'(z) =
-2f(z)\sin(z/2)$ and hence identify $G(x,z)$.

15.31 Use continuity and the step condition on $\partial G/\partial t$ at $t = t_0$ to show that
$G(t,t_0) = \alpha^{-1}\{1 - \exp[\alpha(t_0 - t)]\}$ for $0 < t_0 < t$;
$x(t) = A(\alpha - a)^{-1}\{a^{-1}[1 - \exp(-at)] - \alpha^{-1}[1 - \exp(-\alpha t)]\}$.

15.32 (a) $B + x = \int^{y} dz\,[A - 2\int^{z} f(u)\,du]^{-1/2}$; (b) show that the force is proportional
to the derivative of $(x_0 - x)^2\ln[x_0/(x_0 - x)]$; $x = x_0\{1 - \exp[-At^2/(2m)]\}$.

15.33 LHS of the equation is exact for two stages of integration and then needs an
integrating factor $\exp x$; $2y\,d^2y/dx^2 + 2y\,dy/dx + 2(dy/dx)^2$; $2y\,dy/dx + y^2 =
d(y^2)/dx + y^2$; $y^2 = A\exp(-x) + Bx + C - (\sin x - \cos x)/2$.

15.34 Set $p = dy/dx$; $y(x) = Ax^3/18 - B\ln x + Cx + D$.

15.35 Follow the method of subsection 15.2.6; $u(x) = e^{-x^2}$ and $v(x)$ satisfies $v'' + 4v =
\sin 2x$, for which a particular integral is $(-x\cos 2x)/4$. The general solution is
$y(x) = [A\sin 2x + (B - \tfrac{1}{4}x)\cos 2x]e^{-x^2}$.

15.36 Set $p = dy/dx$ and follow subsection 15.3.2 to obtain $p\,d^2p/dy^2 + 1 = (dp/dy)^2$
and then set $q = dp/dy$ to obtain $(q^2 - 1)^{1/2} = Ap$. The substitution $\sinh\theta = Ap$
gives finally that $\operatorname{cosech}(Ay + B) + \coth(Ay + B) = e^{-x}$.

15.37 Equation is isobaric with $y$ of weight $m$, where $m + p - 2 = mn$; $v(x)$ satisfies
$x^2v'' + xv' = v^n$. Set $x = e^t$ and $v(x) = u(t)$, leading to $u'' = u^n$ with $u(0) =
0$, $u'(0) = \lambda$. Multiply both sides by $u'$ to make the equation exact.
16


Series solutions of ordinary
differential equations


In the previous chapter the solution of both homogeneous and non-homogeneous
linear ordinary differential equations (ODEs) of order $\geq 2$ was discussed. In
particular we developed methods for solving some equations in which the coefficients
were not constant but functions of the independent variable $x$. In each case we
were able to write the solutions to such equations in terms of elementary
functions, or as integrals. In general, however, the solutions of equations with variable
coefficients cannot be written in this way, and we must consider alternative
approaches.

In this chapter we discuss a method for obtaining solutions to linear ODEs 
in the form of convergent series. Such series can be evaluated numerically, and 
those occurring most commonly are named and tabulated. There is in fact no 
distinct borderline between this and the previous chapter, since solutions in terms 
of elementary functions may equally well be written as convergent series (i.e. the 
relevant Taylor series). Indeed, it is partly because some series occur so frequently 
that they are given special names such as sin x, cos x or exp x. 

Since we shall be concerned principally with second-order linear ODEs in this 
chapter, we begin with a discussion of these equations, and obtain some general 
results that will prove useful when we come to discuss series solutions. 


16.1 Second-order linear ordinary differential equations 

Any homogeneous second-order linear ODE can be written in the form

\[ y'' + p(x)y' + q(x)y = 0, \tag{16.1} \]

where $y' = dy/dx$ and $p(x)$ and $q(x)$ are given functions of $x$. From the previous
chapter, we recall that the most general form of the solution to (16.1) is

\[ y(x) = c_1 y_1(x) + c_2 y_2(x), \tag{16.2} \]

where $y_1(x)$ and $y_2(x)$ are linearly independent solutions of (16.1), and $c_1$ and $c_2$
are constants that are fixed by the boundary conditions (if supplied).

A full discussion of the linear independence of sets of functions was given
at the beginning of the previous chapter, but for just two functions $y_1$ and $y_2$
to be linearly independent we simply require that $y_2$ is not a multiple of $y_1$.
Equivalently, $y_1$ and $y_2$ must be such that the equation

\[ c_1 y_1(x) + c_2 y_2(x) = 0 \]

is only satisfied for $c_1 = c_2 = 0$. Therefore the linear independence of $y_1(x)$ and
$y_2(x)$ can usually be deduced by inspection but in any case can always be verified
by the evaluation of the Wronskian of the two solutions,


\[ W(x) = \begin{vmatrix} y_1 & y_2 \\ y_1' & y_2' \end{vmatrix} = y_1 y_2' - y_2 y_1'. \tag{16.3} \]

If $W(x) \neq 0$ anywhere in a given interval then $y_1$ and $y_2$ are linearly independent
in that interval.

An alternative expression for $W(x)$, of which we will make use later, may be
derived by differentiating (16.3) with respect to $x$ to give

\[ W' = y_1 y_2'' + y_1' y_2' - y_2 y_1'' - y_2' y_1' = y_1 y_2'' - y_1'' y_2. \]

Since both $y_1$ and $y_2$ satisfy (16.1), we may substitute for $y_1''$ and $y_2''$ to obtain

\[ W' = -y_1(p y_2' + q y_2) + (p y_1' + q y_1) y_2 = -p(y_1 y_2' - y_1' y_2) = -pW. \]

Integrating, we find

\[ W(x) = C \exp\left\{ -\int^x p(u)\,du \right\}, \tag{16.4} \]

where $C$ is a constant. We note further that in the special case $p(x) = 0$ we obtain
$W$ = constant.


► The functions $y_1 = \sin x$ and $y_2 = \cos x$ are both solutions of the equation $y'' + y = 0$.
Evaluate the Wronskian of these two solutions, and hence show that they are linearly
independent.


The Wronskian of $y_1$ and $y_2$ is given by

\[ W = y_1 y_2' - y_2 y_1' = -\sin^2 x - \cos^2 x = -1. \]

Since $W \neq 0$ the two solutions are linearly independent. We also note that $y'' + y = 0$ is
a special case of (16.1) with $p(x) = 0$. We therefore expect, from (16.4), that $W$ will be a
constant, as is indeed the case. ◄
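The Wronskian test in (16.3) can also be checked numerically. The following is a minimal sketch in Python, assuming the derivatives are supplied analytically; the helper name `wronskian` and the sample points are illustrative choices and are not part of the text.

```python
import math

def wronskian(y1, dy1, y2, dy2, x):
    """Evaluate W(x) = y1*y2' - y2*y1', as defined in (16.3)."""
    return y1(x) * dy2(x) - y2(x) * dy1(x)

# y1 = sin x and y2 = cos x, with their derivatives given in closed form.
samples = [0.0, 0.5, 1.0, 2.0, 3.0]
values = [
    wronskian(math.sin, math.cos, math.cos, lambda t: -math.sin(t), x)
    for x in samples
]

for x, w in zip(samples, values):
    print(f"x = {x:3.1f}   W = {w:+.12f}")
```

Each printed value should agree with $W = -1$ to rounding error, which is consistent with (16.4) for $p(x) = 0$.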

From the previous chapter we recall that, once we have obtained the general
solution to the homogeneous second-order ODE (16.1) in the form (16.2), the
general solution to the inhomogeneous equation

\[ y'' + p(x)y' + q(x)y = f(x) \tag{16.5} \]

can be written as the sum of the solution to the homogeneous equation $y_c(x)$
(the complementary function) and any function $y_p(x)$ (the particular integral) that
satisfies (16.5) and is linearly independent of $y_c(x)$. We have therefore

\[ y(x) = c_1 y_1(x) + c_2 y_2(x) + y_p(x). \tag{16.6} \]

General methods for obtaining $y_p$ that are applicable to equations with variable
coefficients, such as the variation of parameters or Green’s functions, were
discussed in the previous chapter. An alternative description of the Green’s function
method for solving inhomogeneous equations is given in the next chapter. For the
present, however, we will restrict our attention to the solutions of homogeneous
ODEs in the form of convergent series.

16.1.1 Ordinary and singular points of an ODE 

So far we have implicitly assumed that y(x) is a real function of a real variable 
x. However, this is not always the case, and in the remainder of this chapter we 
broaden our discussion by generalising to a complex function y(z) of a complex 
variable z. 

Let us therefore consider the second-order linear homogeneous ODE

\[ y'' + p(z)y' + q(z)y = 0, \tag{16.7} \]

where now $y' = dy/dz$; this is a straightforward generalisation of (16.1). A full
discussion of complex functions and differentiation with respect to a complex
variable $z$ is given in chapter 20, but for the purposes of the present chapter we
need not concern ourselves with many of the subtleties that exist. In particular,
we may treat differentiation with respect to $z$ in a way analogous to ordinary
differentiation with respect to a real variable $x$.

In (16.7), if at some point $z = z_0$ the functions $p(z)$ and $q(z)$ are finite and can
be expressed as complex power series (see section 4.5)

\[ p(z) = \sum_{n=0}^{\infty} p_n (z - z_0)^n, \qquad q(z) = \sum_{n=0}^{\infty} q_n (z - z_0)^n, \]

then $p(z)$ and $q(z)$ are said to be analytic at $z = z_0$, and this point is called an
ordinary point of the ODE. If, however, $p(z)$ or $q(z)$, or both, diverge at $z = z_0$
then it is called a singular point of the ODE.

Even if an ODE is singular at a given point $z = z_0$, it may still possess a
non-singular (finite) solution at that point. In fact the necessary and sufficient
condition† for such a solution to exist is that $(z - z_0)p(z)$ and $(z - z_0)^2 q(z)$ are both
analytic at $z = z_0$. Singular points that have this property are regular singular
points, whereas any singular point not satisfying both these criteria is termed an
irregular or essential singularity.

† See, for example, Jeffreys and Jeffreys, Mathematical Methods of Physics, 3rd ed. (Cambridge
University Press, 1966), p. 479.


► Legendre’s equation has the form

\[ (1 - z^2)y'' - 2zy' + \ell(\ell + 1)y = 0, \tag{16.8} \]

where $\ell$ is a constant. Show that $z = 0$ is an ordinary point and $z = \pm 1$ are regular singular
points of this equation.


Firstly, divide through by $1 - z^2$ to put the equation into our standard form (16.7):

\[ y'' - \frac{2z}{1 - z^2}\,y' + \frac{\ell(\ell + 1)}{1 - z^2}\,y = 0. \]

Comparing this with (16.7), we identify $p(z)$ and $q(z)$ as

\[ p(z) = \frac{-2z}{1 - z^2} = \frac{-2z}{(1 + z)(1 - z)}, \qquad q(z) = \frac{\ell(\ell + 1)}{1 - z^2} = \frac{\ell(\ell + 1)}{(1 + z)(1 - z)}. \]

By inspection, $p(z)$ and $q(z)$ are analytic at $z = 0$, which is therefore an ordinary point,
but both diverge for $z = \pm 1$, which are thus singular points. However, at $z = 1$ we see
that both $(z - 1)p(z)$ and $(z - 1)^2 q(z)$ are analytic and hence $z = 1$ is a regular singular
point. Similarly, at $z = -1$ both $(z + 1)p(z)$ and $(z + 1)^2 q(z)$ are analytic, and it too is a
regular singular point. ◄
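The classification above can be illustrated numerically. The sketch below, in Python, evaluates $p$, $q$ and the products $(z - z_0)p$ and $(z - z_0)^2 q$ for Legendre's equation at a point just beside $z_0$. The function names, the offset `eps` and the choice $\ell = 2$ are assumptions made for illustration; a numerical probe of this kind only suggests the behaviour and does not prove analyticity.

```python
def p(z, ell=2):
    """Coefficient of y' in the standard form of Legendre's equation."""
    return -2 * z / (1 - z * z)

def q(z, ell=2):
    """Coefficient of y in the standard form of Legendre's equation."""
    return ell * (ell + 1) / (1 - z * z)

def probe(z0, eps=1e-6, ell=2):
    """Evaluate p, q and the weighted products just beside z0."""
    z = z0 + eps
    return {
        "p": p(z, ell),
        "q": q(z, ell),
        "(z-z0)p": (z - z0) * p(z, ell),
        "(z-z0)^2 q": (z - z0) ** 2 * q(z, ell),
    }

for z0 in (0.0, 1.0, -1.0):
    print(z0, probe(z0))
```

Near $z_0 = \pm 1$ the values of $p$ and $q$ are very large while both weighted products stay bounded, as expected at a regular singular point; near $z_0 = 0$ all four quantities are finite.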


So far we have assumed that $z_0$ is finite. However, we may sometimes wish to
determine the nature of the point $|z| \to \infty$. This may be achieved straightforwardly
by substituting $w = 1/z$ into the equation and investigating the behaviour at
$w = 0$.


► Show that Legendre’s equation has a regular singularity at $|z| \to \infty$.


Letting $w = 1/z$, the derivatives with respect to $z$ become

\[ \frac{dy}{dz} = \frac{dy}{dw}\frac{dw}{dz} = -\frac{1}{z^2}\frac{dy}{dw} = -w^2\frac{dy}{dw}, \]

\[ \frac{d^2y}{dz^2} = \frac{dw}{dz}\frac{d}{dw}\left(\frac{dy}{dz}\right) = -w^2\left(-2w\frac{dy}{dw} - w^2\frac{d^2y}{dw^2}\right) = w^3\left(2\frac{dy}{dw} + w\frac{d^2y}{dw^2}\right). \]

If we substitute these derivatives into Legendre’s equation (16.8) we obtain

\[ \left(1 - \frac{1}{w^2}\right)w^3\left(2\frac{dy}{dw} + w\frac{d^2y}{dw^2}\right) + 2\,\frac{1}{w}\,w^2\frac{dy}{dw} + \ell(\ell + 1)y = 0, \]

which simplifies to give

\[ w^2(w^2 - 1)\frac{d^2y}{dw^2} + 2w^3\frac{dy}{dw} + \ell(\ell + 1)y = 0. \]

Dividing through by $w^2(w^2 - 1)$ to put the equation into standard form, and comparing
with (16.7), we identify $p(w)$ and $q(w)$ as

\[ p(w) = \frac{2w}{w^2 - 1}, \qquad q(w) = \frac{\ell(\ell + 1)}{w^2(w^2 - 1)}. \]

At $w = 0$, $p(w)$ is analytic but $q(w)$ diverges, and so the point $|z| \to \infty$ is a singular point
of Legendre’s equation. However, since $wp$ and $w^2 q$ are both analytic at $w = 0$, $|z| \to \infty$
is a regular singular point. ◄
Equation                                       Regular          Essential
                                               singularities    singularities

Legendre*
  $(1 - z^2)y'' - 2zy' + \ell(\ell + 1)y = 0$  $-1, 1, \infty$  none
Chebyshev
  $(1 - z^2)y'' - zy' + n^2 y = 0$             $-1, 1, \infty$  none
Bessel
  $z^2 y'' + zy' + (z^2 - \nu^2)y = 0$         $0$              $\infty$
Laguerre*
  $zy'' + (1 - z)y' + \alpha y = 0$            $0$              $\infty$
Simple harmonic oscillator
  $y'' + \omega^2 y = 0$                       none             $\infty$
Hermite
  $y'' - 2zy' + 2\alpha y = 0$                 none             $\infty$


Table 16.1 Important ODEs in the physical sciences and engineering. The
asterisks indicate that the corresponding associated equations (discussed in the
next chapter) have the same singular points.


Table 16.1 lists the singular points of several second-order linear ODEs that 
play important roles in the analysis of many physics and engineering problems. 
In sections 16.6 and 16.7 we consider the solution of Legendre’s and Bessel’s
equations in terms of convergent series and discuss some useful properties of 
these solutions. The solutions of the remaining equations in table 16.1 may also 
be found in the form of convergent series, but a discussion of these solutions and 
their properties is left until the next chapter, where they are considered in the 
context of Sturm-Liouville systems. We now discuss the methods by which series 
solutions may be obtained. 


16.2 Series solutions about an ordinary point 

If $z = z_0$ is an ordinary point of (16.7) then it may be shown that every solution
$y(z)$ of the equation is also analytic at $z = z_0$. In our subsequent discussion
we will take $z_0$ as the origin, i.e. $z_0 = 0$. If this is not already the case, then a
substitution $Z = z - z_0$ will make it so. Since every solution is analytic, $y(z)$ can
be represented by a power series of the form (see section 20.13)

\[ y(z) = \sum_{n=0}^{\infty} a_n z^n. \tag{16.9} \]

Moreover, it may be shown that such a power series converges for $|z| < R$, where
$R$ is the radius of convergence and is equal to the distance from $z = 0$ to the
nearest singular point of the ODE (see chapter 20). At the radius of convergence,
however, the series may or may not converge (as shown in section 4.5).




Since every solution of (16.7) is analytic at an ordinary point, it is always
possible to obtain two independent solutions (from which the general solution
(16.2) can be constructed) of the form (16.9). The derivatives of $y$ with respect to
$z$ are given by

\[ y' = \sum_{n=0}^{\infty} n a_n z^{n-1} = \sum_{n=0}^{\infty} (n + 1)a_{n+1} z^n, \tag{16.10} \]

\[ y'' = \sum_{n=0}^{\infty} n(n - 1)a_n z^{n-2} = \sum_{n=0}^{\infty} (n + 2)(n + 1)a_{n+2} z^n. \tag{16.11} \]

Note that, in each case, in the first equality the sum can still start at $n = 0$ since
the first term in (16.10) and the first two terms in (16.11) are automatically zero.
The second equality in each case is obtained by shifting the summation index so
that the sum can be written in terms of coefficients of $z^n$. By substituting (16.9)–(16.11)
into the ODE (16.7), and requiring that the coefficients of each power of
$z$ sum to zero, we obtain a recurrence relation expressing $a_n$ as a function of the
previous $a_r$ ($0 \leq r \leq n - 1$).


► Find the series solutions, about $z = 0$, of

\[ y'' + y = 0. \]


By inspection $z = 0$ is an ordinary point of the equation, and so we may obtain two
independent solutions by making the substitution $y = \sum_{n=0}^{\infty} a_n z^n$. Using (16.9) and (16.11)
we find

\[ \sum_{n=0}^{\infty} (n + 2)(n + 1)a_{n+2} z^n + \sum_{n=0}^{\infty} a_n z^n = 0, \]

which may be written as

\[ \sum_{n=0}^{\infty} \left[(n + 2)(n + 1)a_{n+2} + a_n\right] z^n = 0. \]

For this equation to be satisfied we require that the coefficient of each power of $z$ vanishes
separately, and so we obtain the two-term recurrence relation

\[ a_{n+2} = -\frac{a_n}{(n + 2)(n + 1)} \qquad \text{for } n \geq 0. \]

Using this relation, we can calculate, say, the even coefficients $a_2$, $a_4$, $a_6$ and so on, for
a given $a_0$. Alternatively, starting with $a_1$, we obtain the odd coefficients $a_3$, $a_5$ etc. Two
independent solutions of the ODE can be obtained by setting either $a_0 = 0$ or $a_1 = 0$.
Firstly, if we set $a_1 = 0$ and choose $a_0 = 1$ then we obtain the solution

\[ y_1(z) = 1 - \frac{z^2}{2!} + \frac{z^4}{4!} - \cdots = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} z^{2n}. \]

Secondly, if we set $a_0 = 0$ and choose $a_1 = 1$ then we obtain a second, independent, solution

\[ y_2(z) = z - \frac{z^3}{3!} + \frac{z^5}{5!} - \cdots = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n + 1)!} z^{2n+1}. \]

Recognising these two series as $\cos z$ and $\sin z$, we can write the general solution as

\[ y(z) = c_1 \cos z + c_2 \sin z, \]

where $c_1$ and $c_2$ are arbitrary constants that are fixed by boundary conditions (if supplied).
We note that both solutions converge for all $z$, as might be expected since the ODE
possesses no singular points (except $|z| \to \infty$). ◄
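The two-term recurrence relation obtained above can be iterated directly to build the series coefficients. The following is a minimal sketch in Python; the function names, the truncation at 30 terms and the test point $z = 1.3$ are illustrative assumptions rather than anything prescribed by the text.

```python
import math

def series_coefficients(a0, a1, n_terms):
    """Build a_0 ... a_{n_terms-1} from a_{n+2} = -a_n / ((n+2)(n+1))."""
    a = [a0, a1]
    for n in range(n_terms - 2):
        a.append(-a[n] / ((n + 2) * (n + 1)))
    return a

def evaluate(coeffs, z):
    """Sum the truncated power series at z using Horner's scheme."""
    total = 0.0
    for c in reversed(coeffs):
        total = total * z + c
    return total

y1 = series_coefficients(1.0, 0.0, 30)  # a_0 = 1, a_1 = 0: even series
y2 = series_coefficients(0.0, 1.0, 30)  # a_0 = 0, a_1 = 1: odd series

z = 1.3
print(evaluate(y1, z), math.cos(z))
print(evaluate(y2, z), math.sin(z))
```

The truncated sums should agree with $\cos z$ and $\sin z$ to rounding error, in line with the identification made in the worked example.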

Solving the above example was quite straightforward and the resulting series
were easily recognised and written in closed form (i.e. in terms of elementary
functions); this is not usually the case. Another simplifying feature of the previous
example was that we obtained a two-term recurrence relation relating $a_{n+2}$ and
$a_n$, so that the odd- and even-numbered coefficients were independent of one
another. In general the recurrence relation expresses $a_n$ as a function of any
number of the previous $a_r$ ($0 \leq r \leq n - 1$).


► Find the series solutions, about z = 0, of 


(1 -z) : 


r y = 0. 


By inspection z = 0 is an ordinary point, and therefore we may find two independent 
solutions by substituting y = a nZ n - Using (16.10) and (16.11), and multiplying through 
by (1 — z) 2 , we find 

00 OO 

( 1 — 2z + z 2 ) ^ n(n — 1 )a„z"~ 2 — 2 a n z" = 0, 

n= 0 n= 0 

which leads to 

OO 00 00 00 

n(n — 1 )a„z"~ 2 — 2 ^ n(n — 1 )a„z "~ 1 + n(n — 1 )a„z n — 2 a„z " = 0. 

n= 0 n=0 n= 0 n= 0 

In order to write all these series in terms of the coefficients of z", we must shift the 
summation index in the first two sums, obtaining 

00 00 OO 

y^(n + 2 )(n + 1 )a n+2 z" — 2 + 1 )na n+1 z" + ^^(n 2 —n — 2 )a n z" = 0, 

n=0 n= 0 n=0 

which can be written as 

00 

y^(n + l)[(n + 2)fl„ +2 — 2na n+l +(n — 2)a„]z" = 0. 

n= 0 

By demanding that the coefficients of each power of z vanish separately, we obtain the 
three-term recurrence relation 

(n + 2)«„ +2 — 2na n+ i + (n — 2 )a„ = 0 for n > 0, 

which determines $a_n$ for $n \ge 2$ in terms of $a_0$ and $a_1$. Three-term (or more) recurrence
relations are a nuisance and, in general, can be difficult to solve. This particular recurrence
relation, however, has two straightforward solutions. One solution is $a_n = a_0$ for all $n$, in
which case (choosing $a_0 = 1$) we find

$$ y_1(z) = 1 + z + z^2 + z^3 + \cdots = \frac{1}{1-z}. $$

The other solution to the recurrence relation is $a_1 = -2a_0$, $a_2 = a_0$ and $a_n = 0$ for $n > 2$,
so that (again choosing $a_0 = 1$) we obtain a polynomial solution to the ODE:

$$ y_2(z) = 1 - 2z + z^2 = (1-z)^2. $$

The linear independence of $y_1$ and $y_2$ is obvious but can be checked by computing the
Wronskian

$$ W = y_1 y_2' - y_1' y_2 = \frac{1}{1-z}\,[-2(1-z)] - \frac{1}{(1-z)^2}\,(1-z)^2 = -3. $$

Since $W \neq 0$ the two solutions $y_1$ and $y_2$ are indeed linearly independent. The general
solution of the ODE is therefore

$$ y(z) = \frac{c_1}{1-z} + c_2(1-z)^2. $$

We observe that $y_1$ (and hence the general solution) is singular at $z = 1$, which is the
singular point of the ODE nearest to $z = 0$, but the polynomial solution $y_2$ is valid for all
finite $z$. ◄
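
The two solutions of the three-term recurrence relation found above can be checked by direct iteration. The following is only a minimal sketch in Python using exact rational arithmetic; the function name `series_coefficients` is our own choice for illustration and does not appear elsewhere in the text.

```python
from fractions import Fraction

def series_coefficients(a0, a1, n_terms):
    """Iterate (n+2) a_{n+2} - 2n a_{n+1} + (n-2) a_n = 0, starting from a_0 and a_1."""
    a = [Fraction(a0), Fraction(a1)]
    for n in range(n_terms - 2):
        # Solve the recurrence for a_{n+2} in terms of the two previous coefficients.
        a.append((2 * n * a[n + 1] - (n - 2) * a[n]) / (n + 2))
    return a

# a_0 = a_1 = 1 should reproduce the geometric series of 1/(1 - z).
geometric = series_coefficients(1, 1, 8)
# a_0 = 1, a_1 = -2 should terminate, giving the polynomial (1 - z)^2.
polynomial = series_coefficients(1, -2, 8)
```

With the first choice every coefficient equals unity; with the second all coefficients beyond $a_2$ vanish, in agreement with $y_1$ and $y_2$ above.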

The above example illustrates the possibility that, in some cases, we may find 
that the recurrence relation leads to $a_n = 0$ for $n > N$, for one or both of the
two solutions; we then obtain a polynomial solution to the equation. Polynomial 
solutions are discussed more fully in section 16.5, but one obvious property of 
such solutions is that they converge for all finite z. By contrast, as mentioned 
above, for solutions in the form of an infinite series the circle of convergence 
extends only as far as the singular point nearest to that about which the solution 
is being obtained. 


16.3 Series solutions about a regular singular point 

From table 16.1 we see that several of the most important second-order linear 
ODEs in physics and engineering have regular singular points in the finite complex 
plane. We must extend our discussion, therefore, to obtaining series solutions to 
ODEs about such points. In what follows we assume that the regular singular 
point about which the solution is required is at $z = 0$, since, as we have seen, if
this is not already the case then a substitution of the form $Z = z - z_0$ will make
it so.

If $z = 0$ is a regular singular point of the equation

$$ y'' + p(z)y' + q(z)y = 0 $$

then at least one of $p(z)$ and $q(z)$ is not analytic at $z = 0$, and in general we should not expect
to find a power series solution of the form (16.9). We must therefore extend the
method to include a more general form for the solution. In fact it may be shown
(Fuchs’ theorem) that there exists at least one solution to the above equation, of
the form

$$ y = z^{\sigma}\sum_{n=0}^{\infty} a_n z^n, \tag{16.12} $$

where the exponent $\sigma$ is a number that may be real or complex and where $a_0 \neq 0$
(since, if it were otherwise, $\sigma$ could be redefined as $\sigma + 1$ or $\sigma + 2$ or $\cdots$ so as to
make $a_0 \neq 0$). Such a series is called a generalised power series or Frobenius series.
As in the case of a simple power series solution, the radius of convergence of the
Frobenius series is, in general, equal to the distance to the nearest singularity of
the ODE.

Since $z = 0$ is a regular singularity of the ODE, it follows that $zp(z)$ and $z^2 q(z)$
are analytic at $z = 0$, so that we may write

$$ zp(z) \equiv s(z) = \sum_{n=0}^{\infty} s_n z^n, $$

$$ z^2 q(z) \equiv t(z) = \sum_{n=0}^{\infty} t_n z^n, $$

where we have defined the analytic functions $s(z)$ and $t(z)$ for later convenience.
The original ODE therefore becomes

$$ y'' + \frac{s(z)}{z}\,y' + \frac{t(z)}{z^2}\,y = 0. $$

Let us substitute the Frobenius series (16.12) into this equation. The derivatives
of (16.12) with respect to $z$ are given by

$$ y' = \sum_{n=0}^{\infty} (n+\sigma)a_n z^{n+\sigma-1}, \tag{16.13} $$

$$ y'' = \sum_{n=0}^{\infty} (n+\sigma)(n+\sigma-1)a_n z^{n+\sigma-2}, \tag{16.14} $$

and we obtain

$$ \sum_{n=0}^{\infty} (n+\sigma)(n+\sigma-1)a_n z^{n+\sigma-2} + s(z)\sum_{n=0}^{\infty} (n+\sigma)a_n z^{n+\sigma-2} + t(z)\sum_{n=0}^{\infty} a_n z^{n+\sigma-2} = 0. $$

Dividing this equation through by $z^{\sigma-2}$ we find

$$ \sum_{n=0}^{\infty} \left[(n+\sigma)(n+\sigma-1) + s(z)(n+\sigma) + t(z)\right] a_n z^{n} = 0. \tag{16.15} $$

Setting $z = 0$, all terms in the sum with $n > 0$ vanish, implying that

$$ [\sigma(\sigma-1) + s(0)\sigma + t(0)]\,a_0 = 0, $$

which, since we require $a_0 \neq 0$, yields the indicial equation

$$ \sigma(\sigma-1) + s(0)\sigma + t(0) = 0. \tag{16.16} $$

This equation is a quadratic in $\sigma$ and in general has two roots, the nature of
which determines the forms of possible series solutions.

The two roots of the indicial equation $\sigma_1$ and $\sigma_2$ are called the indices of the
regular singular point. By substituting each of these roots into (16.15) in turn and
requiring that the coefficients of each power of $z$ vanish separately, we obtain a
recurrence relation (for each root) expressing each $a_n$ as a function of the previous
$a_r$ ($0 \le r \le n-1$). Depending on the roots of the indicial equation $\sigma_1$ and $\sigma_2$,
there are three possible general cases, which we now discuss.
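
Since the indicial equation (16.16) is just the quadratic $\sigma^2 + [s(0)-1]\sigma + t(0) = 0$, deciding which of the three cases applies is a mechanical step. The sketch below is one possible way of doing this numerically; the names `indicial_roots` and `classify` and the tolerance `tol` are illustrative assumptions of ours, not part of the method itself.

```python
import cmath

def indicial_roots(s0, t0):
    """Roots of sigma*(sigma - 1) + s0*sigma + t0 = 0, with s0 = s(0) and t0 = t(0)."""
    disc = cmath.sqrt((s0 - 1) ** 2 - 4 * t0)
    r1, r2 = ((1 - s0) + disc) / 2, ((1 - s0) - disc) / 2
    # Return the root with the larger real part first.
    return (r1, r2) if r1.real >= r2.real else (r2, r1)

def classify(s0, t0, tol=1e-12):
    """Say which of the three cases the pair of indices falls into."""
    r1, r2 = indicial_roots(s0, t0)
    diff = r1 - r2
    if abs(diff) < tol:
        return "repeated root"
    if abs(diff.imag) < tol and abs(diff.real - round(diff.real)) < tol:
        return "roots differing by an integer"
    return "roots not differing by an integer"

case_a = classify(0.5, 0.0)  # s(0) = 1/2, t(0) = 0: indices 1/2 and 0
case_b = classify(0.0, 0.0)  # s(0) = 0,   t(0) = 0: indices 1 and 0
case_c = classify(1.0, 0.0)  # s(0) = 1,   t(0) = 0: double index 0
```

The floating-point tolerance is a convenience only; when $s(0)$ and $t(0)$ are known exactly the classification is better made by hand or with exact arithmetic.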


16.3.1 Distinct roots not differing by an integer 

If the roots of the indicial equation $\sigma_1$ and $\sigma_2$ differ by an amount that is not
an integer then the recurrence relations corresponding to each root lead to two
linearly independent solutions of the ODE,

$$ y_1(z) = z^{\sigma_1}\sum_{n=0}^{\infty} a_n z^n, \qquad y_2(z) = z^{\sigma_2}\sum_{n=0}^{\infty} b_n z^n. $$

The linear independence of these two solutions follows from the fact that $y_2/y_1$
is not a constant since $\sigma_1 - \sigma_2$ is not an integer. Because $y_1$ and $y_2$ are linearly
independent, we may use them to construct the general solution $y = c_1 y_1 + c_2 y_2$.

We also note that this case includes complex conjugate roots where $\sigma_2 = \sigma_1^*$,
since $\sigma_1 - \sigma_2 = \sigma_1 - \sigma_1^* = 2i\,\mathrm{Im}\,\sigma_1$ cannot be equal to a real integer.


►Find the power series solutions about $z = 0$ of

$$ 4zy'' + 2y' + y = 0. $$

Dividing through by $4z$ to put the equation into standard form, we obtain

$$ y'' + \frac{1}{2z}\,y' + \frac{1}{4z}\,y = 0, \tag{16.17} $$

and on comparing with (16.7) we identify $p(z) = 1/(2z)$ and $q(z) = 1/(4z)$. Clearly $z = 0$
is a singular point of (16.17), but since $zp(z) = 1/2$ and $z^2 q(z) = z/4$ are finite there, it
is a regular singular point. We therefore substitute the Frobenius series $y = z^{\sigma}\sum_{n=0}^{\infty} a_n z^n$
into (16.17). Using (16.13) and (16.14), we obtain

$$ \sum_{n=0}^{\infty} (n+\sigma)(n+\sigma-1)a_n z^{n+\sigma-2} + \frac{1}{2z}\sum_{n=0}^{\infty} (n+\sigma)a_n z^{n+\sigma-1} + \frac{1}{4z}\sum_{n=0}^{\infty} a_n z^{n+\sigma} = 0, $$

which on dividing through by $z^{\sigma-2}$ gives

$$ \sum_{n=0}^{\infty} \left[(n+\sigma)(n+\sigma-1) + \tfrac{1}{2}(n+\sigma) + \tfrac{1}{4}z\right] a_n z^{n} = 0. \tag{16.18} $$

If we set $z = 0$ then all terms in the sum with $n > 0$ vanish, and we obtain the indicial
equation

$$ \sigma(\sigma-1) + \tfrac{1}{2}\sigma = 0, $$

which has roots $\sigma = 1/2$ and $\sigma = 0$. Since these roots do not differ by an integer we expect
to find two independent solutions to (16.17), in the form of Frobenius series.

Demanding that the coefficients of $z^n$ vanish separately in (16.18), we obtain the
recurrence relation

$$ (n+\sigma)(n+\sigma-1)a_n + \tfrac{1}{2}(n+\sigma)a_n + \tfrac{1}{4}a_{n-1} = 0. \tag{16.19} $$

If we choose the larger root, $\sigma = 1/2$, of the indicial equation then (16.19) becomes

$$ (4n^2 + 2n)a_n + a_{n-1} = 0 \quad\Rightarrow\quad a_n = \frac{-a_{n-1}}{2n(2n+1)}. $$

Setting $a_0 = 1$ we find $a_n = (-1)^n/(2n+1)!$ and so the solution to (16.17) is

$$ y_1(z) = \sqrt{z}\sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\,z^n = \sqrt{z} - \frac{(\sqrt{z})^3}{3!} + \frac{(\sqrt{z})^5}{5!} - \cdots = \sin\sqrt{z}. $$

To obtain the second solution we set $\sigma = 0$ (the smaller root of the indicial equation) in
(16.19), which gives

$$ (4n^2 - 2n)a_n + a_{n-1} = 0 \quad\Rightarrow\quad a_n = \frac{-a_{n-1}}{2n(2n-1)}. $$

Setting $a_0 = 1$ now gives $a_n = (-1)^n/(2n)!$, and so the second (independent) solution to
(16.17) is

$$ y_2(z) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!}\,z^n = 1 - \frac{(\sqrt{z})^2}{2!} + \frac{(\sqrt{z})^4}{4!} - \cdots = \cos\sqrt{z}. $$

We may check that $y_1(z)$ and $y_2(z)$ are indeed linearly independent by computing the
Wronskian

$$ \begin{aligned} W &= y_1 y_2' - y_2 y_1' \\ &= \sin\sqrt{z}\left(-\frac{1}{2\sqrt{z}}\sin\sqrt{z}\right) - \cos\sqrt{z}\left(\frac{1}{2\sqrt{z}}\cos\sqrt{z}\right) \\ &= -\frac{1}{2\sqrt{z}}\left(\sin^2\sqrt{z} + \cos^2\sqrt{z}\right) = -\frac{1}{2\sqrt{z}} \neq 0. \end{aligned} $$

Since $W \neq 0$ the solutions $y_1(z)$ and $y_2(z)$ are linearly independent. Hence the general
solution to (16.17) is given by

$$ y(z) = c_1 \sin\sqrt{z} + c_2 \cos\sqrt{z}. $$
◄
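
The coefficients quoted in this example can be confirmed by iterating (16.19) for each index in turn. What follows is a short sketch, assuming $a_0 = 1$ and using exact fractions; the function name `frobenius_coefficients` is hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from math import factorial

def frobenius_coefficients(sigma, n_terms):
    """Iterate (16.19) multiplied by 4: [4m(m-1) + 2m] a_n + a_{n-1} = 0, with m = n + sigma."""
    sigma = Fraction(sigma)
    a = [Fraction(1)]  # a_0 = 1
    for n in range(1, n_terms):
        m = n + sigma
        a.append(-a[n - 1] / (4 * m * (m - 1) + 2 * m))
    return a

sin_like = frobenius_coefficients(Fraction(1, 2), 6)  # index 1/2
cos_like = frobenius_coefficients(0, 6)               # index 0

# Closed forms stated in the example, for comparison.
expected_sin = [Fraction((-1) ** n, factorial(2 * n + 1)) for n in range(6)]
expected_cos = [Fraction((-1) ** n, factorial(2 * n)) for n in range(6)]
```

The two lists agree term by term with $(-1)^n/(2n+1)!$ and $(-1)^n/(2n)!$, the coefficients of $\sin\sqrt{z}/\sqrt{z}$ and $\cos\sqrt{z}$ regarded as series in $z$.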


16.3.2 Repeated root of the indicial equation 

If the indicial equation has a repeated root, so that $\sigma_1 = \sigma_2 = \sigma$, then obviously
only one solution in the form of a Frobenius series (16.12) may be found as
described above, i.e.

$$ y_1(z) = z^{\sigma}\sum_{n=0}^{\infty} a_n z^n. $$

Methods for obtaining a second, linearly independent, solution are discussed in
section 16.4.


16.3.3 Distinct roots differing by an integer

Whatever the roots of the indicial equation, the recurrence relation corresponding 
to the larger of the two always leads to a solution of the ODE. However, if the 
roots of the indicial equation differ by an integer then the recurrence relation 
corresponding to the smaller root may or may not lead to a second linearly 
independent solution, depending on the ODE under consideration. Note that for 
complex roots of the indicial equation, the ‘larger’ root is taken to be the one 
with the larger real part. 


►Find the power series solutions about $z = 0$ of

$$ z(z-1)y'' + 3zy' + y = 0. \tag{16.20} $$

Dividing through by $z(z-1)$ to put the equation into standard form, we obtain

$$ y'' + \frac{3}{(z-1)}\,y' + \frac{1}{z(z-1)}\,y = 0, \tag{16.21} $$

and on comparing with (16.7) we identify $p(z) = 3/(z-1)$ and $q(z) = 1/[z(z-1)]$. We
immediately see that $z = 0$ is a singular point of (16.21), but since $zp(z) = 3z/(z-1)$ and
$z^2 q(z) = z/(z-1)$ are finite there, it is a regular singular point and we expect to find at least
one solution in the form of a Frobenius series. We therefore substitute $y = z^{\sigma}\sum_{n=0}^{\infty} a_n z^n$
into (16.21), and using (16.13) and (16.14), we obtain

$$ \sum_{n=0}^{\infty} (n+\sigma)(n+\sigma-1)a_n z^{n+\sigma-2} + \frac{3}{z-1}\sum_{n=0}^{\infty} (n+\sigma)a_n z^{n+\sigma-1} + \frac{1}{z(z-1)}\sum_{n=0}^{\infty} a_n z^{n+\sigma} = 0, $$

which on dividing through by $z^{\sigma-2}$ gives

$$ \sum_{n=0}^{\infty} \left[(n+\sigma)(n+\sigma-1) + \frac{3z}{z-1}(n+\sigma) + \frac{z}{z-1}\right] a_n z^{n} = 0. $$

Although we could use this expression to find the indicial equation and recurrence relations,
the working is simpler if we now multiply through by $z - 1$ to give

$$ \sum_{n=0}^{\infty} \left[(z-1)(n+\sigma)(n+\sigma-1) + 3z(n+\sigma) + z\right] a_n z^{n} = 0. \tag{16.22} $$

If we set $z = 0$ then all terms in the sum with the exponent of $z$ greater than zero vanish,
and we obtain the indicial equation

$$ \sigma(\sigma-1) = 0, $$

which has the roots $\sigma = 1$ and $\sigma = 0$. Since the roots differ by an integer (unity), it may not
be possible to find two linearly independent solutions of (16.21) in the form of Frobenius
series. We are guaranteed, however, to find one such solution corresponding to the larger
root, $\sigma = 1$.

Demanding that the coefficients of $z^n$ vanish separately in (16.22), we obtain the
recurrence relation

$$ (n-1+\sigma)(n-2+\sigma)a_{n-1} - (n+\sigma)(n+\sigma-1)a_n + 3(n-1+\sigma)a_{n-1} + a_{n-1} = 0, $$

which can be simplified to give

$$ (n+\sigma-1)a_n = (n+\sigma)a_{n-1}. \tag{16.23} $$

Substituting $\sigma = 1$ into this expression, we obtain

$$ a_n = \left(\frac{n+1}{n}\right)a_{n-1}, $$

and setting $a_0 = 1$ we find $a_n = n + 1$; so one solution to (16.21) is

$$ y_1(z) = z\sum_{n=0}^{\infty} (n+1)z^n = z(1 + 2z + 3z^2 + \cdots) = \frac{z}{(1-z)^2}. \tag{16.24} $$

If we attempt to find a second solution (corresponding to the smaller root of the indicial
equation) by setting $\sigma = 0$ in (16.23), we find

$$ a_n = \left(\frac{n}{n-1}\right)a_{n-1}, $$

but we require $a_0 \neq 0$, so $a_1$ is formally infinite and the method fails. We discuss how to
find a second linearly independent solution in the next section. ◄
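
The contrast between the two indices in this example shows up immediately if (16.23) is iterated numerically. The sketch below is illustrative only; the function name `coefficients_16_23` is our own, and the failure for the smaller index is signalled by raising Python's built-in `ZeroDivisionError`.

```python
from fractions import Fraction

def coefficients_16_23(sigma, n_terms):
    """Iterate (n + sigma - 1) a_n = (n + sigma) a_{n-1} with a_0 = 1 (integer sigma assumed)."""
    a = [Fraction(1)]
    for n in range(1, n_terms):
        denom = n + sigma - 1
        if denom == 0:
            # a_n would have to be infinite: the Frobenius method fails for this index.
            raise ZeroDivisionError(f"a_{n} is undefined for sigma = {sigma}")
        a.append(Fraction(n + sigma, denom) * a[n - 1])
    return a

larger_root = coefficients_16_23(1, 6)  # expect a_n = n + 1

try:
    coefficients_16_23(0, 6)
    smaller_root_fails = False
except ZeroDivisionError:
    smaller_root_fails = True
```

For $\sigma = 1$ the coefficients are $1, 2, 3, \ldots$, as in (16.24); for $\sigma = 0$ the very first step, $n = 1$, already requires division by zero.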


One particular case is also worth mentioning. If the point about which the 
solution is required, i.e. z = 0, is in fact an ordinary point of the ODE rather than 
a regular singular point, then substitution of the Frobenius series (16.12) leads to 
an indicial equation with roots $\sigma = 0$ and $\sigma = 1$. Although these roots differ by
an integer (unity), the recurrence relations corresponding to the two roots yield 
two linearly independent power series solutions (one for each root), as expected 
from section 16.2. 

It is always worth investigating whether a series found as a solution to a 
problem is summable in closed form or expressible in terms of known functions. 
Nevertheless, the reader should avoid gaining the impression that this is always 
so or that, if one worked hard enough, a closed-form solution could always be 
found without using the series method. As mentioned earlier, this is not the case, 
and very often an infinite series solution is the best one can do. 


16.4 Obtaining a second solution 

Whilst attempting to find a solution to an ODE in the form of a Frobenius series 
about a regular singular point, we found in the previous section that when the 
indicial equation has a repeated root, or roots differing by an integer, we can (in 
general) find only one solution of this form. In order to construct the general 
solution to the ODE, however, we require two linearly independent solutions $y_1$
and $y_2$. We now consider several methods for obtaining a second solution in this
case.


16.4.1 The Wronskian method

If $y_1$ and $y_2$ are two linearly independent solutions of the standard equation

$$ y'' + p(z)y' + q(z)y = 0 $$

then the Wronskian of these two solutions is given by $W(z) = y_1 y_2' - y_2 y_1'$.
Dividing the Wronskian by $y_1^2$ we obtain

$$ \frac{W}{y_1^2} = \frac{y_2'}{y_1} - \frac{y_1'}{y_1^2}\,y_2 = \frac{y_2'}{y_1} + \left[\frac{d}{dz}\left(\frac{1}{y_1}\right)\right] y_2 = \frac{d}{dz}\left(\frac{y_2}{y_1}\right), $$

which integrates to give

$$ y_2(z) = y_1(z)\int^{z} \frac{W(u)}{y_1^2(u)}\,du. $$

Now using the alternative expression for $W(z)$ given in (16.4) with $C = 1$ (since
we are not concerned with this normalising factor), we find

$$ y_2(z) = y_1(z)\int^{z} \frac{1}{y_1^2(u)}\exp\left\{-\int^{u} p(v)\,dv\right\} du. \tag{16.25} $$

Hence, given $y_1$, we can in principle compute $y_2$. Note that the lower limits of
integration have been omitted. If constant lower limits are included then they
merely lead to a constant times the first solution.

►Find a second solution to (16.21) using the Wronskian method.

For the ODE (16.21) we have $p(z) = 3/(z-1)$, and from (16.24) we see that one solution
to (16.21) is $y_1 = z/(1-z)^2$. Substituting for $p$ and $y_1$ in (16.25) we have

$$ \begin{aligned} y_2(z) &= \frac{z}{(1-z)^2}\int^{z} \frac{(1-u)^4}{u^2}\exp\left(-\int^{u}\frac{3}{v-1}\,dv\right) du \\ &= \frac{z}{(1-z)^2}\int^{z} \frac{(1-u)^4}{u^2}\exp[-3\ln(u-1)]\,du \\ &= \frac{z}{(1-z)^2}\int^{z} \frac{u-1}{u^2}\,du \\ &= \frac{z}{(1-z)^2}\left(\ln z + \frac{1}{z}\right). \end{aligned} $$

By calculating the Wronskian of $y_1$ and $y_2$ it is easily shown that, as expected, the two
solutions are linearly independent. In fact, as the Wronskian has already been evaluated
as $W(u) = \exp[-3\ln(u-1)]$, i.e. $W(z) = (z-1)^{-3}$, no calculation is needed. ◄
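
As a rough numerical check, and not a substitute for the analysis, one may confirm that the function just found satisfies the original equation (16.20). The sketch below estimates the derivatives by central differences; the step size `h`, the sample points and the function names are arbitrary choices of ours.

```python
import math

def y2(z):
    """Second solution found by the Wronskian method: z/(1-z)^2 * (ln z + 1/z)."""
    return z / (1 - z) ** 2 * (math.log(z) + 1 / z)

def residual(y, z, h=1e-4):
    """Central-difference estimate of z(z-1)y'' + 3z y' + y, the LHS of (16.20)."""
    d1 = (y(z + h) - y(z - h)) / (2 * h)
    d2 = (y(z + h) - 2 * y(z) + y(z - h)) / h ** 2
    return z * (z - 1) * d2 + 3 * z * d1 + y(z)

# The residual should be zero to within the finite-difference error.
worst = max(abs(residual(y2, z)) for z in (0.2, 0.4, 0.5))

# For contrast, a function that is not a solution leaves a residual of order unity.
not_a_solution = abs(residual(math.exp, 0.4))
```

The sample points are kept inside $0 < z < 1$, away from the singular points $z = 0$ and $z = 1$ of the equation, where the difference approximations would deteriorate.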


An alternative (but equivalent) method of finding a second solution is simply to
assume that the second solution has the form $y_2(z) = u(z)y_1(z)$ for some function
$u(z)$ to be determined (this method was discussed more fully in subsection 15.2.3).
From (16.25), we see that the second solution derived from the Wronskian is
indeed of this form. Substituting $y_2(z) = u(z)y_1(z)$ into the ODE leads to a
first-order ODE in which $u'$ is the dependent variable; this may then be solved.


16.4.2 The derivative method

The derivative method of finding a second solution begins with the derivation of
a recurrence relation for the coefficients $a_n$ in a Frobenius series solution, as in the
previous section. However, rather than putting $\sigma = \sigma_1$ in this recurrence relation
to evaluate the first series solution, we now keep $\sigma$ as a variable parameter. This
means that the computed $a_n$ are functions of $\sigma$ and the computed solution is now
a function of $z$ and $\sigma$:

$$ y(z,\sigma) = z^{\sigma}\sum_{n=0}^{\infty} a_n(\sigma) z^n. \tag{16.26} $$

Of course, if we put $\sigma = \sigma_1$ in this, we obtain immediately the first series solution,
but for the moment we leave $\sigma$ as a parameter.

For brevity let us denote the differential operator on the LHS of our standard
ODE (16.7) by $\mathcal{L}$, so that

$$ \mathcal{L} = \frac{d^2}{dz^2} + p(z)\frac{d}{dz} + q(z), $$

and examine the effect of $\mathcal{L}$ on the series $y(z,\sigma)$ in (16.26). It is clear that the
series $\mathcal{L}y(z,\sigma)$ will contain only a term in $z^{\sigma}$, since the recurrence relation defining
the $a_n(\sigma)$ is such that these coefficients vanish for higher powers of $z$. But the
coefficient of $z^{\sigma}$ is simply the LHS of the indicial equation. Therefore, if the roots
of the indicial equation are $\sigma = \sigma_1$ and $\sigma = \sigma_2$ then it follows that

$$ \mathcal{L}y(z,\sigma) = a_0(\sigma-\sigma_1)(\sigma-\sigma_2)z^{\sigma}. \tag{16.27} $$

Therefore, as in the previous section, we see that for $y(z,\sigma)$ to be a solution of
the ODE $\mathcal{L}y = 0$, $\sigma$ must equal $\sigma_1$ or $\sigma_2$. For simplicity we shall set $a_0 = 1$ in the
following discussion.

Let us first consider the case in which the two roots of the indicial equation
are equal, i.e. $\sigma_2 = \sigma_1$. From (16.27) we then have

$$ \mathcal{L}y(z,\sigma) = (\sigma-\sigma_1)^2 z^{\sigma}. $$

Differentiating this equation with respect to $\sigma$ we obtain

$$ \frac{\partial}{\partial\sigma}\left[\mathcal{L}y(z,\sigma)\right] = (\sigma-\sigma_1)^2 z^{\sigma}\ln z + 2(\sigma-\sigma_1)z^{\sigma}, $$

which equals zero if $\sigma = \sigma_1$. But since $\partial/\partial\sigma$ and $\mathcal{L}$ are operators that differentiate
with respect to different variables we can reverse their order, implying that

$$ \mathcal{L}\left[\frac{\partial}{\partial\sigma}y(z,\sigma)\right] = 0 \quad \text{at } \sigma = \sigma_1. $$

Hence the function in square brackets, evaluated at $\sigma = \sigma_1$ and denoted by

$$ \left[\frac{\partial}{\partial\sigma}y(z,\sigma)\right]_{\sigma=\sigma_1}, \tag{16.28} $$

is also a solution of the original ODE $\mathcal{L}y = 0$, and is in fact the second linearly
independent solution for which we were looking.

The case in which the roots of the indicial equation differ by an integer is
slightly more complicated but can be treated in a similar way. In (16.27), since $\mathcal{L}$
differentiates with respect to $z$ we may multiply (16.27) by any function of $\sigma$, say
$\sigma - \sigma_2$, and take this function inside the operator $\mathcal{L}$ on the LHS to obtain

$$ \mathcal{L}\left[(\sigma-\sigma_2)y(z,\sigma)\right] = (\sigma-\sigma_1)(\sigma-\sigma_2)^2 z^{\sigma}. \tag{16.29} $$

Therefore the function

$$ \left[(\sigma-\sigma_2)y(z,\sigma)\right]_{\sigma=\sigma_2} $$

is also a solution of the ODE $\mathcal{L}y = 0$. However, it can be proved† that this
function is a simple multiple of the first solution $y(z,\sigma_1)$, showing that it is not
linearly independent and that we must find another solution. To do this we
differentiate (16.29) with respect to $\sigma$ and find

$$ \frac{\partial}{\partial\sigma}\left\{\mathcal{L}\left[(\sigma-\sigma_2)y(z,\sigma)\right]\right\} = (\sigma-\sigma_2)^2 z^{\sigma} + 2(\sigma-\sigma_1)(\sigma-\sigma_2)z^{\sigma} + (\sigma-\sigma_1)(\sigma-\sigma_2)^2 z^{\sigma}\ln z, $$

which is equal to zero if $\sigma = \sigma_2$. As previously, since $\partial/\partial\sigma$ and $\mathcal{L}$ are operators
that differentiate with respect to different variables, we can reverse their order to
obtain

$$ \mathcal{L}\left\{\frac{\partial}{\partial\sigma}\left[(\sigma-\sigma_2)y(z,\sigma)\right]\right\} = 0 \quad \text{at } \sigma = \sigma_2, $$

and so the function

$$ \left\{\frac{\partial}{\partial\sigma}\left[(\sigma-\sigma_2)y(z,\sigma)\right]\right\}_{\sigma=\sigma_2} \tag{16.30} $$

is also a solution of the original ODE $\mathcal{L}y = 0$, and is in fact the second linearly
independent solution.

† For a fuller discussion see, for example, Riley, Mathematical Methods for the Physical Sciences
(Cambridge University Press, 1974), pp. 158-9.

►Find a second solution to (16.21) using the derivative method.

From (16.23) the recurrence relation (with $\sigma$ as a parameter) is given by

$$ (n+\sigma-1)a_n = (n+\sigma)a_{n-1}. $$

Setting $a_0 = 1$ we find that the coefficients have the particularly simple form $a_n(\sigma) =
(\sigma+n)/\sigma$. We therefore consider the function

$$ y(z,\sigma) = z^{\sigma}\sum_{n=0}^{\infty} a_n(\sigma)z^n = z^{\sigma}\sum_{n=0}^{\infty} \frac{(\sigma+n)}{\sigma}\,z^n. $$

The smaller root of the indicial equation for (16.21) is $\sigma_2 = 0$, and so from (16.30) a
second, linearly independent, solution to the ODE is

$$ \left\{\frac{\partial}{\partial\sigma}\left[\sigma y(z,\sigma)\right]\right\}_{\sigma=0} = \left\{\frac{\partial}{\partial\sigma}\left[z^{\sigma}\sum_{n=0}^{\infty} (\sigma+n)z^n\right]\right\}_{\sigma=0}. $$

The derivative with respect to $\sigma$ is given by

$$ \frac{\partial}{\partial\sigma}\left[z^{\sigma}\sum_{n=0}^{\infty} (\sigma+n)z^n\right] = z^{\sigma}\ln z\sum_{n=0}^{\infty} (\sigma+n)z^n + z^{\sigma}\sum_{n=0}^{\infty} z^n, $$

which on setting $\sigma = 0$ gives the second solution

$$ \begin{aligned} y_2(z) &= \ln z\sum_{n=0}^{\infty} n z^n + \sum_{n=0}^{\infty} z^n \\ &= \frac{z}{(1-z)^2}\ln z + \frac{1}{1-z} \\ &= \frac{z}{(1-z)^2}\left(\ln z + \frac{1}{z} - 1\right). \end{aligned} $$

This second solution is the same as that obtained by the Wronskian method in the previous
subsection except for the addition of some of the first solution. ◄


16.4.3 Series form of the second solution 


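Using any of the methods discussed above, we can find the general form of the
second solution to the ODE. This form is most easily found, however, using the
derivative method. Let us first consider the case where the two solutions of the
indicial equation are equal. In this case a second solution is given by (16.28),
which may be written as

$$ \begin{aligned} y_2(z) &= \left[\frac{\partial y(z,\sigma)}{\partial\sigma}\right]_{\sigma=\sigma_1} \\ &= (\ln z)\,z^{\sigma_1}\sum_{n=0}^{\infty} a_n(\sigma_1)z^n + z^{\sigma_1}\sum_{n=1}^{\infty} \left[\frac{da_n(\sigma)}{d\sigma}\right]_{\sigma=\sigma_1} z^n \\ &= y_1(z)\ln z + z^{\sigma_1}\sum_{n=1}^{\infty} b_n z^n, \end{aligned} $$

where $b_n = [da_n(\sigma)/d\sigma]_{\sigma=\sigma_1}$.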

In the case where the roots of the indicial equation differ by an integer (not
equal to zero), then from (16.30) a second solution is given by

$$ \begin{aligned} y_2(z) &= \left\{\frac{\partial}{\partial\sigma}\left[(\sigma-\sigma_2)y(z,\sigma)\right]\right\}_{\sigma=\sigma_2} \\ &= \ln z\left[(\sigma-\sigma_2)z^{\sigma}\sum_{n=0}^{\infty} a_n(\sigma)z^n\right]_{\sigma=\sigma_2} + z^{\sigma_2}\sum_{n=0}^{\infty} \left[\frac{d}{d\sigma}(\sigma-\sigma_2)a_n(\sigma)\right]_{\sigma=\sigma_2} z^n. \end{aligned} $$

But, as we mentioned in the previous section, $[(\sigma-\sigma_2)y(z,\sigma)]$ at $\sigma = \sigma_2$ is just a
multiple of the first solution $y(z,\sigma_1)$. Therefore the second solution is of the form

$$ y_2(z) = c\,y_1(z)\ln z + z^{\sigma_2}\sum_{n=0}^{\infty} b_n z^n, $$

where $c$ is a constant. In some cases, however, $c$ might be zero and so the second
solution would not contain the term in $\ln z$ and could be written simply as a
Frobenius series. Clearly this corresponds to the case in which the substitution of
a Frobenius series into the original ODE yields two solutions automatically.


16.5 Polynomial solutions 

We have seen that the evaluation of successive terms of a series solution to a
differential equation is carried out by means of a recurrence relation. The form
of the relation for $a_n$ depends upon $n$, the previous values of $a_r$ ($r < n$) and the
parameters of the equation. It may happen, as a result of this, that for some
value of $n = N + 1$ the computed value $a_{N+1}$ is zero and that all higher $a_r$ also
vanish. If this is so, and the corresponding solution of the indicial equation $\sigma$
is a positive integer or zero, then we are left with a finite polynomial of degree
$N' = N + \sigma$ as a solution of the ODE:

$$ y(z) = \sum_{n=0}^{N} a_n z^{n+\sigma}. \tag{16.31} $$

In many applications in theoretical physics (particularly in quantum mechanics) 
the termination of a potentially infinite series after a finite number of terms 
is of crucial importance in establishing physically acceptable descriptions and 
properties of systems. The condition under which such a termination occurs is 
therefore of considerable importance. 


►Find power series solutions about $z = 0$ of

$$ y'' - 2zy' + \lambda y = 0. \tag{16.32} $$

For what values of $\lambda$ does the equation possess a polynomial solution? Find such a solution
for $\lambda = 4$.

Clearly $z = 0$ is an ordinary point of (16.32) and so we look for solutions of the form
$y = \sum_{n=0}^{\infty} a_n z^n$. Substituting this into the ODE and multiplying through by $z^2$ we find

$$ \sum_{n=0}^{\infty} \left[n(n-1) - 2z^2 n + \lambda z^2\right] a_n z^n = 0. $$

By demanding that the coefficients of each power of $z$ vanish separately we derive the
recurrence relation

$$ n(n-1)a_n - 2(n-2)a_{n-2} + \lambda a_{n-2} = 0, $$

which may be rearranged to give

$$ a_n = \frac{2(n-2)-\lambda}{n(n-1)}\,a_{n-2} \quad \text{for } n \ge 2. \tag{16.33} $$

The odd and even coefficients are therefore independent of one another, and two solutions
to (16.32) may be derived. We either set $a_1 = 0$ and $a_0 = 1$ to obtain

$$ y_1(z) = 1 - \lambda\frac{z^2}{2!} - \lambda(4-\lambda)\frac{z^4}{4!} - \lambda(4-\lambda)(8-\lambda)\frac{z^6}{6!} - \cdots \tag{16.34} $$

or set $a_0 = 0$ and $a_1 = 1$ to obtain

$$ y_2(z) = z + (2-\lambda)\frac{z^3}{3!} + (2-\lambda)(6-\lambda)\frac{z^5}{5!} + (2-\lambda)(6-\lambda)(10-\lambda)\frac{z^7}{7!} + \cdots. $$

Now from the recurrence relation (16.33) (or in this case from the expressions for $y_1$
and $y_2$ themselves) we see that for the ODE to possess a polynomial solution we require
$\lambda = 2(n-2)$ for $n \ge 2$ or more simply $\lambda = 2n$ for $n \ge 0$, i.e. $\lambda$ must be a non-negative even
integer. If $\lambda = 4$ then from (16.34) the ODE has the polynomial solution

$$ y_1(z) = 1 - \frac{4z^2}{2!} = 1 - 2z^2. $$
◄

A simpler method of obtaining finite polynomial solutions is to assume a
solution of the form (16.31), where $a_N \neq 0$. Instead of starting with the lowest
power of $z$, as we have done up to now, this time we start by considering the
coefficient of the highest power $z^N$; such a power now exists because of our
assumed form of solution.


►By assuming a polynomial solution find the values of $\lambda$ in (16.32) for which such a solution
exists.

We assume a polynomial solution to (16.32) of the form $y = \sum_{n=0}^{N} a_n z^n$. Substituting this
form into (16.32) we find

$$ \sum_{n=0}^{N} \left[n(n-1)a_n z^{n-2} - 2zn\,a_n z^{n-1} + \lambda a_n z^n\right] = 0. $$

Now, instead of starting with the lowest power of $z$, we start with the highest. Thus,
demanding that the coefficient of $z^N$ vanishes, we require $-2N + \lambda = 0$, i.e. $\lambda = 2N$, as we
found in the previous example. By demanding that the coefficient of a general power of $z$
is zero, the same recurrence relation as above may be derived and the solutions found. ◄
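
The termination condition $\lambda = 2N$ can also be seen by simply iterating (16.33) for particular values of $\lambda$. The following sketch does this with exact fractions, assuming an integer $\lambda$; the function name `series_16_33` is a hypothetical label of ours.

```python
from fractions import Fraction

def series_16_33(lam, a0, a1, n_terms):
    """Iterate a_n = [2(n-2) - lam] a_{n-2} / [n(n-1)] from (16.33); lam is assumed integer."""
    a = [Fraction(a0), Fraction(a1)]
    for n in range(2, n_terms):
        a.append(Fraction(2 * (n - 2) - lam, n * (n - 1)) * a[n - 2])
    return a

even_lam4 = series_16_33(4, 1, 0, 10)  # even series with lambda = 4
odd_lam4 = series_16_33(4, 0, 1, 10)   # odd series with lambda = 4
odd_lam6 = series_16_33(6, 0, 1, 10)   # odd series with lambda = 6
```

For $\lambda = 4$ the even series stops at $1 - 2z^2$ whereas the odd series continues indefinitely; for $\lambda = 6$ it is the odd series that terminates, giving $z - \tfrac{2}{3}z^3$, consistent with $\lambda = 2N$ for $N = 3$.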


16.6 Legendre’s equation 

In previous sections we have discussed methods for obtaining series solutions of 
second-order linear ODEs. In this section and the next we apply some of these 
methods to finding the series solutions of the two most important equations listed 
in table 16.1, namely Legendre’s equation and Bessel’s equation. As mentioned 
earlier, the remaining equations in table 16.1 may also be solved by the methods 
discussed in this chapter. These equations, and the properties of their solutions, 
are discussed briefly in the next chapter.

We now consider Legendre’s equation

$$ (1-z^2)y'' - 2zy' + \ell(\ell+1)y = 0, \tag{16.35} $$

which occurs in numerous physical applications and particularly in problems with
axial symmetry when they are expressed in spherical polar coordinates. In normal
usage the variable $z$ in Legendre’s equation is the cosine of the polar angle in
spherical polars, and thus $-1 \le z \le 1$. The parameter $\ell$ is a given real number,
and any solution of (16.35) is called a Legendre function.

In subsection 16.1.1, we showed that $z = 0$ is an ordinary point of (16.35), and
so we expect to find two linearly independent solutions of the form $y = \sum_{n=0}^{\infty} a_n z^n$.
Substituting, we find

$$ \sum_{n=0}^{\infty} \left[n(n-1)a_n z^{n-2} - n(n-1)a_n z^n - 2n\,a_n z^n + \ell(\ell+1)a_n z^n\right] = 0, $$

which on collecting terms gives

$$ \sum_{n=0}^{\infty} \left\{(n+2)(n+1)a_{n+2} - [n(n+1) - \ell(\ell+1)]a_n\right\} z^n = 0. $$

The recurrence relation is therefore

$$ a_{n+2} = \frac{[n(n+1) - \ell(\ell+1)]}{(n+1)(n+2)}\,a_n, \tag{16.36} $$

for $n = 0, 1, 2, \ldots$. If we choose $a_0 = 1$ and $a_1 = 0$ then we obtain the solution

$$ y_1(z) = 1 - \ell(\ell+1)\frac{z^2}{2!} + (\ell-2)\ell(\ell+1)(\ell+3)\frac{z^4}{4!} - \cdots, \tag{16.37} $$

whereas choosing $a_0 = 0$ and $a_1 = 1$ we find a second solution

$$ y_2(z) = z - (\ell-1)(\ell+2)\frac{z^3}{3!} + (\ell-3)(\ell-1)(\ell+2)(\ell+4)\frac{z^5}{5!} - \cdots. \tag{16.38} $$

By applying the ratio test to these series (see subsection 4.3.2), we find that both
series converge for $|z| < 1$, and so their radius of convergence is unity, which
(as expected) is the distance to the nearest singular point of the equation. Since
(16.37) contains only even powers of $z$ and (16.38) contains only odd powers,
these two solutions cannot be proportional to one another, and are therefore
linearly independent. Hence $y = c_1 y_1 + c_2 y_2$ is the general solution to (16.35) for
$|z| < 1$.


16.6.1 General solution for integer $\ell$

Now, if $\ell$ is an integer in Legendre’s equation (16.35), i.e. $\ell = 0, 1, 2, \ldots$, then the
recurrence relation (16.36) gives

$$ a_{\ell+2} = \frac{[\ell(\ell+1) - \ell(\ell+1)]}{(\ell+1)(\ell+2)}\,a_{\ell} = 0, $$

i.e. the series terminates and we obtain a polynomial solution of order $\ell$. These
solutions (suitably normalised) are called Legendre polynomials of order $\ell$; they
are written $P_\ell(z)$ and are valid for all finite $z$. It is conventional to normalise
$P_\ell(z)$ in such a way that $P_\ell(1) = 1$, and as a consequence $P_\ell(-1) = (-1)^{\ell}$. The
first few Legendre polynomials are easily constructed and are given by

$$ \begin{aligned} P_0(z) &= 1, & P_1(z) &= z, \\ P_2(z) &= \tfrac{1}{2}(3z^2 - 1), & P_3(z) &= \tfrac{1}{2}(5z^3 - 3z), \\ P_4(z) &= \tfrac{1}{8}(35z^4 - 30z^2 + 3), & P_5(z) &= \tfrac{1}{8}(63z^5 - 70z^3 + 15z). \end{aligned} $$

The first four Legendre polynomials are plotted in figure 16.1.

Figure 16.1 The first four Legendre polynomials.
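
The construction just described, namely terminating the series via (16.36) and then rescaling so that $P_\ell(1) = 1$, is easily mechanised. The sketch below is one possible implementation with exact fractions; the function name `legendre_from_recurrence` and the representation of a polynomial as a list of coefficients in ascending powers of $z$ are our own assumptions.

```python
from fractions import Fraction

def legendre_from_recurrence(ell):
    """Coefficients [c_0, ..., c_ell] of P_ell(z), built from the recurrence (16.36)."""
    a = [Fraction(0)] * (ell + 1)
    a[ell % 2] = Fraction(1)  # a_0 = 1 for even ell, a_1 = 1 for odd ell
    for n in range(ell % 2, ell - 1, 2):
        a[n + 2] = Fraction(n * (n + 1) - ell * (ell + 1), (n + 1) * (n + 2)) * a[n]
    value_at_one = sum(a)     # the terminating series evaluated at z = 1
    # Rescale so that the polynomial takes the value 1 at z = 1.
    return [c / value_at_one for c in a]

p2 = legendre_from_recurrence(2)
p4 = legendre_from_recurrence(4)
p5 = legendre_from_recurrence(5)
```

The resulting coefficient lists reproduce the entries for $P_2$, $P_4$ and $P_5$ given above.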

According to whether $\ell$ is an even or odd integer respectively, either $y_1(z)$
in (16.37) or $y_2(z)$ in (16.38) terminates to give a multiple of the corresponding
Legendre polynomial $P_\ell(z)$. In either case, however, the other series does not
terminate and therefore converges only for $|z| < 1$. According to whether $\ell$ is
even or odd we define Legendre functions of the second kind as $Q_\ell(z) = \alpha_\ell y_2(z)$
or $Q_\ell(z) = \beta_\ell y_1(z)$ respectively, where the constants $\alpha_\ell$ and $\beta_\ell$ are conventionally
taken to have the values

$$ \alpha_\ell = \frac{(-1)^{\ell/2}\,2^{\ell}\,[(\ell/2)!]^2}{\ell!} \quad \text{for } \ell \text{ even}, \tag{16.39} $$

$$ \beta_\ell = \frac{(-1)^{(\ell+1)/2}\,2^{\ell-1}\,\{[(\ell-1)/2]!\}^2}{\ell!} \quad \text{for } \ell \text{ odd}. \tag{16.40} $$

These normalisation factors are chosen so that the $Q_\ell(z)$ obey the same recurrence
relations as the $P_\ell(z)$ (see subsection 16.6.2).

The general solution of Legendre’s equation for integer $\ell$ is therefore

$$ y(z) = c_1 P_\ell(z) + c_2 Q_\ell(z), \tag{16.41} $$

where $P_\ell(z)$ is a polynomial of order $\ell$, and so converges for all $z$, and $Q_\ell(z)$ is
an infinite series that converges only for $|z| < 1$.†

By using the Wronskian method, section 16.4, one may obtain closed forms for
the $Q_\ell(z)$.

►Use the Wronskian method to find a closed-form expression for $Q_0(z)$.

From (16.25) a second solution to Legendre’s equation (16.35), with $\ell = 0$, is

$$ \begin{aligned} y_2(z) &= P_0(z)\int^{z} \frac{1}{[P_0(u)]^2}\exp\left(\int^{u}\frac{2v}{1-v^2}\,dv\right) du \\ &= \int^{z} \exp\left[-\ln(1-u^2)\right] du \\ &= \int^{z} \frac{du}{(1-u^2)} = \tfrac{1}{2}\ln\left(\frac{1+z}{1-z}\right), \end{aligned} \tag{16.42} $$

where in the second line we have used the fact that $P_0(z) = 1$.

All that remains is to adjust the normalisation of this solution so that it agrees with
(16.39). Expanding the logarithm in (16.42) as a Maclaurin series we obtain

$$ y_2(z) = z + \frac{z^3}{3} + \frac{z^5}{5} + \cdots. $$

Comparing this with the expression for $Q_0(z)$, using (16.38) with $\ell = 0$ and the normalisation
(16.39), we find that $y_2(z)$ is already correctly normalised, and so

$$ Q_0(z) = \tfrac{1}{2}\ln\left(\frac{1+z}{1-z}\right). $$

Of course, we might have recognised the series (16.38) for $\ell = 0$, but to do so for larger $\ell$
would prove progressively more difficult. ◄

Using the above method for $\ell = 1$, we find

$$ Q_1(z) = \tfrac{1}{2}z\ln\left(\frac{1+z}{1-z}\right) - 1. $$

Closed forms for higher-order $Q_\ell(z)$ may now be found using the recurrence
relation (16.55) derived in the next subsection.


† It is possible, in fact, to find a second solution in terms of an infinite series of negative powers of
$z$ that is finite for $|z| > 1$.


16.6.2 Properties of Legendre polynomials

As stated earlier, when encountered in physical problems the variable $z$ in Legendre’s
equation is usually the cosine of the polar angle $\theta$ in spherical polar
coordinates, and we then require the solution $y(z)$ to be regular at $z = \pm 1$, which
corresponds to $\theta = 0$ or $\theta = \pi$. For this to occur we require the equation to have
a polynomial solution, and so $\ell$ must be an integer. Furthermore, we also require
the coefficient $c_2$ of the function $Q_\ell(z)$ in (16.41) to be zero, since $Q_\ell(z)$ is singular
at $z = \pm 1$, with the result that the general solution is simply some multiple of the
relevant Legendre polynomial $P_\ell(z)$. In this section we will study the properties
of the Legendre polynomials $P_\ell(z)$ in some detail.


Rodrigues’ formula

As an aid to establishing further properties of the Legendre polynomials we now
develop Rodrigues’ representation of these functions. Rodrigues’ formula for the
$P_\ell(z)$ is

$$ P_\ell(z) = \frac{1}{2^{\ell}\ell!}\frac{d^{\ell}}{dz^{\ell}}(z^2-1)^{\ell}. \tag{16.43} $$

To prove that this is a representation we let $u = (z^2-1)^{\ell}$, so that $u' = 2\ell z(z^2-1)^{\ell-1}$
and

$$ (z^2-1)u' - 2\ell z u = 0. $$

If we differentiate this expression $\ell + 1$ times using Leibnitz’ theorem, we obtain

$$ \left[(z^2-1)u^{(\ell+2)} + 2z(\ell+1)u^{(\ell+1)} + \ell(\ell+1)u^{(\ell)}\right] - 2\ell\left[z u^{(\ell+1)} + (\ell+1)u^{(\ell)}\right] = 0, $$

which reduces to

$$ (z^2-1)u^{(\ell+2)} + 2z u^{(\ell+1)} - \ell(\ell+1)u^{(\ell)} = 0. $$

Changing the sign all through and comparing the resulting expression with
Legendre’s equation (16.35), we see that $u^{(\ell)}$ satisfies the same equation as $P_\ell(z)$,
and so

$$ u^{(\ell)}(z) = c_\ell P_\ell(z), \tag{16.44} $$

for some constant C{ that depends on A To establish the value of cy we note 
that the only term in the expression for the Ah derivative of (z 2 — if that 
does not contain a factor z 2 — 1 , and therefore does not vanish at z = 1 , is 
(2z)V!(z 2 — 1)°. Putting z = 1 in (16.44) and recalling that P^( 1) = 1 , therefore 
shows that cy = 2ft\, thus completing the proof of Rodrigues’ formula (16.43). 
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► Use Rodrigues' formula to show that 

I ‘ = f_ 1 P ‘( z)p ‘( z)dz = 2, + r 

(16.45) 


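The result is trivially obvious for $\ell = 0$ and so we assume $\ell \ge 1$. Then, by Rodrigues'
formula,

\[ I_\ell = \frac{1}{2^{2\ell} (\ell!)^2} \int_{-1}^{1} \left[ \frac{d^\ell (z^2 - 1)^\ell}{dz^\ell} \right] \left[ \frac{d^\ell (z^2 - 1)^\ell}{dz^\ell} \right] dz. \]

Repeated integration by parts, with all boundary terms vanishing, reduces this to

\[ I_\ell = \frac{(-1)^\ell}{2^{2\ell} (\ell!)^2} \int_{-1}^{1} (z^2 - 1)^\ell \frac{d^{2\ell}}{dz^{2\ell}} (z^2 - 1)^\ell \, dz = \frac{(2\ell)!}{2^{2\ell} (\ell!)^2} \int_{-1}^{1} (1 - z^2)^\ell \, dz. \]

If we write

\[ K_\ell = \int_{-1}^{1} (1 - z^2)^\ell \, dz, \]

then integration by parts (taking a factor 1 as the second part) gives

\[ K_\ell = \int_{-1}^{1} 2\ell z^2 (1 - z^2)^{\ell - 1} \, dz. \]

Writing $2\ell z^2$ as $2\ell - 2\ell(1 - z^2)$ we obtain

\[ K_\ell = 2\ell \int_{-1}^{1} (1 - z^2)^{\ell - 1} \, dz - 2\ell \int_{-1}^{1} (1 - z^2)^\ell \, dz = 2\ell K_{\ell - 1} - 2\ell K_\ell, \]

and hence the recurrence relation $(2\ell + 1) K_\ell = 2\ell K_{\ell - 1}$. We therefore find

\[ K_\ell = \frac{2\ell}{2\ell + 1} \, \frac{2\ell - 2}{2\ell - 1} \cdots \frac{2}{3} \, K_0 = 2^\ell \ell! \, \frac{2^\ell \ell!}{(2\ell + 1)!} \, 2 = \frac{2^{2\ell + 1} (\ell!)^2}{(2\ell + 1)!}, \]

which, when substituted into the expression for $I_\ell$, establishes the required result. ◄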
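
Rodrigues' formula (16.43) and the normalisation (16.45) can be checked with exact rational arithmetic. The following is only a minimal sketch: the helper names (`poly_mul`, `poly_diff`, `legendre_rodrigues`, `integrate_on_unit_interval`) are illustrative choices, not notation from the text, and polynomials are stored as coefficient lists with the lowest power first.

```python
from fractions import Fraction
from math import factorial


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest power first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_diff(a):
    """Differentiate a polynomial once."""
    return [k * a[k] for k in range(1, len(a))] or [Fraction(0)]


def legendre_rodrigues(l):
    """Coefficients of P_l(z) obtained from Rodrigues' formula (16.43)."""
    u = [Fraction(1)]
    for _ in range(l):                      # build u = (z^2 - 1)^l
        u = poly_mul(u, [Fraction(-1), Fraction(0), Fraction(1)])
    for _ in range(l):                      # differentiate l times
        u = poly_diff(u)
    return [c / (2**l * factorial(l)) for c in u]


def integrate_on_unit_interval(a):
    """Exact integral of a polynomial from z = -1 to z = 1."""
    # Odd powers integrate to zero; z^k with k even contributes 2/(k + 1).
    return sum(Fraction(2, k + 1) * c for k, c in enumerate(a) if k % 2 == 0)


for l in range(6):
    P = legendre_rodrigues(l)
    assert sum(P) == 1                                                   # P_l(1) = 1
    assert integrate_on_unit_interval(poly_mul(P, P)) == Fraction(2, 2 * l + 1)  # (16.45)
```

Working with `Fraction` avoids any rounding, so the comparison with $2/(2\ell + 1)$ is exact rather than approximate.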


Mutual orthogonality of Legendre polynomials 
Another useful property of the $P_\ell(z)$ is their mutual orthogonality, i.e. that

\[ \int_{-1}^{1} P_\ell(z) P_k(z)\, dz = 0 \quad \text{if } \ell \ne k. \tag{16.46} \]


More general considerations concerning the mutual orthogonality of solutions to 
various classes of second-order linear ODEs are discussed in the next chapter, 
but for the moment we concentrate on the specific proof of (16.46). 

Since the $P_\ell(z)$ satisfy Legendre’s equation we may write

\[ \left[ (1 - z^2) P_\ell' \right]' + \ell(\ell + 1) P_\ell = 0, \]

where $P_\ell' = dP_\ell/dz$. Multiplying through by $P_k$ and integrating from $z = -1$ to
$z = 1$, we obtain

\[ \int_{-1}^{1} P_k \left[ (1 - z^2) P_\ell' \right]' dz + \int_{-1}^{1} P_k \, \ell(\ell + 1) P_\ell \, dz = 0. \]

Integrating the first term by parts and noting that the boundary contribution
vanishes at both limits because of the factor $1 - z^2$, we find

\[ -\int_{-1}^{1} P_k' (1 - z^2) P_\ell' \, dz + \int_{-1}^{1} P_k \, \ell(\ell + 1) P_\ell \, dz = 0. \]

Now, if we reverse the roles of $\ell$ and $k$ and subtract one expression from the
other, we conclude that

\[ \left[ k(k + 1) - \ell(\ell + 1) \right] \int_{-1}^{1} P_k P_\ell \, dz = 0, \]

and therefore, since $k \ne \ell$, we must have the result (16.46). As a particular case
we note that if we put $k = 0$ we obtain

\[ \int_{-1}^{1} P_\ell(z)\, dz = 0 \quad \text{for } \ell \ne 0. \]

As will be discussed more fully in the next chapter, the mutual orthogonality of
the $P_\ell(z)$ means that any reasonable function $f(z)$ (i.e. one obeying the Dirichlet
conditions discussed at the start of chapter 12) can be expressed in the interval
$|z| < 1$ as an infinite sum of Legendre polynomials,

\[ f(z) = \sum_{\ell = 0}^{\infty} a_\ell P_\ell(z), \tag{16.47} \]

where the coefficients $a_\ell$ are given by

\[ a_\ell = \frac{2\ell + 1}{2} \int_{-1}^{1} f(z) P_\ell(z)\, dz. \tag{16.48} \]
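
As a concrete instance of (16.47) and (16.48), the sketch below expands the polynomial $f(z) = z^3$ exactly. It assumes polynomials are stored as coefficient lists (lowest power first) and rebuilds $P_\ell$ from Rodrigues' formula (16.43); the function names are illustrative only.

```python
from fractions import Fraction
from math import factorial


def legendre(l):
    """Coefficient list (lowest power first) of P_l(z), from Rodrigues' formula (16.43)."""
    u = [Fraction(1)]
    for _ in range(l):                       # multiply by (z^2 - 1): new[k] = u[k-2] - u[k]
        u = [(u[k - 2] if k >= 2 else 0) - (u[k] if k < len(u) else 0)
             for k in range(len(u) + 2)]
    for _ in range(l):                       # differentiate l times
        u = [k * u[k] for k in range(1, len(u))]
    return [Fraction(c) / (2**l * factorial(l)) for c in u]


def multiply(a, b):
    """Product of two polynomials."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def integral(a):
    """Exact integral over -1 <= z <= 1 of the polynomial with coefficients a."""
    return sum(Fraction(2, k + 1) * c for k, c in enumerate(a) if k % 2 == 0)


def expansion_coefficient(f, l):
    """a_l from (16.48) for a polynomial f given as a coefficient list."""
    return Fraction(2 * l + 1, 2) * integral(multiply(f, legendre(l)))


f = [0, 0, 0, 1]                             # f(z) = z^3
coeffs = [expansion_coefficient(f, l) for l in range(5)]
print(coeffs)                                # only a_1 = 3/5 and a_3 = 2/5 are non-zero
```

The result corresponds to $z^3 = \tfrac{3}{5} P_1(z) + \tfrac{2}{5} P_3(z)$, which can be confirmed by hand from $P_3(z) = \tfrac{1}{2}(5z^3 - 3z)$.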


>-Prove the expression (16.48) for the coefficients in the Legendre polynomial expansion 
of a function $f(z)$.

If we multiply (16.47) by $P_m(z)$ and integrate from $z = -1$ to $z = 1$ then we obtain

\[ \int_{-1}^{1} P_m(z) f(z)\, dz = \sum_{\ell = 0}^{\infty} a_\ell \int_{-1}^{1} P_m(z) P_\ell(z)\, dz = a_m \int_{-1}^{1} P_m(z) P_m(z)\, dz = \frac{2 a_m}{2m + 1}, \]

where we have used the orthogonality property (16.46) and the normalisation property
(16.45). ◄

Generating function for Legendre polynomials

A useful device for manipulating and studying sequences of functions or quantities
labelled by an integer variable (here, the Legendre polynomials $P_\ell(z)$ labelled by
$\ell$) is a generating function. The generating function has perhaps its greatest utility
in the area of probability theory (see chapter 26). However, it is also a great
convenience in our present study.

The generating function for, say, a series of functions $f_n(z)$ for $n = 0, 1, 2, \ldots$ is
a function $G(z, h)$, containing as well as $z$ a dummy variable $h$, such that

\[ G(z, h) = \sum_{n = 0}^{\infty} f_n(z) h^n, \]

i.e. $f_n(z)$ is the coefficient of $h^n$ in the expansion of $G$ in powers of $h$. The utility
of the device lies in the fact that sometimes it is possible to find a closed form
for $G(z, h)$.

For our study of Legendre polynomials let us consider the functions $P_n(z)$
defined by the equation

\[ G(z, h) = (1 - 2zh + h^2)^{-1/2} = \sum_{n = 0}^{\infty} P_n(z) h^n. \tag{16.49} \]

As we show below, the functions so defined are identical to the Legendre
polynomials and the function $(1 - 2zh + h^2)^{-1/2}$ is in fact the generating function for
them. In the process we will also deduce several useful relationships between the
various polynomials and their derivatives.

In the following $dP_n(z)/dz$ will be denoted by $P_n'$. Firstly, we differentiate the
defining equation (16.49) with respect to $z$ to get

\[ h (1 - 2zh + h^2)^{-3/2} = \sum P_n' h^n. \tag{16.50} \]

Also, we differentiate (16.49) with respect to $h$ to yield

\[ (z - h)(1 - 2zh + h^2)^{-3/2} = \sum n P_n h^{n-1}; \tag{16.51} \]

equation (16.50) can then be written using (16.49) as

\[ h \sum P_n h^n = (1 - 2zh + h^2) \sum P_n' h^n, \]

and thus equating coefficients of $h^{n+1}$ we obtain the recurrence relation

\[ P_n = P_{n+1}' - 2z P_n' + P_{n-1}'. \tag{16.52} \]

Equations (16.50) and (16.51) can be combined as

\[ (z - h) \sum P_n' h^n = h \sum n P_n h^{n-1}, \]

from which the coefficient of $h^n$ yields a second recurrence relation

\[ z P_n' - P_{n-1}' = n P_n; \tag{16.53} \]

eliminating $P_{n-1}'$ between (16.52) and (16.53) then gives the further result

\[ (n + 1) P_n = P_{n+1}' - z P_n'. \tag{16.54} \]

If we now take the result (16.54) with $n$ replaced by $n - 1$ and add $z$ times
(16.53) to it then we obtain

\[ (1 - z^2) P_n' = n (P_{n-1} - z P_n); \]

finally, differentiating both sides with respect to $z$ and using (16.53) again, we
find

\[ (1 - z^2) P_n'' - 2z P_n' = n \left[ (P_{n-1}' - z P_n') - P_n \right] = n(-n P_n - P_n) = -n(n + 1) P_n, \]

and so the $P_n$ defined by (16.49) do indeed satisfy Legendre’s equation.

It remains only to verify the normalisation. This is easily done at $z = 1$, when
$G$ becomes

\[ G(1, h) = \left[ (1 - h)^2 \right]^{-1/2} = 1 + h + h^2 + \cdots, \]

and we can see that all the $P_n$ so defined have $P_n(1) = 1$ as required. Many other
useful recurrence relations can be derived from those found above.

► Prove the recurrence relation

\[ (n + 1) P_{n+1} - (2n + 1) z P_n + n P_{n-1} = 0. \tag{16.55} \]

Substituting from (16.49) into (16.51) we find

\[ (z - h) \sum P_n h^n = (1 - 2zh + h^2) \sum n P_n h^{n-1}. \]

Equating coefficients of $h^n$ we obtain

\[ z P_n - P_{n-1} = (n + 1) P_{n+1} - 2zn P_n + (n - 1) P_{n-1}, \]

which on rearrangement gives the stated result. ◄
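
The recurrence (16.55) gives a convenient way to evaluate the $P_n(z)$ numerically, and the values can then be compared with the generating function (16.49). The following is a small sketch under stated assumptions: the function name `legendre_values` and the sample point $z = 0.3$, $h = 0.4$ are arbitrary illustrative choices.

```python
def legendre_values(n_max, z):
    """Return [P_0(z), ..., P_n_max(z)] using the recurrence (16.55)."""
    values = [1.0, z]                        # P_0 = 1 and P_1 = z start the recurrence
    for n in range(1, n_max):
        # (n + 1) P_{n+1} = (2n + 1) z P_n - n P_{n-1}
        values.append(((2 * n + 1) * z * values[n] - n * values[n - 1]) / (n + 1))
    return values[: n_max + 1]


z, h = 0.3, 0.4
P = legendre_values(60, z)

# Partial sum of the series in (16.49) against the closed form of G(z, h).
series = sum(p * h**n for n, p in enumerate(P))
closed_form = (1 - 2 * z * h + h * h) ** -0.5
assert abs(series - closed_form) < 1e-12
```

Because $|P_n(z)| \le 1$ on $|z| \le 1$, the partial sum converges geometrically for $|h| < 1$, so sixty terms are far more than is needed at $h = 0.4$.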

Another use of the generating function (16.49) is in representing the inverse
distance between two points in three-dimensional space in terms of Legendre
polynomials. If two points $\mathbf{r}$ and $\mathbf{r}'$ are at distances $r$ and $r'$ respectively from the
origin, with $r' < r$, then

\[ \frac{1}{|\mathbf{r} - \mathbf{r}'|} = \frac{1}{(r^2 + r'^2 - 2rr' \cos\theta)^{1/2}} = \frac{1}{r \left[ 1 - 2(r'/r)\cos\theta + (r'/r)^2 \right]^{1/2}} = \frac{1}{r} \sum_{\ell = 0}^{\infty} \left( \frac{r'}{r} \right)^{\ell} P_\ell(\cos\theta), \tag{16.56} \]

where $\theta$ is the angle between the two position vectors $\mathbf{r}$ and $\mathbf{r}'$. If $r' > r$, however,
then $r$ and $r'$ must be exchanged in (16.56) or the series would not converge.

To summarise the situation concerning Legendre polynomials, we now have
three possible starting points, which have been shown to be equivalent: the
defining equation (16.35) together with the condition $P_n(1) = 1$; Rodrigues’
formula (16.43); and the generating function (16.49). In addition we have proved
a variety of relationships and recurrence relations (not particularly memorable,
but collectively useful) and, as will be apparent from the work of chapter 18,
have developed a powerful tool for use in axially symmetric situations in which
the $\nabla^2$ operator is involved and spherical polar coordinates are employed.
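
The inverse-distance expansion (16.56) is easy to test numerically. The sketch below is illustrative only: the function names and the sample values of $r$, $r'$ and $\theta$ are assumptions, and the Legendre polynomials are generated from the recurrence (16.55).

```python
import math


def legendre_values(n_max, x):
    """Return [P_0(x), ..., P_n_max(x)] from the recurrence (16.55)."""
    values = [1.0, x]
    for n in range(1, n_max):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: n_max + 1]


def inverse_distance_series(r, r_prime, theta, terms=80):
    """Partial sum of (16.56); the series is only valid for r_prime < r."""
    P = legendre_values(terms - 1, math.cos(theta))
    ratio = r_prime / r
    return sum(ratio**l * P[l] for l in range(terms)) / r


r, r_prime, theta = 2.0, 0.5, 1.1
exact = 1.0 / math.sqrt(r * r + r_prime * r_prime - 2 * r * r_prime * math.cos(theta))
approx = inverse_distance_series(r, r_prime, theta)
assert abs(approx - exact) < 1e-12
```

If the source point lies further out than the field point, the same function can be called with the two radii exchanged, as the text requires.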


16.7 Bessel’s equation 

Bessel’s equation arises from physical situations similar to those involving
Legendre’s equation but when cylindrical, rather than spherical, polar coordinates are
employed. It has the form

\[ z^2 y'' + z y' + (z^2 - \nu^2) y = 0, \tag{16.57} \]

where the parameter $\nu$ is a given number, which we may take as $\ge 0$ with no loss
of generality. In Bessel’s equation, $z$ is usually a multiple of a radial distance and
therefore ranges from 0 to $\infty$.

Writing (16.57) in our standard form we have

\[ y'' + \frac{1}{z} y' + \left( 1 - \frac{\nu^2}{z^2} \right) y = 0. \tag{16.58} \]

By inspection $z = 0$ is a regular singular point; hence we try a solution of the
form $y = z^\sigma \sum_{n=0}^{\infty} a_n z^n$. Substituting this into (16.58) and multiplying the resulting
equation by $z^{2 - \sigma}$, we obtain

\[ \sum_{n = 0}^{\infty} \left[ (\sigma + n)(\sigma + n - 1) + (\sigma + n) - \nu^2 \right] a_n z^n + \sum_{n = 0}^{\infty} a_n z^{n+2} = 0, \]

which simplifies to

\[ \sum_{n = 0}^{\infty} \left[ (\sigma + n)^2 - \nu^2 \right] a_n z^n + \sum_{n = 0}^{\infty} a_n z^{n+2} = 0. \]

Considering coefficients of $z^0$ we obtain the indicial equation

\[ \sigma^2 - \nu^2 = 0, \]

and so $\sigma = \pm\nu$. For coefficients of higher powers of $z$ we find

\[ \left[ (\sigma + 1)^2 - \nu^2 \right] a_1 = 0, \tag{16.59} \]

\[ \left[ (\sigma + n)^2 - \nu^2 \right] a_n + a_{n-2} = 0 \quad \text{for } n \ge 2. \tag{16.60} \]

Substituting $\sigma = \pm\nu$ into (16.59) and (16.60) we obtain the recurrence relations

\[ (1 \pm 2\nu) a_1 = 0, \tag{16.61} \]

\[ n(n \pm 2\nu) a_n + a_{n-2} = 0 \quad \text{for } n \ge 2. \tag{16.62} \]

We consider now the form of the general solution to Bessel’s equation (16.57) for
two cases, the case for which $\nu$ is not an integer and that for which it is (including
zero).
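
The recurrence relations (16.61) and (16.62) can be iterated directly to build a truncated Frobenius series. The sketch below does this for the root $\sigma = +\nu$ with the choice $a_0 = 1$; the function names and the truncation order are illustrative assumptions. The two checks use standard facts: for $\nu = 0$ the series is $J_0(z)$, with $J_0(1) \approx 0.7651976866$, and for $\nu = 1/2$ it sums to $z^{-1/2} \sin z$.

```python
import math


def frobenius_coefficients(nu, n_max):
    """Coefficients a_0..a_n_max for the root sigma = +nu, from (16.61)-(16.62), with a_0 = 1."""
    a = [0.0] * (n_max + 1)
    a[0] = 1.0
    for n in range(2, n_max + 1, 2):         # odd coefficients stay zero, as (16.61) requires
        a[n] = -a[n - 2] / (n * (n + 2 * nu))
    return a


def frobenius_solution(nu, z, n_max=60):
    """Truncated series z^nu * sum(a_n z^n), for z > 0."""
    a = frobenius_coefficients(nu, n_max)
    return z**nu * sum(c * z**n for n, c in enumerate(a))


# nu = 0: the series is J_0(z).
assert abs(frobenius_solution(0, 1.0) - 0.7651976865579666) < 1e-12

# nu = 1/2 with a_0 = 1: the series sums to sin(z) / sqrt(z).
assert abs(frobenius_solution(0.5, 2.0) - math.sin(2.0) / math.sqrt(2.0)) < 1e-12
```

With $a_0 = 1$ the result differs from the conventionally normalised Bessel function only by a constant factor, which is fixed by the customary choice of $a_0$.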


16.7.1 General solution for non-integer $\nu$

If $\nu$ is a non-integer then in general the two roots of the indicial equation, $\sigma_1 = \nu$
and $\sigma_2 = -\nu$, will not differ by an integer, and we may obtain two linearly
independent solutions in the form of Frobenius series. Special considerations do
arise, however, when $\nu = m/2$ for $m = 1, 3, 5, \ldots$, and $\sigma_1 - \sigma_2 = 2\nu = m$ is an
(odd positive) integer. When this happens, we may always obtain a solution in
the form of a Frobenius series corresponding to the larger root $\sigma_1 = \nu = m/2$,
as described above. For the smaller root $\sigma_2 = -\nu = -m/2$, however, we must
determine whether a second Frobenius series solution is possible by examining
the recurrence relation (16.62), which reads

\[ n(n - m) a_n + a_{n-2} = 0 \quad \text{for } n \ge 2. \]

Since $m$ is an odd positive integer in this case, we can use this recurrence relation
(starting with $a_0 \ne 0$) to calculate $a_2, a_4, a_6, \ldots$ in the knowledge that all these
terms will remain finite. It is possible in this case, therefore, to find a second
solution in the form of a Frobenius series corresponding to the smaller root $\sigma_2$.
Thus, in general, for non-integer $\nu$ we have from (16.61) and (16.62)

\[ a_n = -\frac{a_{n-2}}{n(n \pm 2\nu)} \quad \text{for } n = 2, 4, 6, \ldots, \]

\[ a_n = 0 \quad \text{for } n = 1, 3, 5, \ldots. \]

Setting $a_0 = 1$ in each case, we obtain the two solutions

\[ y_{\pm\nu}(z) = z^{\pm\nu} \left[ 1 - \frac{z^2}{2(2 \pm 2\nu)} + \frac{z^4}{2 \times 4 (2 \pm 2\nu)(4 \pm 2\nu)} - \cdots \right]. \]

It is customary, however, to set

\[ a_0 = \frac{1}{2^{\pm\nu} \Gamma(1 \pm \nu)}, \]

where $\Gamma(x)$ is the gamma function, described in the appendix; it may be regarded
as the generalisation of the factorial function to non-integer and/or negative
arguments.† The two solutions of (16.57) are then written as $J_\nu(z)$ and $J_{-\nu}(z)$,
where

\[ J_\nu(z) = \frac{1}{\Gamma(\nu + 1)} \left( \frac{z}{2} \right)^{\nu} \left[ 1 - \frac{1}{\nu + 1} \left( \frac{z}{2} \right)^2 + \frac{1}{(\nu + 1)(\nu + 2)} \frac{1}{2!} \left( \frac{z}{2} \right)^4 - \cdots \right] = \sum_{n = 0}^{\infty} \frac{(-1)^n}{n! \, \Gamma(\nu + n + 1)} \left( \frac{z}{2} \right)^{\nu + 2n}; \tag{16.63} \]

replacing $\nu$ by $-\nu$ gives $J_{-\nu}(z)$. The functions $J_\nu(z)$ and $J_{-\nu}(z)$ are called Bessel
functions of the first kind, of order $\nu$. Since the first term of each series is a
finite non-zero multiple of $z^\nu$ and $z^{-\nu}$ respectively, if $\nu$ is not an integer then
$J_\nu(z)$ and $J_{-\nu}(z)$ are linearly independent. This may be confirmed by calculating
the Wronskian of these two functions. Therefore, for non-integer $\nu$ the general
solution of Bessel’s equation (16.57) is

\[ y(z) = c_1 J_\nu(z) + c_2 J_{-\nu}(z). \tag{16.64} \]

† In particular, $\Gamma(n + 1) = n!$ for $n = 0, 1, 2, \ldots$, and $\Gamma(n)$ is infinite if $n$ is any integer $\le 0$.


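► Find the general solution of

\[ z^2 y'' + z y' + (z^2 - \tfrac{1}{4}) y = 0. \]

This is Bessel’s equation with $\nu = 1/2$, so from (16.64) the general solution is simply

\[ y(z) = c_1 J_{1/2}(z) + c_2 J_{-1/2}(z). \]

However, Bessel functions of half-integral order can be expressed in terms of trigonometric
functions. To show this, we note from (16.63) that

\[ J_{\pm 1/2}(z) = z^{\pm 1/2} \sum_{n = 0}^{\infty} \frac{(-1)^n z^{2n}}{2^{2n \pm 1/2} \, n! \, \Gamma(1 + n \pm \tfrac{1}{2})}. \]

Using the fact that $\Gamma(x + 1) = x \Gamma(x)$ and $\Gamma(\tfrac{1}{2}) = \sqrt{\pi}$ we find that, for $\nu = 1/2$,

\[ J_{1/2}(z) = \frac{(\tfrac{1}{2} z)^{1/2}}{\Gamma(\tfrac{3}{2})} - \frac{(\tfrac{1}{2} z)^{5/2}}{1! \, \Gamma(\tfrac{5}{2})} + \frac{(\tfrac{1}{2} z)^{9/2}}{2! \, \Gamma(\tfrac{7}{2})} - \cdots \]

\[ = \frac{(\tfrac{1}{2} z)^{1/2}}{(\tfrac{1}{2}) \sqrt{\pi}} - \frac{(\tfrac{1}{2} z)^{5/2}}{1! \, (\tfrac{3}{2})(\tfrac{1}{2}) \sqrt{\pi}} + \frac{(\tfrac{1}{2} z)^{9/2}}{2! \, (\tfrac{5}{2})(\tfrac{3}{2})(\tfrac{1}{2}) \sqrt{\pi}} - \cdots \]

\[ = \frac{(\tfrac{1}{2} z)^{1/2}}{(\tfrac{1}{2}) \sqrt{\pi}} \left( 1 - \frac{z^2}{3!} + \frac{z^4}{5!} - \cdots \right) = \frac{(\tfrac{1}{2} z)^{1/2}}{(\tfrac{1}{2}) \sqrt{\pi}} \, \frac{\sin z}{z} = \sqrt{\frac{2}{\pi z}} \sin z, \]

whereas for $\nu = -1/2$ we obtain

\[ J_{-1/2}(z) = \frac{(\tfrac{1}{2} z)^{-1/2}}{\Gamma(\tfrac{1}{2})} - \frac{(\tfrac{1}{2} z)^{3/2}}{1! \, \Gamma(\tfrac{3}{2})} + \frac{(\tfrac{1}{2} z)^{7/2}}{2! \, \Gamma(\tfrac{5}{2})} - \cdots \]

\[ = \frac{(\tfrac{1}{2} z)^{-1/2}}{\sqrt{\pi}} \left( 1 - \frac{z^2}{2!} + \frac{z^4}{4!} - \cdots \right) = \sqrt{\frac{2}{\pi z}} \cos z. \]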

Therefore the general solution we require is

\[ y(z) = c_1 J_{1/2}(z) + c_2 J_{-1/2}(z) = c_1 \sqrt{\frac{2}{\pi z}} \sin z + c_2 \sqrt{\frac{2}{\pi z}} \cos z. \] ◄

Corresponding to the discussion in subsection 16.6.2 of the general solution
of Legendre’s equation, we note that when Bessel’s equation is encountered in
physical situations the argument $z$ is usually some multiple of a radial distance
and so takes values in the range $0 \le z \le \infty$. We often require that the solution
is regular at $z = 0$ but, from (16.63), we see immediately that $J_{-\nu}(z)$ is singular
at the origin (remember that we restricted $\nu$ to be non-negative). In such cases,
the coefficient $c_2$ in (16.64) must be set to zero, and the solution is simply some
multiple of $J_\nu(z)$.
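
The series definition (16.63) and the closed forms for $J_{\pm 1/2}(z)$ found in the worked example above can be compared numerically. This is only a sketch: the function name `bessel_j`, the number of terms and the sample argument are illustrative assumptions, and the code relies on the standard-library functions `math.gamma` and `math.factorial`.

```python
import math


def bessel_j(nu, z, terms=40):
    """Partial sum of the series (16.63); assumes z > 0 and nu not a negative integer."""
    total = 0.0
    for n in range(terms):
        total += ((-1)**n / (math.factorial(n) * math.gamma(nu + n + 1))
                  * (z / 2)**(nu + 2 * n))
    return total


z = 3.0
prefactor = math.sqrt(2.0 / (math.pi * z))

# Closed forms for half-integer order derived in the worked example.
assert abs(bessel_j(0.5, z) - prefactor * math.sin(z)) < 1e-12
assert abs(bessel_j(-0.5, z) - prefactor * math.cos(z)) < 1e-12
```

Evaluating `bessel_j(-0.5, z)` for smaller and smaller positive `z` shows the growth like $z^{-1/2}$ that makes $J_{-\nu}(z)$ unacceptable when regularity at the origin is required.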


16.7.2 General solution for integer $\nu$

The definition of the Bessel function $J_\nu(z)$ given in (16.63) is, of course, valid for
all values of $\nu$ but, as we shall see, in the case of integer $\nu$ the general solution of
Bessel’s equation cannot be written in the form (16.64). Firstly let us consider the
case $\nu = 0$, so that the two solutions to the indicial equation are equal, and we
clearly obtain only one solution in the form of a Frobenius series. From (16.63),
this is given by

\[ J_0(z) = \sum_{n = 0}^{\infty} \frac{(-1)^n z^{2n}}{2^{2n} \, n! \, \Gamma(1 + n)} = 1 - \frac{z^2}{2^2} + \frac{z^4}{2^2 4^2} - \frac{z^6}{2^2 4^2 6^2} + \cdots. \]

In general, however, if $\nu$ is a positive integer then the solutions of the indicial
equation differ by an integer. For the larger root, $\sigma_1 = \nu$, we may find a solution
$J_\nu(z)$ for $\nu = 1, 2, 3, \ldots$, in the form of a Frobenius series given by (16.63). Graphs
of $J_0(z)$, $J_1(z)$ and $J_2(z)$ are plotted in figure 16.2 for real $z$. For the smaller root
$\sigma_2 = -\nu$, however, the recurrence relation (16.62) becomes

\[ n(n - m) a_n + a_{n-2} = 0 \quad \text{for } n \ge 2, \]

where $m = 2\nu$ is now an even positive integer, i.e. $m = 2, 4, 6, \ldots$. Starting with
$a_0 \ne 0$ we may then calculate $a_2, a_4, a_6, \ldots$, but we see that when $n = m$ the
coefficient $a_n$ is formally infinite, and the method fails to produce a second
solution in the form of a Frobenius series.

Figure 16.2 The first three integer-order Bessel functions.

In fact, by replacing $\nu$ by $-\nu$ in the definition of $J_\nu(z)$ given in (16.63), it can
be shown that, for integer $\nu$,

\[ J_{-\nu}(z) = (-1)^\nu J_\nu(z), \]

and hence that $J_\nu(z)$ and $J_{-\nu}(z)$ are linearly dependent. So, in this case, we cannot
write the general solution to Bessel’s equation in the form (16.64). One therefore
defines the function

\[ Y_\nu(z) = \frac{J_\nu(z) \cos \nu\pi - J_{-\nu}(z)}{\sin \nu\pi}, \tag{16.65} \]

which is called a Bessel function of the second kind of order $\nu$. As Bessel’s
equation is linear, $Y_\nu(z)$ is clearly a solution, since it is just the weighted sum of Bessel
functions of the first kind. Furthermore, for non-integer $\nu$ it is clear that $Y_\nu(z)$ is
linearly independent of $J_\nu(z)$. It may also be shown that the Wronskian of $J_\nu(z)$
and $Y_\nu(z)$ is non-zero for all values of $\nu$. Hence $J_\nu(z)$ and $Y_\nu(z)$ always constitute
a pair of independent solutions. The expression (16.65) becomes an indeterminate
form 0/0 when $\nu$ is an integer, however. This is so because for integer $\nu$ we have
$\cos \nu\pi = (-1)^\nu$ and $J_{-\nu}(z) = (-1)^\nu J_\nu(z)$. Nevertheless, this indeterminate form
can be evaluated using l’Hôpital’s rule (see chapter 4). Thus for integer $\nu$ we set

\[ Y_\nu(z) = \lim_{\mu \to \nu} \left[ \frac{J_\mu(z) \cos \mu\pi - J_{-\mu}(z)}{\sin \mu\pi} \right], \tag{16.66} \]

which gives a linearly independent second solution for integer $\nu$. Therefore, we
may write the general solution of Bessel’s equation, valid for all $\nu$, as

\[ y(z) = c_1 J_\nu(z) + c_2 Y_\nu(z). \tag{16.67} \]

As mentioned above for the case when $\nu$ is not an integer, in physical situations
we often require the solution of Bessel’s equation to be regular at $z = 0$. But,
from its definition (16.65) or (16.66), it is clear that $Y_\nu(z)$ is singular at the origin,
and so in such physical situations the coefficient $c_2$ in (16.67) must be set to zero;
the solution is then simply some multiple of $J_\nu(z)$.
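
The linear dependence of $J_\nu$ and $J_{-\nu}$ for integer $\nu$, and the limiting definition (16.66), can be illustrated numerically. The sketch below makes several assumptions that are not part of the text: the helper names, the treatment of $1/\Gamma$ as zero at the poles of the gamma function (following the footnote on $\Gamma(n)$ for integers $n \le 0$), and the use of a small non-integer order $\nu = 10^{-5}$ as a crude stand-in for the limit. The reference value $Y_0(1) \approx 0.0882569642$ is a standard tabulated figure.

```python
import math


def bessel_j(nu, z, terms=40):
    """Series (16.63), with 1/Gamma taken as zero at its poles (non-positive integers)."""
    total = 0.0
    for n in range(terms):
        arg = nu + n + 1
        if arg <= 0 and arg == int(arg):
            continue                         # Gamma(arg) is infinite, so this term vanishes
        total += ((-1)**n / (math.factorial(n) * math.gamma(arg))
                  * (z / 2)**(nu + 2 * n))
    return total


def bessel_y(nu, z):
    """Second-kind function from (16.65); usable only for non-integer nu."""
    return ((bessel_j(nu, z) * math.cos(nu * math.pi) - bessel_j(-nu, z))
            / math.sin(nu * math.pi))


# For integer order, J_{-2}(z) = (-1)^2 J_2(z).
assert abs(bessel_j(-2, 1.5) - bessel_j(2, 1.5)) < 1e-14

# Approaching the limit (16.66): a tiny non-integer order approximates Y_0(1).
assert abs(bessel_y(1e-5, 1.0) - 0.08825696421567697) < 1e-4
```

Taking $\nu$ much smaller than about $10^{-5}$ does not help, because the numerator and denominator of (16.65) both tend to zero and rounding error in the difference is amplified; this is the numerical counterpart of the 0/0 form discussed above.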


16.7.3 Properties of Bessel functions 

Bessel functions of the first and second kind, $J_\nu(z)$ and $Y_\nu(z)$, have various useful
properties that are worthy of further discussion.

Recurrence relations

The recurrence relations enjoyed by Bessel functions of the first kind, $J_\nu(z)$, can
be derived directly from the power series definition (16.63).

► Prove the recurrence relation

\[ \frac{d}{dz} \left[ z^\nu J_\nu(z) \right] = z^\nu J_{\nu - 1}(z). \tag{16.68} \]

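From the power series definition (16.63) of $J_\nu(z)$ we obtain

\[ \frac{d}{dz} \left[ z^\nu J_\nu(z) \right] = \frac{d}{dz} \sum_{n = 0}^{\infty} \frac{(-1)^n z^{2\nu + 2n}}{2^{\nu + 2n} \, n! \, \Gamma(\nu + n + 1)} = \sum_{n = 0}^{\infty} \frac{(-1)^n z^{2\nu + 2n - 1}}{2^{\nu + 2n - 1} \, n! \, \Gamma(\nu + n)} \]

\[ = z^\nu \sum_{n = 0}^{\infty} \frac{(-1)^n z^{(\nu - 1) + 2n}}{2^{(\nu - 1) + 2n} \, n! \, \Gamma((\nu - 1) + n + 1)} = z^\nu J_{\nu - 1}(z). \] ◄

It may similarly be shown that

\[ \frac{d}{dz} \left[ z^{-\nu} J_\nu(z) \right] = -z^{-\nu} J_{\nu + 1}(z). \tag{16.69} \]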

From (16.68) and (16.69) the remaining recurrence relations may be easily derived.
Expanding out the derivative on the LHS of (16.68) and dividing through by $z^{\nu - 1}$
we obtain the relation

\[ z J_\nu'(z) + \nu J_\nu(z) = z J_{\nu - 1}(z). \tag{16.70} \]

Similarly, by expanding out the derivative on the LHS of (16.69), and multiplying
through by $z^{\nu + 1}$, we find

\[ z J_\nu'(z) - \nu J_\nu(z) = -z J_{\nu + 1}(z). \tag{16.71} \]

Adding (16.70) and (16.71) and dividing through by $z$ gives

\[ J_{\nu - 1}(z) - J_{\nu + 1}(z) = 2 J_\nu'(z). \tag{16.72} \]

Finally, subtracting (16.71) from (16.70) and dividing by $z$ gives

\[ J_{\nu - 1}(z) + J_{\nu + 1}(z) = \frac{2\nu}{z} J_\nu(z). \tag{16.73} \]

► Given that $J_{1/2}(z) = (2/\pi z)^{1/2} \sin z$ and that $J_{-1/2}(z) = (2/\pi z)^{1/2} \cos z$, express $J_{3/2}(z)$
and $J_{-3/2}(z)$ in terms of trigonometric functions.

From (16.71) we have

\[ J_{3/2}(z) = \frac{1}{2z} J_{1/2}(z) - J_{1/2}'(z) \]

\[ = \frac{1}{2z} \left( \frac{2}{\pi z} \right)^{1/2} \sin z - \left( \frac{2}{\pi z} \right)^{1/2} \cos z + \frac{1}{2z} \left( \frac{2}{\pi z} \right)^{1/2} \sin z \]

\[ = \left( \frac{2}{\pi z} \right)^{1/2} \left( \frac{1}{z} \sin z - \cos z \right). \]

Similarly, from (16.70), we have

\[ J_{-3/2}(z) = -\frac{1}{2z} J_{-1/2}(z) + J_{-1/2}'(z) \]

\[ = -\frac{1}{2z} \left( \frac{2}{\pi z} \right)^{1/2} \cos z - \left( \frac{2}{\pi z} \right)^{1/2} \sin z - \frac{1}{2z} \left( \frac{2}{\pi z} \right)^{1/2} \cos z \]

\[ = \left( \frac{2}{\pi z} \right)^{1/2} \left( -\frac{1}{z} \cos z - \sin z \right). \]

We see that, by repeated use of these recurrence relations, all Bessel functions $J_\nu(z)$
of half-integer order may be expressed in terms of trigonometric functions. From their
definition (16.65), Bessel functions of the second kind, $Y_\nu(z)$, of half-integer order can be
similarly expressed. ◄
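
The closed forms just obtained, together with the recurrence relations (16.72) and (16.73), can be spot-checked against the series (16.63). The sketch below is illustrative: the function names, the order $\nu = 0.7$, the argument $z = 2.3$ and the finite-difference step are all arbitrary assumptions.

```python
import math


def bessel_j(nu, z, terms=40):
    """Partial sum of the series (16.63) for non-integer nu and z > 0."""
    return sum((-1)**n / (math.factorial(n) * math.gamma(nu + n + 1))
               * (z / 2)**(nu + 2 * n) for n in range(terms))


def j_three_halves(z):
    """Closed form for J_{3/2}(z) from the worked example."""
    return math.sqrt(2 / (math.pi * z)) * (math.sin(z) / z - math.cos(z))


def j_minus_three_halves(z):
    """Closed form for J_{-3/2}(z) from the worked example."""
    return -math.sqrt(2 / (math.pi * z)) * (math.cos(z) / z + math.sin(z))


nu, z, step = 0.7, 2.3, 1e-5

# (16.73): J_{nu-1} + J_{nu+1} = (2 nu / z) J_nu
lhs_73 = bessel_j(nu - 1, z) + bessel_j(nu + 1, z)
rhs_73 = 2 * nu / z * bessel_j(nu, z)

# (16.72): J_{nu-1} - J_{nu+1} = 2 J_nu', with the derivative from a central difference
derivative = (bessel_j(nu, z + step) - bessel_j(nu, z - step)) / (2 * step)
lhs_72 = bessel_j(nu - 1, z) - bessel_j(nu + 1, z)

assert abs(lhs_73 - rhs_73) < 1e-12
assert abs(lhs_72 - 2 * derivative) < 1e-8
```

The looser tolerance on (16.72) reflects the finite-difference derivative, not any inexactness in the relation itself.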


Finally, we note that the relations (16.68) and (16.69) may be rewritten in
integral form as

\[ \int z^\nu J_{\nu - 1}(z)\, dz = z^\nu J_\nu(z), \]

\[ \int z^{-\nu} J_{\nu + 1}(z)\, dz = -z^{-\nu} J_\nu(z). \]

If $\nu$ is an integer, the recurrence relations of this section may be proved using
the generating function for Bessel functions discussed below. It may be shown
that Bessel functions of the second kind, $Y_\nu(z)$, also satisfy the recurrence relations
derived above.


Mutual orthogonality of Bessel functions 

Bessel functions of the first kind, $J_\nu(z)$, possess an orthogonality relation
analogous to that of the Legendre polynomials discussed in subsection 16.6.2. A more
general discussion of the mutual orthogonality of solutions to second-order linear
ODEs (such as Bessel’s equation) is given in chapter 17.

By definition, the function $J_\nu(z)$ satisfies Bessel’s equation (16.57),

\[ z^2 y'' + z y' + (z^2 - \nu^2) y = 0. \]

Let us instead consider the functions $f(z) = J_\nu(\lambda z)$ and $g(z) = J_\nu(\mu z)$, which, as
will be proved below, respectively satisfy the equations

\[ z^2 f'' + z f' + (\lambda^2 z^2 - \nu^2) f = 0, \tag{16.74} \]

\[ z^2 g'' + z g' + (\mu^2 z^2 - \nu^2) g = 0. \tag{16.75} \]


► Show that $f(z) = J_\nu(\lambda z)$ satisfies (16.74).

If $f(z) = J_\nu(\lambda z)$ and we write $w = \lambda z$, then

\[ \frac{df}{dz} = \lambda \frac{dJ_\nu(w)}{dw} \quad \text{and} \quad \frac{d^2 f}{dz^2} = \lambda^2 \frac{d^2 J_\nu(w)}{dw^2}. \]

When these expressions are substituted, the LHS of (16.74) becomes

\[ z^2 \lambda^2 \frac{d^2 J_\nu(w)}{dw^2} + z \lambda \frac{dJ_\nu(w)}{dw} + (\lambda^2 z^2 - \nu^2) J_\nu(w) = w^2 \frac{d^2 J_\nu(w)}{dw^2} + w \frac{dJ_\nu(w)}{dw} + (w^2 - \nu^2) J_\nu(w). \]

But, from Bessel’s equation itself, this final expression is equal to zero, thus verifying that
$f(z)$ does satisfy (16.74). ◄


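Now multiplying (16.75) by $f(z)$ and (16.74) by $g(z)$ and subtracting them gives

\[ \frac{d}{dz} \left[ z (f g' - g f') \right] = (\lambda^2 - \mu^2) z f g, \tag{16.76} \]

where we have used the fact that

\[ \frac{d}{dz} \left[ z (f g' - g f') \right] = z (f g'' - g f'') + (f g' - g f'). \]

By integrating (16.76) over any given range $z = a$ to $z = b$ we obtain

\[ \int_a^b z f(z) g(z)\, dz = \frac{1}{\lambda^2 - \mu^2} \Big[ z f(z) g'(z) - z g(z) f'(z) \Big]_a^b, \]

which, on setting $f(z) = J_\nu(\lambda z)$ and $g(z) = J_\nu(\mu z)$, becomes

\[ \int_a^b z J_\nu(\lambda z) J_\nu(\mu z)\, dz = \frac{1}{\lambda^2 - \mu^2} \Big[ \mu z J_\nu(\lambda z) J_\nu'(\mu z) - \lambda z J_\nu(\mu z) J_\nu'(\lambda z) \Big]_a^b. \tag{16.77} \]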

If $\lambda \ne \mu$, and the interval $[a, b]$ is such that the expression on the RHS of (16.77)
equals zero then we obtain the orthogonality condition

\[ \int_a^b z J_\nu(\lambda z) J_\nu(\mu z)\, dz = 0. \tag{16.78} \]

This happens, for example, if $J_\nu(\lambda z)$ and $J_\nu(\mu z)$ vanish at $z = a$ and $z = b$, or if
$J_\nu'(\lambda z)$ and $J_\nu'(\mu z)$ vanish at $z = a$ and $z = b$, or for many more general conditions.

If $\lambda = \mu$, however, then the RHS of (16.77) takes the indeterminate form 0/0.
This may be evaluated using l’Hôpital’s rule, or alternatively we may calculate
the relevant integral directly.

Ignoring the integration limits for the moment,

\[ \int J_\nu^2(\lambda z) z \, dz = \frac{1}{\lambda^2} \int J_\nu^2(u) u \, du, \]

where $u = \lambda z$. Integrating by parts yields

\[ I = \int J_\nu^2(u) u \, du = \tfrac{1}{2} u^2 J_\nu^2(u) - \int J_\nu'(u) J_\nu(u) u^2 \, du. \]

Now Bessel’s equation (16.57) can be rearranged as

\[ u^2 J_\nu(u) = \nu^2 J_\nu(u) - u J_\nu'(u) - u^2 J_\nu''(u), \]

which, on substitution into the expression for $I$, gives

\[ I = \tfrac{1}{2} u^2 J_\nu^2(u) - \int J_\nu'(u) \left[ \nu^2 J_\nu(u) - u J_\nu'(u) - u^2 J_\nu''(u) \right] du \]

\[ = \tfrac{1}{2} u^2 J_\nu^2(u) - \tfrac{1}{2} \nu^2 J_\nu^2(u) + \tfrac{1}{2} u^2 \left[ J_\nu'(u) \right]^2 + c. \]

Since $u = \lambda z$ the required integral is given by

\[ \int_a^b J_\nu^2(\lambda z) z \, dz = \frac{1}{2} \left[ \left( z^2 - \frac{\nu^2}{\lambda^2} \right) J_\nu^2(\lambda z) + z^2 \left[ J_\nu'(\lambda z) \right]^2 \right]_a^b, \tag{16.79} \]

which gives the normalisation condition for Bessel functions of the first kind. ◄

Since the Bessel functions $J_\nu(z)$ possess the orthogonality property (16.78) we
may expand any reasonable function $f(z)$ (i.e. one obeying the Dirichlet conditions
discussed in chapter 12) in the interval $0 \le z \le a$ as a sum of Bessel functions of
a given order $\nu$,

\[ f(z) = \sum_{n = 0}^{\infty} c_n J_\nu(\lambda_n z), \tag{16.80} \]

where the $\lambda_n$ are chosen such that $J_\nu(\lambda_n a) = 0$. The coefficients $c_n$ are then given
by

\[ c_n = \frac{2}{a^2 J_{\nu + 1}^2(\lambda_n a)} \int_0^a f(z) J_\nu(\lambda_n z) z \, dz. \tag{16.81} \]

► Prove the expression (16.81) for the coefficients in a Bessel function expansion of a
function $f(z)$.

If we multiply (16.80) by $z J_\nu(\lambda_m z)$ and integrate from $z = 0$ to $z = a$ then we obtain

\[ \int_0^a z J_\nu(\lambda_m z) f(z)\, dz = \sum_{n = 0}^{\infty} c_n \int_0^a z J_\nu(\lambda_m z) J_\nu(\lambda_n z)\, dz \]

\[ = c_m \int_0^a J_\nu^2(\lambda_m z) z \, dz \]

\[ = \tfrac{1}{2} c_m a^2 \left[ J_\nu'(\lambda_m a) \right]^2 = \tfrac{1}{2} c_m a^2 J_{\nu + 1}^2(\lambda_m a), \]

where in the last two lines we have used (16.77), (16.79), the fact that $J_\nu(\lambda_m a) = 0$ and
(16.71). ◄
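
The orthogonality (16.78) and the normalisation used in (16.81) can be illustrated for $\nu = 0$ on the interval $0 \le z \le 1$ (so $a = 1$). The sketch below rests on assumptions not taken from the text: the quoted values of the first two zeros of $J_0$ are standard tabulated figures, and the composite Simpson rule is simply one convenient way of approximating the integrals.

```python
import math


def bessel_j(nu, z, terms=60):
    """Partial sum of the series (16.63) for non-negative nu and z >= 0."""
    return sum((-1)**n / (math.factorial(n) * math.gamma(nu + n + 1))
               * (z / 2)**(nu + 2 * n) for n in range(terms))


def simpson(func, a, b, n=400):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3


# First two positive zeros of J_0 (tabulated values, assumed here).
lam1, lam2 = 2.404825557695773, 5.520078110286311

# Orthogonality (16.78) with weight z on [0, 1].
cross = simpson(lambda z: z * bessel_j(0, lam1 * z) * bessel_j(0, lam2 * z), 0.0, 1.0)

# Normalisation: (16.79) with J_0(lam1) = 0, combined with (16.71), gives J_1(lam1)^2 / 2.
norm = simpson(lambda z: z * bessel_j(0, lam1 * z) ** 2, 0.0, 1.0)
expected_norm = 0.5 * bessel_j(1, lam1) ** 2

assert abs(cross) < 1e-7
assert abs(norm - expected_norm) < 1e-7
```

The factor $2 / [a^2 J_{\nu+1}^2(\lambda_n a)]$ in (16.81) is just the reciprocal of this normalisation integral.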


Generating function for Bessel functions 

The Bessel functions $J_\nu(z)$, where $\nu$ is an integer, can be described by a
generating function in a similar way to that discussed for Legendre polynomials in
subsection 16.6.2. The generating function for Bessel functions of integer order is
given by

\[ G(z, h) = \exp\left[ \frac{z}{2} \left( h - \frac{1}{h} \right) \right] = \sum_{n = -\infty}^{\infty} J_n(z) h^n. \tag{16.82} \]

By expanding the exponential as a power series, it is straightforward to verify that
the functions $J_n(z)$ defined by (16.82) are indeed Bessel functions of the first kind.

The generating function (16.82) is useful for finding, for Bessel functions of
integer order, properties which can often be extended to the non-integer case. In
particular, the Bessel function recurrence relations may be derived.

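► Use the generating function (16.82) to prove, for integer $\nu$, the recurrence relation (16.73),
i.e.

\[ J_{\nu - 1}(z) + J_{\nu + 1}(z) = \frac{2\nu}{z} J_\nu(z). \]

Differentiating $G(z, h)$ with respect to $h$ we obtain

\[ \frac{\partial G(z, h)}{\partial h} = \frac{z}{2} \left( 1 + \frac{1}{h^2} \right) G(z, h) = \sum_{n = -\infty}^{\infty} n J_n(z) h^{n - 1}, \]

which can be written using (16.82) again as

\[ \frac{z}{2} \left( 1 + \frac{1}{h^2} \right) \sum_{n = -\infty}^{\infty} J_n(z) h^n = \sum_{n = -\infty}^{\infty} n J_n(z) h^{n - 1}. \]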

Equating coefficients of $h^n$ we obtain

\[ \frac{z}{2} \left[ J_n(z) + J_{n+2}(z) \right] = (n + 1) J_{n+1}(z), \]

which on replacing $n$ by $\nu - 1$ gives the required recurrence relation. ◄

The generating function (16.82) is also useful in deriving the integral
representation of Bessel functions of integer order.

► Show that for integer $n$ the Bessel function $J_n(z)$ is given by

\[ J_n(z) = \frac{1}{\pi} \int_0^\pi \cos(n\theta - z \sin\theta)\, d\theta. \tag{16.83} \]

By expanding out the cosine term in the integrand in (16.83) we obtain the integral

\[ I = \frac{1}{\pi} \int_0^\pi \left[ \cos(z \sin\theta) \cos n\theta + \sin(z \sin\theta) \sin n\theta \right] d\theta. \tag{16.84} \]

K Jo 

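Now, we may express $\cos(z \sin\theta)$ and $\sin(z \sin\theta)$ in terms of Bessel functions by setting
$h = \exp i\theta$ in (16.82) to give

\[ \exp\left[ \frac{z}{2} \left( \exp i\theta - \exp(-i\theta) \right) \right] = \exp(iz \sin\theta) = \sum_{m = -\infty}^{\infty} J_m(z) \exp im\theta. \]

Using de Moivre's theorem, $\exp i\theta = \cos\theta + i \sin\theta$, we then obtain

\[ \exp(iz \sin\theta) = \cos(z \sin\theta) + i \sin(z \sin\theta) = \sum_{m = -\infty}^{\infty} J_m(z) (\cos m\theta + i \sin m\theta). \]

Equating the real and imaginary parts of this expression we find

\[ \cos(z \sin\theta) = \sum_{m = -\infty}^{\infty} J_m(z) \cos m\theta, \]

\[ \sin(z \sin\theta) = \sum_{m = -\infty}^{\infty} J_m(z) \sin m\theta. \]

Substituting these expressions into (16.84) we find

\[ I = \frac{1}{\pi} \sum_{m = -\infty}^{\infty} \int_0^\pi \left[ J_m(z) \cos m\theta \cos n\theta + J_m(z) \sin m\theta \sin n\theta \right] d\theta. \]

However, using the orthogonality of the trigonometric functions, see equations (12.1)-(12.3),
we obtain

\[ I = \frac{1}{\pi} \frac{\pi}{2} \left[ J_n(z) + J_n(z) \right] = J_n(z), \]

which proves the integral representation (16.83). ◄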
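
The integral representation (16.83) lends itself to a direct numerical check against the series (16.63). In the sketch below the function names, the midpoint rule and the number of quadrature steps are illustrative assumptions; for integer $n \ge 0$ the gamma function in (16.63) is replaced by a factorial, using $\Gamma(n + k + 1) = (n + k)!$.

```python
import math


def bessel_j_series(n, z, terms=40):
    """Series (16.63) for integer order n >= 0, with Gamma(n + k + 1) = (n + k)!."""
    return sum((-1)**k / (math.factorial(k) * math.factorial(n + k))
               * (z / 2)**(n + 2 * k) for k in range(terms))


def bessel_j_integral(n, z, steps=200):
    """Midpoint-rule estimate of (1/pi) * integral_0^pi cos(n*theta - z*sin(theta)) d(theta)."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) * h
        total += math.cos(n * theta - z * math.sin(theta))
    return total * h / math.pi


for order in range(4):
    assert abs(bessel_j_integral(order, 2.5) - bessel_j_series(order, 2.5)) < 1e-12
```

The midpoint rule is unusually accurate here because the integrand extends to a smooth periodic function of $\theta$ that is even about $\theta = \pi$, so a modest number of steps already reproduces the series to rounding error.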


Finally, we mention the special case of the integral representation (16.83) for
$n = 0$,

\[ J_0(z) = \frac{1}{\pi} \int_0^\pi \cos(z \sin\theta)\, d\theta = \frac{1}{2\pi} \int_0^{2\pi} \cos(z \sin\theta)\, d\theta, \]

since $\cos(z \sin\theta)$ repeats itself in the range $\theta = \pi$ to $\theta = 2\pi$. However, $\sin(z \sin\theta)$
changes sign in this range and so

\[ \frac{1}{2\pi} \int_0^{2\pi} \sin(z \sin\theta)\, d\theta = 0. \]

Using de Moivre’s theorem, we can therefore write

\[ J_0(z) = \frac{1}{2\pi} \int_0^{2\pi} \exp(iz \sin\theta)\, d\theta = \frac{1}{2\pi} \int_0^{2\pi} \exp(iz \cos\theta)\, d\theta. \]

There are in fact many other integral representations of Bessel functions, which
can be derived from those given.


16.8 General remarks 

As was our intention, in respect of infinite series solutions we have concentrated 
to a very marked degree on Bessel’s equation and, in respect of finite polynomial 
solutions, on Legendre’s equation. The techniques used are, however, applicable 
to many equations other than these, but since the procedures are in all essentials 
the same, we do not need to treat them explicitly. The solutions of the remaining 
equations in table 16.1 are discussed briefly in the next chapter in connection 
with Sturm-Liouville systems. 


16.9 Exercises 

16.1 Find two power series solutions about $z = 0$ of the differential equation

\[ (1 - z^2) y'' - 3z y' + \lambda y = 0. \]

Deduce that the value of $\lambda$ for which the corresponding power series becomes an
$N$th-degree polynomial $U_N(z)$ is $N(N + 2)$. Construct $U_2(z)$ and $U_3(z)$.

16.2 Find solutions, as power series in $z$, of the equation

\[ 4z y'' + 2(1 - z) y' - y = 0. \]

Identify one of the solutions and verify it by direct substitution.

16.3 Find power series solutions in $z$ of the differential equation

\[ z y'' - 2y' + 9z^5 y = 0. \]

Identify closed forms for the two series, calculate their Wronskian, and verify
that they are linearly independent. Compare the Wronskian with that calculated
from the differential equation.

16.4 Change the independent variable in the equation

\[ \frac{d^2 f}{dz^2} + 2(z - a) \frac{df}{dz} + 4f = 0 \tag{$*$} \]

from $z$ to $x = z - a$, and find two independent series solutions, expanded about
$x = 0$, of the resulting equation. Deduce that the general solution of ($*$) is

\[ f(z, a) = A (z - a) e^{-(z - a)^2} + B \sum_{m = 0}^{\infty} \frac{(-4)^m m!}{(2m)!} (z - a)^{2m}, \]

with $A$ and $B$ arbitrary constants.

16.5 (a) Verify that z = 1 is a regular singular point of Legendre’s equation and that 

the indicial equation for a series solution in powers of (z — 1) has roots 0 
and 3. 

(b) Obtain the corresponding recurrence relation and show that (7 = 0 does not 
give a valid series solution. 
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16.6 


16.7 


16.8 


16.9 


(c) Determine the radius of convergence R of the a = 3 series and relate it to 
the positions of the singularities of Legendre’s equation. 

Verify that z = 0 is a regular singular point of the equation 

z 2 y" — \zy' + (1 + z)y = 0, 

and that the indicial equation has roots 2 and 1 /2. Show that the general solution 
is 


y(z) 


6aoz 2 

n = 0 


(— l)"(n+ l)2 2 "z" 
(2n + 3)! 


+ b 0 


^z 1 / 2 + 2z 3 / 2 


1/2 -A (-1)”2 2n z n \ 
4~ ^ n(n — 1 1(2/7 — 3) ! J ' 


Use the derivative method to obtain as a second solution of Bessel’s equation for 
the case when v = 0 the following expression: 


00 

J 0 (z)\nz-^ 

n= 1 


(-D n 

( n \) 2 



given that the first solution is Jo(^) as specified by (16.63). 

By initially writing y(x) as x 1/2 /(x) and then making subsequent changes of 
variable, reduce 


d 2 y 

dx 2 


+ Axy = 0 


to Bessel’s equation. Hence show that a solution that is finite at x = 0 is a 
multiple of x 1/2 J 1 / 3 (j y /Ix 2 ). 

(a) Show that the indicial equation for 

zy" — 2 y + yz = 0 

has roots that differ by an integer but that the two roots nevertheless generate 
linearly independent solutions 

, , , A (-D" +1 2nz 2 "+ l 

y 1 (z) = 3a oL (2n + !)! ’ 


00 

y 2 (z) = 

n = 0 


( — 1)" +1 (2h — l)z 2 " 
(2n)! 


16.10 


(b) Show that yi(z) is equal to 3a 0 (sinz — zcosz) by expanding the sinusoidal 
functions. Then, using the Wronskian method, find an expression for y 2 (z) 
in terms of sinusoids. (You will need to write z 2 as (z/sinz)(zsinz) and 
integrate by parts to evaluate the integral involved.) 

(c) Confirm that the two solutions are linearly independent by showing that 
their Wronskian is equal to — z 2 , in accordance with (16.4). 

Find series solutions of the equation y" — 2 zy' — 2 y = 0. Identify one of the series 
as .vi(z) = expz 2 and verify this by direct substitution. By setting y 2 (z.) = u(z)yi(z ) 
and solving the resulting equation for u(z), find an explicit form for y 2 (z) and 
deduce that 



n\ 

2(2/7+ 1)! 


(2x) 2n+1 . 
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16.11 


16.12 


16.13 


16.14 


16.15 


16.16 


(a) Identify and classify the singular points of the equation 

i(l -*)*+(!-*)£ +1,-0, 

dz z dz 

and determine their indices. 

(b) Find one series solution in powers of z. Give a formal expression for a 
second linearly independent solution. 

(c) Deduce the values of X for which there is a polynomial solution fV(z) of 
degree N. Evaluate the first four polynomials, normalised in such a way that 
Pn( 0) =1. 

Find the general power series solution about z = 0 of the equation 


d 2 y 


dy 


z-^+(2z-3)/ + -y = 0. 
dz z dz z 


Find the radius of convergence of a series solution about the origin for the 
equation (z 2 + az + b)y" + 2y = 0 in the following cases: 


(a) a = 5, b = 6; (b) a = 5, b = 1. 

Show that if a and b are real and 4 b > a 2 then the radius of convergence is 
always given by b 1/2 . 

For the equation y" + z~ 3 y = 0, show that the origin becomes a regular singular 
point if the independent variable is changed from z to x = 1/z. Flence find a 
series solution of the form .vi(z) = a„z^ n . By setting y 2 (z) = u(z)yi(z) and 
expanding the resulting expression for du/dz in powers of z -1 , show that y 2 (U 
has the asymptotic form 


y 2 {z) = c 


z + lnz — i + O ( 

z 


where c is an arbitrary constant. 
16.15 Prove that the Laguerre equation

$$z\frac{d^2y}{dz^2} + (1-z)\frac{dy}{dz} + \lambda y = 0$$

has polynomial solutions $L_N(z)$ if $\lambda$ is a non-negative integer $N$, and determine
the recurrence relationship for the polynomial coefficients. Hence show that an
expression for $L_N(z)$, normalised in such a way that $L_N(0) = N!$, is

$$L_N(z) = \sum_{n=0}^{N}\frac{(-1)^n (N!)^2}{(N-n)!\,(n!)^2}\,z^n.$$

Evaluate $L_3(z)$ explicitly. [The Laguerre generating function is discussed in
exercise 17.9.]

16.16 (a) Use Leibniz' theorem to show that the Rodrigues' formula for the Laguerre
polynomials $L_N(z)$ of the previous question is

$$L_N(z) = e^{z}\frac{d^N}{dz^N}\left(z^N e^{-z}\right).$$

(b) Use the Rodrigues' formulation to prove that

$$zL_N'(z) = L_{N+1}(z) - (N + 1 - z)L_N(z).$$

(c) Deduce the recurrence relation for the Laguerre polynomials, namely

$$L_{N+1}(z) + (z - 2N - 1)L_N(z) + N^2 L_{N-1}(z) = 0.$$
16.17 Equation (16.32) was shown to have a polynomial solution provided that $\lambda = 2n$
with $n$ an integer $\ge 0$. The polynomials are known as Hermite polynomials $H_n(x)$
and are of importance in the quantum mechanical treatment of the harmonic
oscillator problem. They may also be defined by

$$\Phi(x, h) = \exp(2xh - h^2) = \sum_{n=0}^{\infty}\frac{1}{n!}H_n(x)h^n.$$

Show that

$$\frac{\partial^2\Phi}{\partial x^2} - 2x\frac{\partial\Phi}{\partial x} + 2h\frac{\partial\Phi}{\partial h} = 0,$$

and hence that the $H_n(x)$ satisfy (16.32). Use $\Phi$ to prove that

(a) $H_n'(x) = 2nH_{n-1}(x)$,

(b) $H_{n+1}(x) - 2xH_n(x) + 2nH_{n-1}(x) = 0$.

16.18 By writing $\Phi(x, h)$ of the previous exercise as a function of $h - x$ rather than of
$h$, show that an alternative representation of the $n$th Hermite polynomial is

$$H_n(x) = (-1)^n \exp(x^2)\,\frac{d^n}{dx^n}\left[\exp(-x^2)\right].$$

(Note that $H_n(x) = \partial^n\Phi/\partial h^n$ at $h = 0$.)

16.19 Obtain the recurrence relations for the solution of Legendre's equation (16.35)
in inverse powers of $z$, i.e. set $y(z) = \sum a_n z^{\sigma - n}$ with $a_0 \ne 0$. Deduce that if $\ell$ is
an integer then the series with $\sigma = \ell$ will terminate and hence converge for all $z$
whilst that with $\sigma = -(\ell + 1)$ does not terminate and hence converges only for
$|z| > 1$.

16.20 Carry through the following procedure as an alternative proof of result (16.45).

(a) Square both sides of (16.49), giving the generating-function definition of the
Legendre polynomials.

(b) Express the RHS as a sum of powers of $h$, obtaining expressions for the
coefficients.

(c) Integrate the RHS from $-1$ to $1$ and use the orthogonality results (16.46).

(d) Similarly integrate the LHS and expand the result in powers of $h$.

(e) Compare coefficients.

16.21 A charge $+2q$ is situated at the origin and charges of $-q$ are situated at distances
$\pm a$ from it along the polar axis. By relating it to the generating function for the
Legendre polynomials, show that the electrostatic potential $\Phi$ at a point $(r, \theta, \phi)$
with $r > a$ is given by

$$\Phi(r, \theta, \phi) = -\frac{2q}{4\pi\epsilon_0 r}\sum_{s=1}^{\infty}\left(\frac{a}{r}\right)^{2s}P_{2s}(\cos\theta).$$


16.22 The origin is an ordinary point of the Chebyshev equation,

$$(1 - z^2)y'' - zy' + m^2y = 0,$$

which therefore has series solutions of the form $z^{\sigma}\sum_{n=0}^{\infty} a_n z^n$ for $\sigma = 0$ and $\sigma = 1$.

(a) Find the recurrence relationships for the $a_n$ in the two cases and show that
there exist polynomial solutions $T_m(z)$:

(i) for $\sigma = 0$, when $m$ is an even integer, the polynomial having $\tfrac{1}{2}(m + 2)$
terms;

(ii) for $\sigma = 1$, when $m$ is an odd integer, the polynomial having $\tfrac{1}{2}(m + 1)$
terms.

(b) $T_m(z)$ is normalised so as to have $T_m(1) = 1$. Find explicit forms for $T_m(z)$
for $m = 0, 1, 2, 3$.

(c) Show that the corresponding non-terminating series solutions $S_m(z)$ have as
their first few terms

$$S_0(z) = a_0\left(z + \frac{1}{3!}z^3 + \frac{9}{5!}z^5 + \cdots\right),$$

$$S_1(z) = a_0\left(1 - \frac{1}{2!}z^2 - \frac{3}{4!}z^4 - \cdots\right),$$

$$S_2(z) = a_0\left(z - \frac{3}{3!}z^3 - \frac{15}{5!}z^5 - \cdots\right),$$

$$S_3(z) = a_0\left(1 - \frac{9}{2!}z^2 + \frac{45}{4!}z^4 + \cdots\right).$$

16.23 By choosing a suitable form for $h$ in (16.82), show that further integral
representations of the Bessel functions of the first kind are given, for integral $m$,
by

$$J_{2m}(z) = \frac{(-1)^m}{2\pi}\int_0^{2\pi}\cos(z\cos\theta)\cos 2m\theta\,d\theta, \qquad m \ge 1,$$

$$J_{2m+1}(z) = \frac{(-1)^m}{2\pi}\int_0^{2\pi}\sin(z\cos\theta)\cos(2m+1)\theta\,d\theta, \qquad m \ge 0.$$

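16.24 Show from the definition given in (16.66) that the Bessel function of the second
kind of order $\nu$ can be written as

$$Y_\nu(z) = \frac{1}{\pi}\left[\frac{\partial J_\mu(z)}{\partial\mu} - (-1)^{\nu}\frac{\partial J_{-\mu}(z)}{\partial\mu}\right]_{\mu=\nu}.$$

Using the explicit expression (16.63) for $J_\mu(z)$, show that $\partial J_\mu(z)/\partial\mu$ can be written
as

$$J_\nu(z)\ln\left(\frac{z}{2}\right) + g(\nu, z),$$

and deduce that $Y_\nu(z)$ can be expressed as

$$Y_\nu(z) = \frac{2}{\pi}J_\nu(z)\ln\left(\frac{z}{2}\right) + h(\nu, z),$$

$h(\nu, z)$, like $g(\nu, z)$, being a power series in $z$.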


16.10 Hints and answers 

16.1 Note that $z = 0$ is an ordinary point of the equation.
For $\sigma = 0$, $a_{n+2}/a_n = [n(n + 2) - \lambda]/[(n + 1)(n + 2)]$ and correspondingly for $\sigma = 1$;
$U_2(z) = a_0(1 - 4z^2)$ and $U_3(z) = a_0(z - 2z^3)$.

16.2 $a_0\exp(z/2)$; $a_0 z^{1/2}\sum_{n=0}^{\infty}(2z)^n n!/(2n + 1)!$.

16.3 $\sigma = 0$ and $3$; $a_{6m}/a_0 = (-1)^m/(2m)!$ and $a_{6m}/a_0 = (-1)^m/(2m + 1)!$ respectively.
$y_1(z) = a_0\cos z^3$ and $y_2(z) = a_0\sin z^3$. The Wronskian is $\pm 3a_0^2z^2 \ne 0$.

16.4 $x = 0$ is an ordinary point of the transformed equation and so $\sigma = 0$ and $1$.
For $\sigma = 1$, $a_{n+2} = -2a_n/(n + 2)$ and so $a_{2m}/a_0 = (-1)^m/m!$. For $\sigma = 0$, $a_{n+2} =
-2a_n/(n + 1)$ and so $a_{2m}/a_0 = (-2)^m/\prod_{r=1}^{m}(2r - 1)$.

16.5 (b) a„ + i/a„ = —[((7 4- n)(a + n — 3) 4- 4- 1 )]/[2(cr 4- n) 2 — 2\. For a = 0, a 2 = oo. 

(c) R = 2, equal to the distance between z = 1 and the closest singularity at 
z = —1. 
16.8 $x^2f'' + xf' + (2x^3 - \tfrac{1}{4})f = 0$. Then, in turn, set $x^{3/2} = u$, and $2\sqrt{2}\,u/3 = v$; then $f$,
regarded as a function of $v$, satisfies Bessel's equation with $\nu = 1/3$.

16.9 (b) $\cos z + z\sin z$.

16.10 $y_2(z) = (\exp z^2)\int_0^z \exp(-x^2)\,dx$.

16.11 (a) Regular singular points at $z = 0$ (indices 0, 0) and at $z = 1$ (indices 0, 1).

(b) $y_1(z) = a_0 + a_0\sum_{n=1}^{\infty}\left[\prod_{r=0}^{n-1}(r^2 - \lambda)\right]z^n/(n!)^2$.

$y_2(z) = y_1(z)\ln z + \sum_{n=1}^{\infty} z^n\left\{(\partial/\partial\sigma)\left(\prod_{r=0}^{n-1}[(r + \sigma)^2 - \lambda]/(r + \sigma + 1)^2\right)\right\}_{\sigma=0}$.

(c) $\lambda = N^2$; polynomials are $1$, $1 - z$, $(1 - z)(1 - 3z)$, $(1 - z)(1 - 8z + 10z^2)$.

16.12 Repeated roots $\sigma = 2$.

$$y(z) = (a + b\ln z)z^2 + \sum_{n=1}^{\infty}\frac{(n + 1)(-2)^n z^{n+2}}{n!}\left\{a + b\left[\ln z + g(n)\right]\right\},$$

where

$$g(n) = \frac{1}{n + 1} - \frac{1}{n} - \frac{1}{n - 1} - \cdots - \frac{1}{2} - 2.$$

16.13 (a) 2; (b) $\sqrt{7}$.

16.14 Transformed equation is $xy'' + 2y' + y = 0$; $a_n = (-1)^n(n + 1)^{-1}(n!)^{-2}a_0$; $du/dz =
A[y_1(z)]^{-2}$.

16.15 $a_{n+1} = -(N - n)a_n/(n + 1)^2$; $L_3(z) = 6 - 18z + 9z^2 - z^3$.

16.16 (b) Calculate $L_{N+1}(z)$, considering $z^{N+1}e^{-z}$ as $z \times z^Ne^{-z}$. Later write $d^N/dz^N(z^Ne^{-z})$
as $e^{-z}L_N(z)$. (c) Use (b) to calculate $L_{N+1}'(z)$, substituting for $L_N''(z)$ from the
Laguerre equation. Substitute from (b) for the first derivatives, and finally change
$N + 1$ to $N$.

16.17 Consider $\partial\Phi/\partial x$; (b) differentiate result (a) and then use (a) again to replace the
derivatives.

16.19 $\sigma = \ell$; $a_{n+2} = [(\ell - n)(\ell - n - 1)a_n]/[(n + 2)(n - 2\ell + 1)]$. Note that $(n - 2\ell + 1) \ne 0$
for $n < \ell + 1$ and $n$ even.
$\sigma = -(\ell + 1)$; $a_{n+2} = [(\ell + n + 1)(\ell + n + 2)a_n]/[(n + 2)(n + 2\ell + 3)]$.

16.20 At step (d)

$$\frac{1}{h}\ln\left(\frac{1 + h}{1 - h}\right) = \sum_{n=0}^{\infty}h^{2n}\int_{-1}^{1}P_n^2(x)\,dx.$$

16.21 Using the cosine law, the distances from the charges $-q$ are of the form
$r[1 \pm 2(a/r)\cos\theta + (a/r)^2]^{1/2}$.

16.22 (a) (i) $a_{n+2} = [a_n(n^2 - m^2)]/[(n + 2)(n + 1)]$,
(ii) $a_{n+2} = \{a_n[(n + 1)^2 - m^2]\}/[(n + 3)(n + 2)]$; (b) $1$, $z$, $2z^2 - 1$, $4z^3 - 3z$.

16.23 Set $h = i\exp i\theta$ and obtain an expression for $\cos(z\cos\theta)$.

16.24 Recall that $J_{-\nu}(z) = (-1)^{\nu}J_\nu(z)$ for integer $\nu$.
17 Eigenfunction methods for differential equations


In the previous three chapters we dealt with the solution of differential equations 
of order n by two methods. In one method, we found n independent solutions 
of the equation and then combined them, weighted with coefficients determined 
by the boundary conditions; in the other we found solutions in terms of series 
whose coefficients were related by (in general) an n-term recurrence relation and 
thence fixed by the boundary conditions. For both approaches the linearity of the 
equation was an important or essential factor in the utility of the method, and 
in this chapter our aim will be to exploit the superposition properties of linear 
differential equations even further. 

We will be concerned with the solution of equations of the inhomogeneous 
form 


$$\mathcal{L}y(x) = f(x), \qquad (17.1)$$

where $f(x)$ is a prescribed or general function and the boundary conditions to
be satisfied by the solution $y = y(x)$, for example at the limits $x = a$ and $x = b$,
are given. The expression $\mathcal{L}y(x)$ stands for a linear differential operator $\mathcal{L}$ acting
upon the function $y(x)$.

In general, unless $f(x)$ is both known and simple, it will not be possible to find
particular integrals of (17.1), even if complementary functions can be found that
satisfy $\mathcal{L}y = 0$. The idea is therefore to exploit the linearity of $\mathcal{L}$ by building up
the required solution as a superposition, generally containing an infinite number
of terms, of some set of functions that each individually satisfy the boundary
conditions. Clearly this brings in a quite considerable complication but since,
within reason, we may select the set of functions to suit ourselves, we can obtain
sizeable compensation for this complication. Indeed, if the set chosen is one
containing functions that, when acted upon by $\mathcal{L}$, produce particularly simple
results then we can ‘show a profit’ on the operation. In particular, if the set
consists of those functions $y_i$ for which

$$\mathcal{L}y_i(x) = \lambda_i y_i(x), \qquad (17.2)$$

where $\lambda_i$ is a constant, then a distinct advantage may be obtained from the
manoeuvre because all the differentiation will have disappeared from (17.1).

Equation (17.2) is clearly reminiscent of the equation satisfied by the eigenvectors
$x^i$ of a linear operator $A$, namely

$$Ax^i = \lambda_i x^i, \qquad (17.3)$$

where $\lambda_i$ is a constant and is called the eigenvalue associated with $x^i$. By analogy,
in the context of differential equations a function $y_i(x)$ satisfying (17.2) is called
an eigenfunction of the operator $\mathcal{L}$ and $\lambda_i$ is then called the eigenvalue associated
with the eigenfunction $y_i(x)$.

Probably the most familiar equation of the form (17.2) is that which describes
a simple harmonic oscillator, i.e.

$$\mathcal{L}y \equiv -\frac{d^2y}{dt^2} = \omega^2y, \quad\text{where}\quad \mathcal{L} \equiv -\frac{d^2}{dt^2}. \qquad (17.4)$$

In this case the eigenfunctions are given by $y_n(t) = A_ne^{i\omega_nt}$, where $\omega_n = 2\pi n/T$,
$T$ is the period of oscillation, $n = 0, \pm 1, \pm 2, \ldots$ and the $A_n$ are constants. The
eigenvalues are $\omega_n^2 = n^2\omega_1^2 = n^2(2\pi/T)^2$. (Sometimes $\omega_n$ is referred to as the
eigenvalue of this equation but we will avoid this confusing terminology here.)
Another equation of the form (17.2) is Legendre’s equation

$$\mathcal{L}y \equiv -(1 - x^2)\frac{d^2y}{dx^2} + 2x\frac{dy}{dx} = \ell(\ell + 1)y, \qquad (17.5)$$

where

$$\mathcal{L} \equiv -(1 - x^2)\frac{d^2}{dx^2} + 2x\frac{d}{dx}. \qquad (17.6)$$

We found the eigenfunctions of $\mathcal{L}$ by a series method in chapter 16, and for
solutions to Legendre’s equation that are regular at $x = \pm 1$ these are the
Legendre polynomials, given by

$$y_\ell(x) = P_\ell(x) = \frac{1}{2^\ell\,\ell!}\frac{d^\ell}{dx^\ell}(x^2 - 1)^\ell \qquad (17.7)$$

for $\ell = 0, 1, 2, \ldots$; they have associated eigenvalues $\ell(\ell + 1)$. (Again, $\ell$ is sometimes,
confusingly, referred to as the eigenvalue of this equation.)

We may discuss a somewhat wider class of differential equations by considering
a slightly more general form of (17.2), namely

$$\mathcal{L}y(x) = \lambda\rho(x)y(x), \qquad (17.8)$$

where $\rho(x)$ is a weight function. In many applications $\rho(x)$ is unity for all $x$, in
which case (17.2) is recovered; in general, though, it is a function determined by
the choice of coordinate system used in describing a particular physical situation.
The only requirement on $\rho(x)$ is that it is real and does not change sign in the
range $a \le x \le b$, so that it can, without loss of generality, be taken to be
non-negative throughout. A function $y(x)$ that satisfies (17.8) is called an eigenfunction
of the operator $\mathcal{L}$ with respect to the weight function $\rho(x)$.

This chapter will not cover methods used to determine the eigenfunctions of
(17.2) or (17.8), since we have discussed these in previous chapters, but, rather,
will use the properties of the eigenfunctions to solve inhomogeneous equations
of the form (17.1). We shall see later that the sets of eigenfunctions $y_i(x)$ of
a particular class of operators called Hermitian operators (the operators in the
simple harmonic oscillator equation and in Legendre’s equation are examples)
have particularly useful properties and these will be studied in detail. It turns
out that many of the interesting operators met with in the physical sciences are
Hermitian. Before continuing our discussion of the eigenfunctions of Hermitian
operators, however, we will consider the properties of general sets of functions.


17.1 Sets of functions 

In chapter 8 we discussed the definition of a vector space but concentrated on
spaces of finite dimensionality. We consider now the infinite-dimensional space
of all reasonably well-behaved functions $f(x)$, $g(x)$, $h(x)$, ... on the interval
$a \le x \le b$. That these functions form a linear vector space can be verified since
the set is closed under

(i) addition, which is commutative and associative, i.e.

$$f(x) + g(x) = g(x) + f(x),$$

$$[f(x) + g(x)] + h(x) = f(x) + [g(x) + h(x)],$$

(ii) multiplication by a scalar, which is distributive and associative, i.e.

$$\lambda[f(x) + g(x)] = \lambda f(x) + \lambda g(x),$$

$$\lambda[\mu f(x)] = (\lambda\mu)f(x),$$

$$(\lambda + \mu)f(x) = \lambda f(x) + \mu f(x).$$

Furthermore, in such a space

(iii) there exists a ‘null vector’ $0$ such that $f(x) + 0 = f(x)$,

(iv) multiplication by unity leaves any function unchanged, i.e. $1 \times f(x) = f(x)$,

(v) each function has an associated negative function $-f(x)$ that is such that
$f(x) + [-f(x)] = 0$.

By analogy with finite-dimensional vector spaces we now introduce a set
of linearly independent basis functions $y_n(x)$, $n = 0, 1, \ldots, \infty$, such that any
‘reasonable’ function in the interval $a \le x \le b$ (i.e. it obeys the Dirichlet conditions
discussed in chapter 12) can be expressed as the linear sum of these functions:

$$f(x) = \sum_{n=0}^{\infty}c_ny_n(x).$$

Clearly if a different set of linearly independent basis functions $z_n(x)$ is chosen
then the function can be expressed in terms of the new basis,

$$f(x) = \sum_{n=0}^{\infty}d_nz_n(x),$$

where the $d_n$ are a different set of coefficients. In each case, provided the basis
functions are linearly independent, the coefficients are unique.

We may also define an inner product on our function space by

$$\langle f|g\rangle = \int_a^b f^*(x)g(x)\rho(x)\,dx, \qquad (17.9)$$

where $\rho(x)$ is the weight function, which we require to be real and non-negative
in the interval $a \le x \le b$. As mentioned above, $\rho(x)$ is often unity for all $x$. Two
functions are said to be orthogonal on the interval $[a, b]$ if

$$\langle f|g\rangle = \int_a^b f^*(x)g(x)\rho(x)\,dx = 0, \qquad (17.10)$$

and the norm of a function is defined as

$$\|f\| = \langle f|f\rangle^{1/2} = \left[\int_a^b f^*(x)f(x)\rho(x)\,dx\right]^{1/2} = \left[\int_a^b |f(x)|^2\rho(x)\,dx\right]^{1/2}. \qquad (17.11)$$

An infinite-dimensional vector space of functions, for which an inner product
is defined, is called a Hilbert space. Using the concept of the inner product we
can choose a basis of linearly independent functions $\phi_n(x)$, $n = 0, 1, 2, \ldots$, that
are orthonormal, i.e. such that

$$\langle\phi_i|\phi_j\rangle = \int_a^b \phi_i^*(x)\phi_j(x)\rho(x)\,dx = \delta_{ij}. \qquad (17.12)$$

If $y_n(x)$, $n = 0, 1, 2, \ldots$, are a linearly independent, but not orthonormal, basis
for the Hilbert space then an orthonormal set of basis functions $\phi_n$ may be
produced (in a similar manner to that used in the construction of a set of
orthogonal eigenvectors of an Hermitian matrix, see chapter 8) by the following
procedure, in which each of the new functions $\psi_n$ is to be normalised, giving
$\phi_n = \psi_n\langle\psi_n|\psi_n\rangle^{-1/2}$, before proceeding to the construction of the next one:

$$\begin{aligned}
\psi_0 &= y_0,\\
\psi_1 &= y_1 - \phi_0\langle\phi_0|y_1\rangle,\\
\psi_2 &= y_2 - \phi_1\langle\phi_1|y_2\rangle - \phi_0\langle\phi_0|y_2\rangle,\\
&\;\;\vdots\\
\psi_n &= y_n - \phi_{n-1}\langle\phi_{n-1}|y_n\rangle - \cdots - \phi_0\langle\phi_0|y_n\rangle.
\end{aligned}$$

It is straightforward to check that each $\phi_n = \psi_n\langle\psi_n|\psi_n\rangle^{-1/2}$ is orthogonal to
all its predecessors $\phi_i$, $i = 0, 1, 2, \ldots, n - 1$. This method is called Gram-Schmidt
orthogonalisation. Clearly the functions $\psi_n$ also form an orthogonal set, but in
general they do not have unit norms.


► Starting from the linearly independent functions $y_n(x) = x^n$, $n = 0, 1, \ldots$, construct the
first three orthonormal functions over the range $-1 \le x \le 1$.

The first unnormalised function $\psi_0$ is simply equal to the first of the original functions, i.e.

$$\psi_0 = 1.$$

The normalisation is carried out by dividing by

$$\langle\psi_0|\psi_0\rangle^{1/2} = \left(\int_{-1}^{1}1 \times 1\,du\right)^{1/2} = \sqrt{2},$$

with the result that the first normalised function $\phi_0$ is given by

$$\phi_0 = \frac{\psi_0}{\sqrt{2}} = \frac{1}{\sqrt{2}}.$$

The second unnormalised function is found by applying the above Gram-Schmidt
orthogonalisation procedure, i.e.

$$\psi_1 = y_1 - \phi_0\langle\phi_0|y_1\rangle.$$

It can easily be shown that $\langle\phi_0|y_1\rangle = 0$, and so $\psi_1 = x$. Normalising then gives

$$\phi_1 = \psi_1\left(\int_{-1}^{1}u \times u\,du\right)^{-1/2} = \sqrt{\tfrac{3}{2}}\,x.$$

The third unnormalised function is similarly given by

$$\psi_2 = y_2 - \phi_1\langle\phi_1|y_2\rangle - \phi_0\langle\phi_0|y_2\rangle = x^2 - 0 - \tfrac{1}{3},$$

which, on normalising, gives

$$\phi_2 = \psi_2\left[\int_{-1}^{1}\left(u^2 - \tfrac{1}{3}\right)^2du\right]^{-1/2} = \tfrac{1}{2}\sqrt{\tfrac{5}{2}}\,(3x^2 - 1).$$

By comparing the functions $\phi_0$, $\phi_1$ and $\phi_2$ with the list in subsection 16.6.1, we see that
this procedure has generated (multiples of) the first three Legendre polynomials. ◄
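The worked example can be checked mechanically. The following is a minimal sketch, assuming real polynomials stored as coefficient lists and unit weight on $[-1, 1]$; the helper names `inner` and `gram_schmidt` are illustrative only and do not come from the text. It works with the unnormalised functions $\psi_n$, using the fact that subtracting $\psi_k\langle\psi_k|y_n\rangle/\langle\psi_k|\psi_k\rangle$ is the same as subtracting $\phi_k\langle\phi_k|y_n\rangle$.

```python
from fractions import Fraction

def inner(f, g):
    """Inner product <f|g> on [-1, 1] with unit weight.

    f and g are real polynomials given as coefficient lists
    [c0, c1, c2, ...] standing for c0 + c1*x + c2*x**2 + ...
    """
    total = Fraction(0)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            if (i + j) % 2 == 0:  # odd powers of x integrate to zero
                total += Fraction(a) * Fraction(b) * Fraction(2, i + j + 1)
    return total

def gram_schmidt(n_funcs):
    """Unnormalised orthogonal polynomials psi_0 .. psi_{n_funcs-1}
    built from the monomials 1, x, x**2, ..."""
    psis = []
    for n in range(n_funcs):
        y = [Fraction(0)] * n + [Fraction(1)]  # y_n(x) = x**n
        psi = list(y)
        for prev in psis:
            # projection of y_n onto the earlier (unnormalised) function
            coeff = inner(prev, y) / inner(prev, prev)
            for k, c in enumerate(prev):
                psi[k] -= coeff * c
        psis.append(psi)
    return psis

psis = gram_schmidt(3)
norms_sq = [inner(p, p) for p in psis]  # <psi_n|psi_n>
```

Here `psis[2]` represents $x^2 - \tfrac{1}{3}$ and `norms_sq[2]` is $8/45$, so dividing by its square root reproduces $\phi_2 = \tfrac{1}{2}\sqrt{5/2}\,(3x^2 - 1)$. Exact rational arithmetic is used so that the comparison with the hand calculation does not depend on rounding.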
If a function is expressed in terms of an orthonormal basis $\phi_n(x)$ as

$$f(x) = \sum_{n=0}^{\infty}a_n\phi_n(x) \qquad (17.13)$$

then the coefficients $a_n$ are given by

$$a_n = \langle\phi_n|f\rangle = \int_a^b \phi_n^*(x)f(x)\rho(x)\,dx. \qquad (17.14)$$

Note that this is true only if the basis is orthonormal.


17.1.1 Some useful inequalities 

Since for a Hilbert space $\langle f|f\rangle \ge 0$, the inequalities discussed in subsection 8.1.3
hold. The proofs are not repeated here, but the relationships are listed for
completeness.

(i) The Schwarz inequality states that

$$|\langle f|g\rangle| \le \langle f|f\rangle^{1/2}\langle g|g\rangle^{1/2}, \qquad (17.15)$$

where the equality holds when $f(x)$ is a scalar multiple of $g(x)$, i.e. when
they are linearly dependent.

(ii) The triangle inequality states that

$$\|f + g\| \le \|f\| + \|g\|, \qquad (17.16)$$

where again equality holds when $f(x)$ is a scalar multiple of $g(x)$.

(iii) Bessel’s inequality requires the introduction of an orthonormal basis $\phi_n(x)$
so that any function $f(x)$ can be written as

$$f(x) = \sum_{n=0}^{\infty}c_n\phi_n(x),$$

where $c_n = \langle\phi_n|f\rangle$. Bessel’s inequality then states that

$$\langle f|f\rangle \ge \sum_n |c_n|^2. \qquad (17.17)$$

The equality holds if the summation is over all the basis functions. If some
values of $n$ are omitted from the sum then the inequality results (unless,
of course, the $c_n$ happen to be zero for all values of $n$ omitted, in which
case the equality remains).
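Bessel's inequality (17.17) can be illustrated with a small exact calculation. This is a sketch under stated assumptions: the test function $f(x) = x^4$ and the interval $[-1, 1]$ with unit weight are chosen here for illustration, and the orthonormal basis is taken to be $\phi_n = \sqrt{(2n+1)/2}\,P_n$ built from the first five Legendre polynomials, whose coefficients are hard-coded.

```python
from fractions import Fraction as F

# First five Legendre polynomials as coefficient lists [c0, c1, c2, ...].
LEGENDRE = [
    [F(1)],
    [F(0), F(1)],
    [F(-1, 2), F(0), F(3, 2)],
    [F(0), F(-3, 2), F(0), F(5, 2)],
    [F(3, 8), F(0), F(-15, 4), F(0), F(35, 8)],
]

def inner(f, g):
    """<f|g> on [-1, 1] with unit weight, for real polynomials."""
    total = F(0)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            if (i + j) % 2 == 0:  # odd powers integrate to zero
                total += a * b * F(2, i + j + 1)
    return total

f = [F(0)] * 4 + [F(1)]          # f(x) = x**4
norm_sq = inner(f, f)            # <f|f>

# |c_n|^2 with c_n = <phi_n|f> and phi_n = sqrt((2n+1)/2) P_n
c_sq = [F(2 * n + 1, 2) * inner(P, f) ** 2 for n, P in enumerate(LEGENDRE)]
partial_sums = [sum(c_sq[:k + 1]) for k in range(len(c_sq))]
```

Each partial sum stays at or below $\langle f|f\rangle = 2/9$, and the bound is reached only once $P_4$ is included, because $x^4$ has non-zero components along $P_0$, $P_2$ and $P_4$ alone.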
17.2 Adjoint and Hermitian operators

Having discussed general sets of functions we now return to the discussion of
eigenfunctions of linear operators. The adjoint of an operator $\mathcal{L}$, denoted by $\mathcal{L}^\dagger$,
is defined by

$$\int_a^b f^*(x)\,[\mathcal{L}g(x)]\,\rho(x)\,dx = \left\{\int_a^b g^*(x)\,[\mathcal{L}^\dagger f(x)]\,\rho(x)\,dx\right\}^* \qquad (17.18)$$

or, in inner product notation, $\langle f|\mathcal{L}g\rangle = \langle g|\mathcal{L}^\dagger f\rangle^*$. An operator is then said to be
self-adjoint or Hermitian if $\mathcal{L}^\dagger = \mathcal{L}$, i.e. if

$$\int_a^b f^*(x)\,[\mathcal{L}g(x)]\,\rho(x)\,dx = \left\{\int_a^b g^*(x)\,[\mathcal{L}f(x)]\,\rho(x)\,dx\right\}^* \qquad (17.19)$$

or, in inner product notation, $\langle f|\mathcal{L}g\rangle = \langle g|\mathcal{L}f\rangle^*$. From (17.19) we note that, when
applied to an Hermitian operator, the general property $\langle b|a\rangle^* = \langle a|b\rangle$ takes the
form

$$\langle g|\mathcal{L}f\rangle^* = \langle\mathcal{L}f|g\rangle \quad\Rightarrow\quad \langle\mathcal{L}f|g\rangle = \langle f|\mathcal{L}g\rangle \equiv \langle f|\mathcal{L}|g\rangle,$$

where the notation of the final equality emphasises that $\mathcal{L}$ can act on either $f$ or $g$
without changing the value of the inner product. A little careful study will reveal
the similarity between the definition of an Hermitian operator and the definition
of an Hermitian matrix given in chapter 8. In general, however, an operator $\mathcal{L}$
is Hermitian over an interval $a \le x \le b$ only if certain boundary conditions are
met by the functions $f$ and $g$ on which it acts.


► Find the required boundary conditions for the linear operator $\mathcal{L} = d^2/dt^2$ to be Hermitian
over the interval $t_0$ to $t_0 + T$.

Substituting into the LHS of the definition of an Hermitian operator (17.19) and integrating
by parts gives

$$\int_{t_0}^{t_0+T}f^*\frac{d^2g}{dt^2}\,dt = \left[f^*\frac{dg}{dt}\right]_{t_0}^{t_0+T} - \int_{t_0}^{t_0+T}\frac{df^*}{dt}\frac{dg}{dt}\,dt,$$

where we have taken the weight function $\rho(x)$ to be unity. Integrating the second term on
the RHS by parts yields

$$\int_{t_0}^{t_0+T}f^*\frac{d^2g}{dt^2}\,dt = \left[f^*\frac{dg}{dt}\right]_{t_0}^{t_0+T} - \left[\frac{df^*}{dt}g\right]_{t_0}^{t_0+T} + \int_{t_0}^{t_0+T}g\frac{d^2f^*}{dt^2}\,dt.$$

Remembering that the operator is real and taking the complex conjugate outside the
integral gives

$$\int_{t_0}^{t_0+T}f^*\frac{d^2g}{dt^2}\,dt = \left[f^*\frac{dg}{dt}\right]_{t_0}^{t_0+T} - \left[\frac{df^*}{dt}g\right]_{t_0}^{t_0+T} + \left\{\int_{t_0}^{t_0+T}g^*\frac{d^2f}{dt^2}\,dt\right\}^*,$$

which, by comparison with (17.19), proves that $\mathcal{L}$ is Hermitian provided

$$\left[f^*\frac{dg}{dt}\right]_{t_0}^{t_0+T} = \left[\frac{df^*}{dt}g\right]_{t_0}^{t_0+T}. \;◄$$
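The boundary condition just derived can be probed numerically. The sketch below is illustrative only: the period $T = 2$, the particular periodic functions $f$ and $g$, and the helper name `trapezium` are assumptions made here, not taken from the text. Periodic functions make both boundary terms equal, so the two sides of (17.19) should agree; the non-periodic pair $f = t$, $g = t^2$ on $[0, 1]$ violates the condition and the two sides then differ by exactly the mismatch of the boundary terms.

```python
import cmath

def trapezium(h, a, b, n=2000):
    """Composite trapezium rule for a (possibly complex) integrand h."""
    step = (b - a) / n
    total = 0.5 * (h(a) + h(b))
    for k in range(1, n):
        total += h(a + k * step)
    return total * step

T = 2.0
w = 2.0 * cmath.pi / T  # fundamental angular frequency

# Periodic test functions with their exact second derivatives.
def f(t):
    return cmath.exp(1j * w * t) + 0.5 * cmath.exp(-2j * w * t)

def d2f(t):
    return -w**2 * cmath.exp(1j * w * t) - 2.0 * w**2 * cmath.exp(-2j * w * t)

def g(t):
    return (1 + 2j) * cmath.exp(3j * w * t) + cmath.exp(1j * w * t)

def d2g(t):
    return -9.0 * w**2 * (1 + 2j) * cmath.exp(3j * w * t) - w**2 * cmath.exp(1j * w * t)

# Both sides of (17.19) with unit weight over one period.
lhs = trapezium(lambda t: f(t).conjugate() * d2g(t), 0.0, T)
rhs = trapezium(lambda t: g(t).conjugate() * d2f(t), 0.0, T).conjugate()

# Non-periodic pair f = t, g = t**2 on [0, 1]: f'' = 0 and g'' = 2.
lhs_np = trapezium(lambda t: t * 2.0, 0.0, 1.0)
rhs_np = trapezium(lambda t: t**2 * 0.0, 0.0, 1.0)
# [f g']_0^1 - [f' g]_0^1 for this pair
boundary_mismatch = (1.0 * 2.0 - 0.0) - (1.0 * 1.0 - 0.0)
```

For the periodic pair only the common $e^{i\omega t}$ component survives the integration, so both sides equal $-\omega^2T$.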
We showed in chapter 8 that the eigenvalues of Hermitian matrices are real and
that their eigenvectors can be chosen to be orthogonal. Similarly, the eigenvalues 
of Hermitian operators are real and their eigenfunctions can be chosen to be 
orthogonal (we will prove these properties in the following section). Hermitian 
operators (or matrices) are often used in the formulation of quantum mechanics. 
The eigenvalues then give the possible measured values of an observable quantity 
such as energy or angular momentum, and the physical requirement that such 
quantities must be real is ensured by the reality of these eigenvalues. Furthermore, 
the infinite set of eigenfunctions of an Hermitian operator form a complete basis 
set, so that it is possible to expand in an eigenfunction series any function y(x) 
obeying the appropriate conditions: 

$$y(x) = \sum_{n=0}^{\infty}c_ny_n(x), \qquad (17.20)$$

where the choice of suitable values for the $c_n$ will make the sum arbitrarily close
to $y(x)$.† These useful properties provide the motivation for a detailed study of
Hermitian operators.

† The proof of the completeness of the eigenfunctions of an Hermitian operator is beyond the scope
of this book. The reader should refer to e.g. Courant and Hilbert, Methods of Mathematical Physics
(Interscience Publishers, 1953).


17.3 The properties of Hermitian operators

We now provide proofs of some of the useful properties of Hermitian operators.
Again much of the analysis is similar to that for Hermitian matrices in chapter 8,
although the present section stands alone. (Here, and throughout the remainder
of this chapter, we will write out inner products in full. We note, however,
that the inner product notation often provides a neat form in which to express
results.)


17.3.1 Reality of the eigenvalues

Consider an Hermitian operator for which (17.8) is satisfied by at least two
eigenfunctions $y_i(x)$ and $y_j(x)$, which have eigenvalues $\lambda_i$ and $\lambda_j$ respectively, so
that

$$\mathcal{L}y_i = \lambda_i\rho(x)y_i, \qquad (17.21)$$

$$\mathcal{L}y_j = \lambda_j\rho(x)y_j, \qquad (17.22)$$

where $\rho(x)$ is the weight function. Multiplying (17.21) by $y_j^*$ and (17.22) by $y_i^*$
and then integrating gives

$$\int_a^b y_j^*\mathcal{L}y_i\,dx = \lambda_i\int_a^b y_j^*y_i\rho\,dx, \qquad (17.23)$$

$$\int_a^b y_i^*\mathcal{L}y_j\,dx = \lambda_j\int_a^b y_i^*y_j\rho\,dx. \qquad (17.24)$$

Remembering that we have required $\rho(x)$ to be real, the complex conjugate of
(17.23) becomes

$$\left[\int_a^b y_j^*\mathcal{L}y_i\,dx\right]^* = \lambda_i^*\int_a^b y_i^*y_j\rho\,dx, \qquad (17.25)$$

and using the definition of an Hermitian operator (17.19) it follows that the LHS
of (17.25) is equal to the LHS of (17.24). Thus

$$(\lambda_i^* - \lambda_j)\int_a^b y_i^*y_j\rho\,dx = 0. \qquad (17.26)$$

If $i = j$ then $\lambda_i = \lambda_i^*$ (since $\int_a^b y_i^*y_i\rho\,dx \ne 0$), which is a statement that the
eigenvalue $\lambda_i$ is real.


17.3.2 Orthogonality of the eigenfunctions 

From (17.26), it is immediately apparent that two eigenfunctions $y_i$ and $y_j$ that
correspond to different eigenvalues, i.e. such that $\lambda_i \ne \lambda_j$, satisfy

$$\int_a^b y_i^*y_j\rho\,dx = 0, \qquad (17.27)$$

which is a statement of the orthogonality of $y_i$ and $y_j$. Because $\mathcal{L}$ is linear, the
normalisation of the eigenfunctions $y_i(x)$ is arbitrary and we shall assume for
definiteness that they are normalised so that $\int_a^b y_i^*y_i\rho\,dx = 1$. Thus we can write
(17.27) in the form

$$\int_a^b y_i^*y_j\rho\,dx = \delta_{ij}, \qquad (17.28)$$

which is valid for all pairs of values $i$, $j$.
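As a concrete check of (17.28), consider an example that is assumed here for illustration and is not taken from the text: the operator $-d^2/dx^2$ on $[0, 1]$ with $y(0) = y(1) = 0$ and unit weight, whose normalised eigenfunctions are $\sqrt{2}\sin n\pi x$ with eigenvalues $(n\pi)^2$. The sketch below evaluates the overlap integrals numerically; the helper names are hypothetical.

```python
import math

def simpson(h, a, b, n=1000):
    """Composite Simpson rule; n must be even."""
    step = (b - a) / n
    total = h(a) + h(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * h(a + k * step)
    return total * step / 3

def y(n):
    """Normalised eigenfunction sqrt(2) sin(n pi x) on [0, 1]."""
    return lambda x: math.sqrt(2.0) * math.sin(n * math.pi * x)

def overlap(i, j):
    """Integral of y_i y_j over [0, 1] with unit weight (functions are real)."""
    return simpson(lambda x: y(i)(x) * y(j)(x), 0.0, 1.0)

# Matrix of overlaps for n = 1..4; it should approximate the identity.
overlaps = [[overlap(i, j) for j in range(1, 5)] for i in range(1, 5)]
```

The computed matrix agrees with $\delta_{ij}$ to well within the quadrature error, as expected for eigenfunctions belonging to distinct eigenvalues.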

If one (or more) of the eigenvalues is degenerate, however, we have different
eigenfunctions corresponding to the same eigenvalue, and the proof of orthogonality
is not so straightforward. Nevertheless, an orthogonal set of eigenfunctions
may be constructed using the Gram-Schmidt orthogonalisation method mentioned
earlier in this chapter and used in chapter 8 to construct a set of orthogonal
eigenvectors of an Hermitian matrix. We repeat the analysis here for completeness.
Suppose, for the sake of our proof, that $\lambda_0$ is $k$-fold degenerate, i.e.

$$\mathcal{L}y_i = \lambda_0\rho y_i \quad\text{for } i = 0, 1, \ldots, k - 1, \qquad (17.29)$$

but that $\lambda_0$ is different from any of $\lambda_k$, $\lambda_{k+1}$, etc. Then any linear combination of
these $y_i$ is also an eigenfunction with eigenvalue $\lambda_0$ since

$$\mathcal{L}z \equiv \mathcal{L}\sum_{i=0}^{k-1}c_iy_i = \sum_{i=0}^{k-1}c_i\mathcal{L}y_i = \sum_{i=0}^{k-1}c_i\lambda_0\rho y_i = \lambda_0\rho z. \qquad (17.30)$$

If the $y_i$ defined in (17.29) are not already mutually orthogonal then consider
the new eigenfunctions $z_i$ constructed by the following procedure, in which each
of the new functions $w_i$ is to be normalised, to give $z_i$, before proceeding to the
construction of the next one (the normalisation can be carried out by dividing
the eigenfunction $w_i$ by $(\int_a^b w_i^*w_i\rho\,dx)^{1/2}$):

$$\begin{aligned}
w_0 &= y_0,\\
w_1 &= y_1 - z_0\left(\int_a^b z_0^*y_1\rho\,dx\right),\\
w_2 &= y_2 - z_1\left(\int_a^b z_1^*y_2\rho\,dx\right) - z_0\left(\int_a^b z_0^*y_2\rho\,dx\right),\\
&\;\;\vdots\\
w_{k-1} &= y_{k-1} - z_{k-2}\left(\int_a^b z_{k-2}^*y_{k-1}\rho\,dx\right) - \cdots - z_0\left(\int_a^b z_0^*y_{k-1}\rho\,dx\right).
\end{aligned}$$

Each of the integrals is just a number and thus each new function $z_i =
w_i(\int_a^b w_i^*w_i\rho\,dx)^{-1/2}$ is, as can be shown from (17.30), an eigenfunction of $\mathcal{L}$ with
eigenvalue $\lambda_0$. It is straightforward to check that each $z_i$ is orthogonal to all its
predecessors. Thus, by this explicit construction we have shown that an orthogonal
set of eigenfunctions of an Hermitian operator $\mathcal{L}$ can be obtained. Clearly
the orthonormal set obtained, $z_i$, is not unique.


17.3.3 Construction of real eigenfunctions

Recall that the eigenfunction $y_i$ satisfies

$$\mathcal{L}y_i = \lambda_i\rho y_i \qquad (17.31)$$

and that the complex conjugate of this gives

$$\mathcal{L}y_i^* = \lambda_i^*\rho y_i^* = \lambda_i\rho y_i^*, \qquad (17.32)$$

where the last equality follows because the eigenvalues are real, i.e. $\lambda_i = \lambda_i^*$. Thus,
$y_i$ and $y_i^*$ are eigenfunctions corresponding to the same eigenvalue and hence,
because of the linearity of $\mathcal{L}$, at least one of $y_i^* + y_i$ and $i(y_i^* - y_i)$ (which are both
real) is a non-zero eigenfunction corresponding to that eigenvalue. Therefore the
eigenfunctions can always be made real by taking suitable linear combinations.
Such linear combinations will only be necessary in cases where a particular $\lambda$ is
degenerate, i.e. corresponds to more than one linearly independent eigenfunction.


17.4 Sturm-Liouville equations 

One of the most important applications of our discussion of Hermitian operators
is to the study of Sturm-Liouville equations, which take the general form

$$p(x)\frac{d^2y}{dx^2} + r(x)\frac{dy}{dx} + q(x)y + \lambda\rho(x)y = 0, \quad\text{where}\quad r(x) = \frac{dp(x)}{dx}, \qquad (17.33)$$

and $p$, $q$ and $r$ are real functions of $x$. (We note that sign conventions vary in this
expression for the general Sturm-Liouville equation; some authors use $-\lambda\rho(x)y$
on the LHS of (17.33).) A variational approach to the Sturm-Liouville equation,
which is useful in estimating the eigenvalues $\lambda$ of the equation, is discussed
in chapter 22. For now, however, we concentrate on a demonstration that the
Sturm-Liouville equation can be solved by superposition methods.

It is clear that (17.33) can be written

$$\mathcal{L}y = \lambda\rho(x)y, \quad\text{where}\quad \mathcal{L} \equiv -\left[p(x)\frac{d^2}{dx^2} + r(x)\frac{d}{dx} + q(x)\right]. \qquad (17.34)$$

An example is Legendre’s equation (17.5), which is a Sturm-Liouville equation
with $p(x) = 1 - x^2$, $r(x) = -2x = p'(x)$, $q(x) = 0$, $\rho(x) = 1$ and eigenvalues
$\ell(\ell + 1)$.

It will be seen that the general Sturm-Liouville equation (17.33) can be rewritten

$$(py')' + qy + \lambda\rho y = 0, \qquad (17.35)$$

where primes denote differentiation with respect to $x$. Using (17.34) this may also
be written $\mathcal{L}y \equiv -(py')' - qy = \lambda\rho y$. We will show in the next section that, under
certain boundary conditions on the solutions $y(x)$, linear operators that can be
written in this form are self-adjoint.

Whilst it is true that Sturm-Liouville equations represent only a small fraction
of the differential equations encountered in practice, any second-order differential
equation of the form

$$p(x)y'' + r(x)y' + q(x)y + \lambda\rho(x)y = 0 \qquad (17.36)$$

can be converted into Sturm-Liouville form by multiplying through by a suitable
factor; this is discussed in subsection 17.4.2.
17.4.1 Valid boundary conditions

For the linear operator of the Sturm-Liouville equation (17.34) to be Hermitian
over the range $[a, b]$ requires certain boundary conditions to be met, namely, that
any two eigenfunctions $y_i$ and $y_j$ of (17.34) must satisfy

$$\left[y_i^*py_j'\right]_{x=a} = \left[y_i^*py_j'\right]_{x=b} \quad\text{for all } i, j. \qquad (17.37)$$

Rearranging (17.37) we find that

$$\left[y_i^*py_j'\right]_{x=a}^{x=b} = 0 \qquad (17.38)$$

is an equivalent statement of the required boundary conditions. These boundary
conditions are in fact not too restrictive and are met, for instance, by the sets
$y(a) = y(b) = 0$; $y(a) = y'(b) = 0$; $p(a) = p(b) = 0$ and by many other sets. It
is important to note that in order to satisfy (17.37) and (17.38) one boundary
condition must be specified at each end of the range.


► Prove that the Sturm-Liouville operator is Hermitian over the range $[a, b]$ and under the
boundary conditions (17.38).

Putting the Sturm-Liouville form $\mathcal{L}y = -(py')' - qy$ into the definition (17.19) of an
Hermitian operator, the LHS may be written as a sum of two terms, i.e.

$$-\int_a^b\left[y_i^*(py_j')' + y_i^*qy_j\right]dx = -\int_a^b y_i^*(py_j')'\,dx - \int_a^b y_i^*qy_j\,dx.$$

The first term may be integrated by parts to give

$$-\left[y_i^*py_j'\right]_a^b + \int_a^b (y_i^*)'py_j'\,dx.$$

The first term is zero because of the boundary conditions, and thus, integrating by parts
again yields

$$\left[(y_i^*)'py_j\right]_a^b - \int_a^b\left((y_i^*)'p\right)'y_j\,dx.$$

The first term is once again zero. Thus

$$-\int_a^b\left[y_i^*(py_j')' + y_i^*qy_j\right]dx = \int_a^b\left[-\left((y_i^*)'p\right)'y_j - y_i^*qy_j\right]dx = \left\{-\int_a^b\left[y_j^*(py_i')' + y_j^*qy_i\right]dx\right\}^*,$$

which proves that the Sturm-Liouville operator is Hermitian over the prescribed interval. ◄


17.4.2 Putting an equation into Sturm-Liouville form 

The Sturm-Liouville equation (17.33) requires that $r(x) = p'(x)$. However, any
equation of the form

$$p(x)y'' + r(x)y' + q(x)y + \lambda\rho(x)y = 0 \qquad (17.39)$$

can be put into self-adjoint form by multiplying through by the integrating factor

$$F(x) = \exp\left\{\int^x\frac{r(z) - p'(z)}{p(z)}\,dz\right\}. \qquad (17.40)$$

It is easily verified that (17.39) then takes the Sturm-Liouville form

$$[F(x)p(x)y']' + F(x)q(x)y + \lambda F(x)\rho(x)y = 0, \qquad (17.41)$$

with a different, but still non-negative, weight function $F(x)\rho(x)$.

► Put the Hermite equation

$$y'' - 2xy' + 2\alpha y = 0$$

into Sturm-Liouville form.

Using (17.40), with $p(z) = 1$, $p'(z) = 0$ and $r(z) = -2z$ gives the integrating factor

$$F(x) = \exp\left(\int^x -2z\,dz\right) = \exp(-x^2).$$

Thus, the Hermite equation becomes

$$e^{-x^2}y'' - 2xe^{-x^2}y' + 2\alpha e^{-x^2}y = \left(e^{-x^2}y'\right)' + 2\alpha e^{-x^2}y = 0,$$

which is clearly in Sturm-Liouville form with $p(x) = e^{-x^2}$, $q(x) = 0$, $\rho(x) = e^{-x^2}$ and
$\lambda = 2\alpha$. ◄
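The integrating factor (17.40) can also be evaluated numerically when the integral is not available in closed form. The following is a rough sketch, assuming a lower limit `x0` of our own choosing (changing it only rescales $F$ by a constant) and a Simpson-rule quadrature; the function name `integrating_factor` is illustrative. It is applied to the Hermite case above and then used to check the defining property $(Fp)' = Fr$ by a central difference.

```python
import math

def integrating_factor(p, dp, r, x, x0=0.0, n=2000):
    """F(x) = exp( integral from x0 to x of [r(z) - p'(z)] / p(z) dz ),
    with the integral evaluated by Simpson's rule (n even)."""
    h = (x - x0) / n

    def integrand(z):
        return (r(z) - dp(z)) / p(z)

    s = integrand(x0) + integrand(x)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * integrand(x0 + k * h)
    return math.exp(s * h / 3)

# Hermite equation: p = 1, p' = 0, r = -2z.
def p(z):
    return 1.0

def dp(z):
    return 0.0

def r(z):
    return -2.0 * z

def F(x):
    return integrating_factor(p, dp, r, x)

x = 1.3
eps = 1e-5
# d/dx [F p] by central difference, to be compared with F r
d_Fp = (F(x + eps) * p(x + eps) - F(x - eps) * p(x - eps)) / (2 * eps)
```

With these choices `F(x)` reproduces $\exp(-x^2)$, and `d_Fp` agrees with `F(x) * r(x)`, which is the condition that makes the first two terms of (17.39) combine into $[F p y']'$.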


17.5 Examples of Sturm-Liouville equations 

In order to illustrate the wide applicability of Sturm-Liouville theory, in this 
section we present a short catalogue of some common equations of Sturm- 
Liouville form. Many of them have already been discussed in chapter 16. In 
particular the reader should note the orthogonality properties of the various 
solutions, which, in each case, follow because the differential operator is self- 
adjoint. For completeness we also quote the associated generating functions. 


17.5.1 Legendre’s equation 

We have already met Legendre’s equation,

\[ (1 - x^2)y'' - 2xy' + \ell(\ell + 1)y = [(1 - x^2)y']' + \ell(\ell + 1)y = 0, \tag{17.42} \]

and shown that it is a Sturm-Liouville equation with $p(x) = 1 - x^2$, $q(x) = 0$,
$\rho(x) = 1$ and eigenvalues $\ell(\ell + 1)$. In the previous chapter we found the solutions
of Legendre’s equation that are regular for all finite x. These are the Legendre
polynomials $P_\ell(x)$, which are given by a Rodrigues’ formula:

\[ P_\ell(x) = \frac{1}{2^\ell \ell!}\frac{d^\ell}{dx^\ell}(x^2 - 1)^\ell. \]

The orthogonality and normalisation of the functions in the interval $-1 \le x \le 1$
is expressed by

\[ \int_{-1}^{1} P_\ell(x)P_k(x)\,dx = \frac{2}{2\ell + 1}\,\delta_{\ell k}. \]

The generating function is

\[ G(x, h) = (1 - 2xh + h^2)^{-1/2} = \sum_{n=0}^{\infty} P_n(x)h^n. \]

Legendre’s equation appears in the analysis of physical situations involving the
operator $\nabla^2$ and axial symmetry, since the linear differential operator involved has
the form of the polar-angle part of $\nabla^2$, when the latter is expressed in spherical
polar coordinates. Examples include the solution of Laplace’s equation in axially
symmetric situations and the solution of the Schrödinger equation for a quantum
mechanical system involving a central potential.
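
The orthogonality relation and the generating function quoted above lend themselves to a quick numerical check. The sketch below is illustrative only: it evaluates $P_n(x)$ with the standard three-term (Bonnet) recurrence, which is not derived in this section, and the helper names `legendre`, `simpson`, `overlap` and `generating_sum` are hypothetical.

```python
def legendre(n, x):
    """P_n(x) from (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1} (standard recurrence, assumed here)."""
    p_prev, p_curr = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p_curr = p_curr, ((2 * k + 1) * x * p_curr - k * p_prev) / (k + 1)
    return p_curr

def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def overlap(l, k):
    """Integral of P_l(x) P_k(x) over [-1, 1]; expected to be 2/(2l+1) if l == k, else 0."""
    return simpson(lambda x: legendre(l, x) * legendre(k, x), -1.0, 1.0)

def generating_sum(x, h, terms=80):
    """Truncated sum of P_n(x) h^n, to be compared with (1 - 2xh + h^2)^(-1/2) for |h| < 1."""
    return sum(legendre(n, x) * h ** n for n in range(terms))

print(overlap(3, 3), 2.0 / 7.0)
print(overlap(2, 4))
print(generating_sum(0.3, 0.4), (1 - 2 * 0.3 * 0.4 + 0.4 ** 2) ** -0.5)
```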


17.5.2 The associated Legendre equation 

Very closely related to the Legendre equation is the associated Legendre equation

\[ [(1 - x^2)y']' + \left[ \ell(\ell + 1) - \frac{m^2}{1 - x^2} \right] y = 0, \tag{17.43} \]

which reduces to Legendre’s equation when $m = 0$. In physical applications
$-\ell \le m \le \ell$ and m is restricted to integer values. If y(x) is a solution of
Legendre’s equation then

\[ w(x) = (1 - x^2)^{|m|/2}\,\frac{d^{|m|}y}{dx^{|m|}} \]

is a solution of the associated equation. The solutions of the associated Legendre
equation that are regular for all finite x are called the associated Legendre functions
and are therefore given by

\[ P_\ell^m(x) = (1 - x^2)^{|m|/2}\,\frac{d^{|m|}P_\ell}{dx^{|m|}}. \]

Note also that $P_\ell^m(x) = 0$ for $m > \ell$. Like the Legendre polynomials, the associated
Legendre functions $P_\ell^m(x)$ are orthogonal in the range $-1 \le x \le 1$. This property,
and their normalisation, is expressed by

\[ \int_{-1}^{1} P_\ell^m(x)P_k^m(x)\,dx = \frac{2}{2\ell + 1}\,\frac{(\ell + m)!}{(\ell - m)!}\,\delta_{\ell k}. \]

They have the generating function

\[ G(x, h) = \frac{(2m)!\,(1 - x^2)^{m/2}}{2^m m!\,(1 - 2hx + h^2)^{m + 1/2}} = \sum_{n=0}^{\infty} P_{n+m}^m(x)h^n. \]

The associated Legendre equation arises in physical situations in which there
is a dependence on azimuthal angle $\phi$ of the form $e^{im\phi}$ or $\cos m\phi$.


17.5.3 Bessel’s equation 

Physical situations that when described in spherical polar coordinates give rise 
to Legendre and associated Legendre equations lead to Bessel’s equation when 
cylindrical polar coordinates are used. Bessel’s equation has the form 

\[ x^2y'' + xy' + (x^2 - n^2)y = 0, \tag{17.44} \]

but on dividing by x and changing variables to $\xi = x/a$,† it takes on the
Sturm-Liouville form

\[ (\xi y')' + a^2\xi y - \frac{n^2}{\xi}\,y = 0, \tag{17.45} \]

where a prime now indicates differentiation with respect to $\xi$.

We met Bessel’s equation in chapter 16, where we saw that those of its solutions
that are regular for finite x are the Bessel functions, given by

\[ J_n(x) = \sum_{r=0}^{\infty} \frac{(-1)^r (\tfrac{1}{2}x)^{n+2r}}{r!\,\Gamma(n + r + 1)}, \tag{17.46} \]

where $\Gamma$ is the gamma function discussed in the Appendix. Their orthogonality
and normalisation over the range $0 \le x < \infty$ have been discussed in detail in
chapter 16. The generating function for the Bessel functions is

\[ G(x, h) = \exp\left[ \frac{x}{2}\left( h - \frac{1}{h} \right) \right] = \sum_{n=-\infty}^{\infty} J_n(x)h^n. \tag{17.47} \]
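
As a numerical illustration of the series (17.46) and the generating function (17.47), the sketch below sums the power series directly. It is an assumption-laden example rather than a recommended algorithm: the names `bessel_j` and `generating_sum` are hypothetical, the series is truncated after a fixed number of terms (adequate only for modest x), and the relation $J_{-n}(x) = (-1)^n J_n(x)$ for integer n is a standard result that is not stated in this section.

```python
import math

def bessel_j(n, x, terms=40):
    """J_n(x) for integer n >= 0 from the truncated power series (17.46)."""
    return sum((-1) ** r * (x / 2.0) ** (n + 2 * r)
               / (math.factorial(r) * math.gamma(n + r + 1))
               for r in range(terms))

def generating_sum(x, h, nmax=30):
    """Sum of J_n(x) h^n over n = -nmax..nmax, using J_{-n} = (-1)^n J_n for the negative orders."""
    total = bessel_j(0, x)
    for n in range(1, nmax + 1):
        jn = bessel_j(n, x)
        total += jn * h ** n + (-1) ** n * jn * h ** (-n)
    return total

x, h = 1.5, 0.7
print(generating_sum(x, h), math.exp(0.5 * x * (h - 1.0 / h)))
```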


17.5.4 The simple harmonic equation 

The most trivial of Sturm-Liouville equations is the simple harmonic motion
equation

\[ y'' + \omega^2 y = 0, \tag{17.48} \]

which has $p(x) = 1$, $q(x) = 0$, $\rho(x) = 1$ and eigenvalue $\omega^2$. We have already
met the solutions of this equation in the Fourier analysis of chapter 12, and the 
properties of orthogonality and normalisation of the eigenfunctions given there 
can now be seen in the wider context of general Sturm-Liouville equations. 


† This change of scale is required to give the conventional normalisation, but is not needed for the
transformation into Sturm-Liouville form.

17.5.5 Hermite’s equation

The Hermite equation appears in the description of the wavefunction of a
harmonic oscillator and is given by

\[ y'' - 2xy' + 2\alpha y = 0. \tag{17.49} \]

We have already seen that it can be converted to Sturm-Liouville form by
multiplying by the integrating factor $\exp(-x^2)$, which yields

\[ e^{-x^2}y'' - 2xe^{-x^2}y' + 2\alpha e^{-x^2}y = (e^{-x^2}y')' + 2\alpha e^{-x^2}y = 0. \tag{17.50} \]

The solutions, the Hermite polynomials $H_n(x)$, are given by a Rodrigues’
formula:

\[ H_n(x) = (-1)^n e^{x^2}\frac{d^n}{dx^n}\left( e^{-x^2} \right). \tag{17.51} \]

Their orthogonality over the range $-\infty < x < \infty$ and their normalisation are
summarised by

\[ \int_{-\infty}^{\infty} e^{-x^2} H_m(x)H_n(x)\,dx = 2^n n!\sqrt{\pi}\,\delta_{mn}, \tag{17.52} \]

and their generating function is

\[ G(x, h) = e^{2hx - h^2} = \sum_{n=0}^{\infty} \frac{H_n(x)}{n!}h^n. \tag{17.53} \]
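
The normalisation (17.52) and the generating function (17.53) can be checked numerically. The following sketch is illustrative only: it builds $H_n(x)$ from the standard recurrence $H_{k+1} = 2xH_k - 2kH_{k-1}$, which is not given in this section, and it truncates the infinite integration range at a hypothetical cutoff where the Gaussian weight is negligible.

```python
import math

def hermite(n, x):
    """H_n(x) from H_{k+1} = 2x H_k - 2k H_{k-1} (standard recurrence, assumed here)."""
    h_prev, h_curr = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h_curr = h_curr, 2.0 * x * h_curr - 2.0 * k * h_prev
    return h_curr

def weighted_overlap(m, n, cutoff=10.0, steps=4000):
    """Simpson estimate of the integral of exp(-x^2) H_m(x) H_n(x) over [-cutoff, cutoff]."""
    f = lambda x: math.exp(-x * x) * hermite(m, x) * hermite(n, x)
    h = 2.0 * cutoff / steps
    total = f(-cutoff) + f(cutoff)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(-cutoff + i * h)
    return total * h / 3.0

def generating_sum(x, h, terms=40):
    """Truncated sum of H_n(x) h^n / n!, to be compared with exp(2hx - h^2)."""
    return sum(hermite(n, x) * h ** n / math.factorial(n) for n in range(terms))

print(weighted_overlap(3, 3), 2 ** 3 * math.factorial(3) * math.sqrt(math.pi))
print(weighted_overlap(2, 4))
print(generating_sum(0.8, 0.3), math.exp(2 * 0.3 * 0.8 - 0.3 ** 2))
```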


17.5.6 Laguerre’s equation 

The Laguerre equation appears in the description of the wavefunction of the 
hydrogen atom and is given by 

\[ xy'' + (1 - x)y' + ny = 0. \tag{17.54} \]

It can be converted to Sturm-Liouville form by multiplying by the integrating
factor $\exp(-x)$, which yields

\[ xe^{-x}y'' + (1 - x)e^{-x}y' + ne^{-x}y = (xe^{-x}y')' + ne^{-x}y = 0. \tag{17.55} \]

The solutions, the Laguerre polynomials $L_n(x)$, are again given by a Rodrigues’
formula:

\[ L_n(x) = e^{x}\frac{d^n}{dx^n}\left( x^n e^{-x} \right). \tag{17.56} \]

Their orthogonality over the range $0 \le x < \infty$ and their normalisation are
expressed by

\[ \int_0^{\infty} e^{-x} L_m(x)L_n(x)\,dx = (n!)^2\,\delta_{mn}, \tag{17.57} \]

and their generating function is

\[ G(x, h) = \frac{e^{-xh/(1-h)}}{1 - h} = \sum_{n=0}^{\infty} \frac{L_n(x)}{n!}h^n. \tag{17.58} \]


17.5.7 Chebyshev’s equation

The Chebyshev equation

\[ (1 - x^2)y'' - xy' + n^2 y = 0 \tag{17.59} \]

can be converted to an equation of Sturm-Liouville form by multiplying by the
integrating factor $(1 - x^2)^{-1/2}$. Simplifying, this yields

\[ \left[ (1 - x^2)^{1/2} y' \right]' + n^2 (1 - x^2)^{-1/2} y = 0. \tag{17.60} \]

The solutions, the Chebyshev polynomials $T_n(x)$, are once again given by a
Rodrigues’ formula:

\[ T_n(x) = \frac{(-1)^n 2^n n!\,(1 - x^2)^{1/2}}{(2n)!}\,\frac{d^n}{dx^n}(1 - x^2)^{n - 1/2}. \tag{17.61} \]

Their orthogonality over the range $-1 \le x \le 1$ and their normalisation are given
by

\[ \int_{-1}^{1} (1 - x^2)^{-1/2}\,T_m(x)T_n(x)\,dx =
\begin{cases} 0 & \text{for } m \ne n, \\ \pi/2 & \text{for } n = m \ne 0, \\ \pi & \text{for } n = m = 0, \end{cases} \tag{17.62} \]

and their generating function is

\[ G(x, h) = \frac{1 - xh}{1 - 2xh + h^2} = \sum_{n=0}^{\infty} T_n(x)h^n. \tag{17.63} \]
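
The three cases of (17.62) and the generating function (17.63) can be confirmed numerically. The sketch below is an illustration under stated assumptions: it uses the closed form $T_n(x) = \cos(n\cos^{-1}x)$ quoted in exercise 17.12, and it evaluates the weighted integral after the substitution $x = \cos\theta$, which turns it into $\int_0^{\pi}\cos m\theta\,\cos n\theta\,d\theta$ and removes the endpoint singularity of the weight. The helper names are hypothetical.

```python
import math

def chebyshev(n, x):
    """T_n(x) = cos(n arccos x) for -1 <= x <= 1 (closed form quoted in exercise 17.12)."""
    return math.cos(n * math.acos(x))

def generating_sum(x, h, terms=200):
    """Truncated sum of T_n(x) h^n, to be compared with (1 - xh)/(1 - 2xh + h^2) for |h| < 1."""
    return sum(chebyshev(n, x) * h ** n for n in range(terms))

def weighted_overlap(m, n, steps=2000):
    """Integral of (1 - x^2)^(-1/2) T_m T_n over [-1, 1], evaluated in theta with x = cos(theta)."""
    f = lambda theta: math.cos(m * theta) * math.cos(n * theta)
    h = math.pi / steps
    total = f(0.0) + f(math.pi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0

print(weighted_overlap(0, 0), math.pi)
print(weighted_overlap(3, 3), math.pi / 2)
print(weighted_overlap(2, 5))
print(generating_sum(0.3, 0.5), (1 - 0.3 * 0.5) / (1 - 2 * 0.3 * 0.5 + 0.5 ** 2))
```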


17.6 Superposition of eigenfunctions: Green’s functions 

We have already seen that if

\[ \mathcal{L}y_n(x) = \lambda_n\rho(x)y_n(x), \tag{17.64} \]

where $\mathcal{L}$ is an Hermitian operator, then the eigenvalues $\lambda_n$ are real and the
eigenfunctions $y_n(x)$ are orthogonal (or can be made so). Let us assume that we
know the eigenfunctions $y_n(x)$ of $\mathcal{L}$ that individually satisfy (17.64) and some
imposed boundary conditions (for which $\mathcal{L}$ is Hermitian).

Now let us suppose we wish to solve the inhomogeneous differential equation

\[ \mathcal{L}y(x) = f(x), \tag{17.65} \]

subject to the same boundary conditions. Since the eigenfunctions of $\mathcal{L}$ form a
complete set, the full solution, $y(x)$, to (17.65) may be written as a superposition
of eigenfunctions, i.e.

\[ y(x) = \sum_{n=0}^{\infty} c_n y_n(x), \tag{17.66} \]

for some choice of the constants $c_n$. Making full use of the linearity of $\mathcal{L}$, we have

\[ f(x) = \mathcal{L}y(x) = \mathcal{L}\left( \sum_{n=0}^{\infty} c_n y_n(x) \right) = \sum_{n=0}^{\infty} c_n \mathcal{L}y_n(x) = \sum_{n=0}^{\infty} c_n\lambda_n\rho(x)y_n(x). \tag{17.67} \]

Multiplying the first and last terms of (17.67) by $y_j^*$ and integrating, we obtain

\[ \int_a^b y_j^*(z)f(z)\,dz = \sum_{n=0}^{\infty} c_n\lambda_n \int_a^b y_j^*(z)y_n(z)\rho(z)\,dz, \tag{17.68} \]

where we have used z as the integration variable for later convenience. Finally,
using the orthogonality condition (17.28), we see that the integrals on the RHS
are zero unless $n = j$, and so obtain

\[ c_n = \frac{1}{\lambda_n}\,\frac{\int_a^b y_n^*(z)f(z)\,dz}{\int_a^b y_n^*(z)y_n(z)\rho(z)\,dz}. \tag{17.69} \]

Thus, if we can find all the eigenfunctions of a differential operator then (17.69)
can be used to find the weighting coefficients for the superposition, to give as the
full solution

\[ y(x) = \sum_{n=0}^{\infty} \frac{1}{\lambda_n}\,\frac{\int_a^b y_n^*(z)f(z)\,dz}{\int_a^b y_n^*(z)y_n(z)\rho(z)\,dz}\,y_n(x). \tag{17.70} \]

If the eigenfunctions have already been normalised, so that

\[ \int_a^b y_n^*(z)y_n(z)\rho(z)\,dz = 1 \quad \text{for all } n, \]

and we assume that we may interchange the order of summation and integration,
then (17.70) can be written as

\[ y(x) = \int_a^b \left\{ \sum_{n=0}^{\infty} \frac{1}{\lambda_n}\,y_n(x)y_n^*(z) \right\} f(z)\,dz. \]

The quantity in braces, which is a function of x and z only, is usually written
$G(x, z)$, and is the Green’s function for the problem. With this notation,

\[ y(x) = \int_a^b G(x, z)f(z)\,dz, \tag{17.71} \]

where

\[ G(x, z) = \sum_{n=0}^{\infty} \frac{1}{\lambda_n}\,y_n(x)y_n^*(z). \tag{17.72} \]

We note that $G(x, z)$ is determined entirely by the boundary conditions and the
eigenfunctions $y_n$, and hence by $\mathcal{L}$ itself, and that $f(z)$ depends purely on the
RHS of the inhomogeneous equation (17.65). Thus, for a given $\mathcal{L}$ and boundary
conditions we can establish, once and for all, a function $G(x, z)$ that will enable
us to solve the inhomogeneous equation for any RHS. From (17.72) we also note
that

\[ G(x, z) = G^*(z, x). \tag{17.73} \]

We have already met the Green’s function in the solution of second-order
differential equations in chapter 15, as the function that satisfies the equation
$\mathcal{L}[G(x, z)] = \delta(x - z)$ (and the boundary conditions). The formulation given
above is an alternative, though equivalent, one.


► Find an appropriate Green’s function for the equation

\[ y'' + \tfrac{1}{4}y = f(x), \]

with boundary conditions $y(0) = y(\pi) = 0$. Hence, solve for (i) $f(x) = \sin 2x$ and (ii)
$f(x) = x/2$.

One approach to solving this problem is to use the methods of chapter 15 and find
a complementary function and particular integral. However, in order to illustrate the
techniques developed in the present chapter we will use the superposition of eigenfunctions,
which, as may easily be checked, produces the same solution.

The operator on the LHS of this equation is already self-adjoint under the given
boundary conditions, and so we seek its eigenfunctions. These satisfy the equation

\[ y'' + \tfrac{1}{4}y = \lambda y. \]

This equation has the familiar solution

\[ y(x) = A\sin\left( \sqrt{\tfrac{1}{4} - \lambda}\;x \right) + B\cos\left( \sqrt{\tfrac{1}{4} - \lambda}\;x \right). \]

Now, the boundary conditions require that $B = 0$ and $\sin\left( \sqrt{\tfrac{1}{4} - \lambda}\;\pi \right) = 0$, and so
$\sqrt{\tfrac{1}{4} - \lambda} = n$, where $n = 0, \pm 1, \pm 2, \ldots$.

Therefore, the independent eigenfunctions that satisfy the boundary conditions are

\[ y_n(x) = A_n \sin nx, \]

where n is any non-negative integer, with corresponding eigenvalues $\lambda_n = \tfrac{1}{4} - n^2$.
The normalisation condition further requires

\[ \int_0^{\pi} A_n^2 \sin^2 nx\,dx = 1 \quad\Rightarrow\quad A_n = \left( \frac{2}{\pi} \right)^{1/2}. \]

Comparison with (17.72) shows that the appropriate Green’s function is therefore given
by

\[ G(x, z) = \frac{2}{\pi}\sum_{n=0}^{\infty} \frac{\sin nx \sin nz}{\tfrac{1}{4} - n^2}. \]

Case (i). Using (17.71), the solution with $f(x) = \sin 2x$ is given by

\[ y(x) = \frac{2}{\pi}\int_0^{\pi} \left( \sum_{n=0}^{\infty} \frac{\sin nx \sin nz}{\tfrac{1}{4} - n^2} \right) \sin 2z\,dz
= \frac{2}{\pi}\sum_{n=0}^{\infty} \frac{\sin nx}{\tfrac{1}{4} - n^2} \int_0^{\pi} \sin nz \sin 2z\,dz. \]

Now the integral is zero unless $n = 2$, in which case it is

\[ \int_0^{\pi} \sin^2 2z\,dz = \frac{\pi}{2}. \]

Thus

\[ y(x) = -\frac{2}{\pi}\,\frac{\sin 2x}{15/4}\,\frac{\pi}{2} = -\frac{4}{15}\sin 2x \]

is the full solution for $f(x) = \sin 2x$. This is, of course, exactly the solution found by using
the methods of chapter 15.

Case (ii). The solution with $f(x) = x/2$ is given by

\[ y(x) = \int_0^{\pi} \left( \frac{2}{\pi}\sum_{n=0}^{\infty} \frac{\sin nx \sin nz}{\tfrac{1}{4} - n^2} \right) \frac{z}{2}\,dz
= \frac{1}{\pi}\sum_{n=0}^{\infty} \frac{\sin nx}{\tfrac{1}{4} - n^2} \int_0^{\pi} z \sin nz\,dz. \]

The integral may be evaluated by integrating by parts, i.e.

\[ \int_0^{\pi} z \sin nz\,dz = \left[ -\frac{z\cos nz}{n} \right]_0^{\pi} + \int_0^{\pi} \frac{\cos nz}{n}\,dz
= -\frac{\pi\cos n\pi}{n} = -\frac{\pi(-1)^n}{n}. \]

For $n = 0$ the integral is zero, and thus

\[ y(x) = \sum_{n=1}^{\infty} (-1)^{n+1}\frac{\sin nx}{n\left( \tfrac{1}{4} - n^2 \right)} = 4\sum_{n=1}^{\infty} (-1)^{n+1}\frac{\sin nx}{n(1 - 4n^2)} \]

is the full solution for $f(x) = x/2$. Using the methods of subsection 15.1.2 the solution
is found to be $y(x) = 2x - 2\pi\sin(x/2)$, which may be shown to be equal to the above
solution by expanding $2x - 2\pi\sin(x/2)$ as a Fourier sine series. ◄
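
Both cases of the worked example can be reproduced numerically from the eigenfunction superposition. The sketch below is illustrative, not the authors' method of evaluation: it computes the overlap integrals by Simpson's rule and truncates the sum at a hypothetical `terms = 100`, so case (ii) agrees with the closed form only to the accuracy of the truncated series. The function names are assumptions.

```python
import math

def simpson(f, a, b, steps=1000):
    """Composite Simpson's rule with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def solve_by_eigenfunctions(f, x, terms=100):
    """Truncated form of (17.70) for y'' + y/4 = f, y(0) = y(pi) = 0.

    Uses y_n = sqrt(2/pi) sin(nx) and lambda_n = 1/4 - n^2; the n = 0 term vanishes.
    """
    total = 0.0
    for n in range(1, terms + 1):
        overlap = simpson(lambda z: math.sin(n * z) * f(z), 0.0, math.pi)
        total += (2.0 / math.pi) * math.sin(n * x) * overlap / (0.25 - n * n)
    return total

x = 1.0
# Case (i): f = sin 2x, closed form -(4/15) sin 2x.
print(solve_by_eigenfunctions(lambda z: math.sin(2 * z), x), -4.0 / 15.0 * math.sin(2 * x))
# Case (ii): f = x/2, closed form 2x - 2 pi sin(x/2).
print(solve_by_eigenfunctions(lambda z: z / 2.0, x), 2 * x - 2 * math.pi * math.sin(x / 2))
```

The terms of case (ii) fall off only as $1/n^3$, which is why the series needs many more terms than case (i), where a single eigenfunction contributes.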

A useful relation between the eigenfunctions of $\mathcal{L}$ is given by writing

\[ f(x) = \sum_n y_n(x)\int_a^b y_n^*(z)f(z)\rho(z)\,dz = \int_a^b f(z)\rho(z)\sum_n y_n(x)y_n^*(z)\,dz, \]

and hence

\[ \rho(z)\sum_n y_n(x)y_n^*(z) = \delta(x - z). \tag{17.74} \]

This is called the completeness or closure property of the eigenfunctions. It defines
a complete set. If the spectrum of eigenvalues of $\mathcal{L}$ is anywhere continuous then
the eigenfunction $y_n(x)$ must be treated as $y(n, x)$ and an integration carried out
over n.

We also note that the RHS of (17.74) is a $\delta$-function and so is only non-zero
when $z = x$; thus $\rho(z)$ on the LHS can be replaced by $\rho(x)$ if required, i.e.

\[ \rho(z)\sum_n y_n(x)y_n^*(z) = \rho(x)\sum_n y_n(x)y_n^*(z). \tag{17.75} \]


17.7 A useful generalisation 

Sometimes we encounter inhomogeneous equations of a form slightly more general
than (17.1), given by

\[ \mathcal{L}y(x) - \lambda\rho(x)y(x) = f(x) \tag{17.76} \]

for some self-adjoint operator $\mathcal{L}$, with y subject to the appropriate boundary
conditions and $\lambda$ a given (i.e. fixed) constant. To solve this equation we expand
$y(x)$ and $f(x)$ in terms of the eigenfunctions $y_n(x)$ of the operator $\mathcal{L}$, which satisfy

\[ \mathcal{L}y_n(x) = \lambda_n\rho(x)y_n(x). \]

Firstly, we expand $f(x)$ as follows:

\[ f(x) = \sum_{n=0}^{\infty} y_n(x)\int_a^b y_n^*(z)f(z)\rho(z)\,dz
= \int_a^b \rho(z)\sum_{n=0}^{\infty} y_n(x)y_n^*(z)f(z)\,dz. \tag{17.77} \]

Using (17.75) this becomes

\[ f(x) = \int_a^b \rho(x)\sum_{n=0}^{\infty} y_n(x)y_n^*(z)f(z)\,dz
= \rho(x)\sum_{n=0}^{\infty} y_n(x)\int_a^b y_n^*(z)f(z)\,dz. \tag{17.78} \]

Next, we expand $y(x)$ as $y = \sum_{n=0}^{\infty} c_n y_n(x)$ and seek the coefficients $c_n$.
Substituting this and (17.78) in (17.76) we have

\[ \rho(x)\sum_{n=0}^{\infty} (\lambda_n - \lambda)c_n y_n(x) = \rho(x)\sum_{n=0}^{\infty} y_n(x)\int_a^b y_n^*(z)f(z)\,dz, \]

from which we find that

\[ c_n = \frac{\int_a^b y_n^*(z)f(z)\,dz}{\lambda_n - \lambda}. \]

Hence the solution of (17.76) is given by

\[ y = \sum_{n=0}^{\infty} c_n y_n(x) = \sum_{n=0}^{\infty} \frac{y_n(x)}{\lambda_n - \lambda}\int_a^b y_n^*(z)f(z)\,dz
= \int_a^b \sum_{n=0}^{\infty} \frac{y_n(x)y_n^*(z)}{\lambda_n - \lambda}\,f(z)\,dz. \]

From this we may identify the Green’s function

\[ G(x, z) = \sum_{n=0}^{\infty} \frac{y_n(x)y_n^*(z)}{\lambda_n - \lambda}. \]

We note that if $\lambda = \lambda_n$, i.e. if $\lambda$ equals one of the eigenvalues of $\mathcal{L}$, then $G(x, z)$
becomes infinite and this method runs into difficulty. No solution then exists
unless the RHS of (17.76) satisfies the relation

\[ \int_a^b y_n^*(x)f(x)\,dx = 0. \]

If the spectrum of eigenvalues of the operator $\mathcal{L}$ is anywhere continuous, the
orthogonality and closure relationships of the eigenfunctions become

\[ \int_a^b y_n^*(x)y_m(x)\rho(x)\,dx = \delta(n - m), \]

\[ \int_0^{\infty} y_n^*(z)y_n(x)\rho(x)\,dn = \delta(x - z). \]

Repeating the above analysis we then find that the Green’s function is given by

\[ G(x, z) = \int_0^{\infty} \frac{y_n(x)y_n^*(z)}{\lambda_n - \lambda}\,dn. \]
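
The shifted Green’s function can be illustrated with a case that ties back to the worked example of section 17.6. The sketch below assumes $\mathcal{L} = d^2/dx^2$ on $0 \le x \le \pi$ with $y(0) = y(\pi) = 0$, so that $y_n = (2/\pi)^{1/2}\sin nx$, $\lambda_n = -n^2$ and $\rho = 1$; taking $\lambda = -\tfrac{1}{4}$ in (17.76) then reproduces $y'' + \tfrac{1}{4}y = f$. The closed form used for comparison, $-2\sin(x_</2)\cos(x_>/2)$ with $x_<$ and $x_>$ the smaller and larger of x and z, is not taken from the text; it was obtained separately by matching the homogeneous solutions $\sin(x/2)$ and $\cos(x/2)$ across a unit jump in slope, and should be read as an assumption. The function names are hypothetical.

```python
import math

def green_series(x, z, lam=-0.25, terms=5000):
    """Truncated sum of y_n(x) y_n(z) / (lambda_n - lam) for L = d^2/dx^2 on [0, pi]."""
    total = 0.0
    for n in range(1, terms + 1):
        lambda_n = -float(n * n)  # eigenvalue of d^2/dx^2 belonging to sin(nx)
        total += (2.0 / math.pi) * math.sin(n * x) * math.sin(n * z) / (lambda_n - lam)
    return total

def green_closed(x, z):
    """Assumed closed form for lam = -1/4 (derived separately, not quoted from the text)."""
    lo, hi = min(x, z), max(x, z)
    return -2.0 * math.sin(lo / 2.0) * math.cos(hi / 2.0)

print(green_series(1.0, 2.0), green_closed(1.0, 2.0))
# As lam approaches the eigenvalue lambda_1 = -1 the n = 1 term dominates and G grows without bound.
print(green_series(1.0, 2.0, lam=-1.0 + 1e-6))
```

The second printed value shows the difficulty described above: the sum is dominated by the single term whose denominator $\lambda_n - \lambda$ is nearly zero.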


17.8 Exercises 

17.1 By considering $\langle h|h\rangle$, where $h = f + \lambda g$ with $\lambda$ real, prove that, for two functions
f and g,

\[ \langle f|f\rangle\langle g|g\rangle \ge \tfrac{1}{4}\left[ \langle f|g\rangle + \langle g|f\rangle \right]^2. \]

The function y(x) is real and positive for all x. Its Fourier cosine transform $\tilde{y}_c(k)$
is defined by

\[ \tilde{y}_c(k) = \int_{-\infty}^{\infty} y(x)\cos(kx)\,dx, \]

and it is given that $\tilde{y}_c(0) = 1$. Prove that

\[ \tilde{y}_c(2k) \ge 2[\tilde{y}_c(k)]^2 - 1. \]

17.2 (a) Write the homogeneous Sturm-Liouville eigenvalue equation for which
$y(a) = y(b) = 0$ as

\[ \mathcal{L}(y; \lambda) \equiv (py')' + qy + \lambda\rho y = 0, \]

where $p(x)$, $q(x)$ and $\rho(x)$ are continuously differentiable functions. Show that
if $z(x)$ and $F(x)$ satisfy $\mathcal{L}(z; \lambda) = F(x)$ with $z(a) = z(b) = 0$ then

\[ \int_a^b y(x)F(x)\,dx = 0. \]

(b) Demonstrate the validity of result (a) by direct calculation for the case in
which $p(x) = \rho(x) = 1$, $q(x) = 0$, $a = -1$, $b = 1$ and $z(x) = 1 - x^2$.

17.3 Consider the real eigenfunctions $y_n(x)$ of a Sturm-Liouville equation

\[ (py')' + qy + \lambda\rho y = 0, \qquad a \le x \le b, \]

in which $p(x)$, $q(x)$ and $\rho(x)$ are continuously differentiable real functions and
$p(x)$ does not change sign in $a \le x \le b$. Take $p(x)$ as positive throughout the
interval, if necessary by changing the signs of all eigenvalues. For $a \le x_1 \le x_2 \le b$,
establish the identity

\[ (\lambda_n - \lambda_m)\int_{x_1}^{x_2} \rho\,y_n y_m\,dx = \left[ y_n p\,y_m' - y_m p\,y_n' \right]_{x_1}^{x_2}. \]

Deduce that if $\lambda_n > \lambda_m$ then $y_n(x)$ must change sign between two successive zeroes
of $y_m(x)$. (The reader may find it helpful to illustrate this result by sketching the
first few eigenfunctions of the system $y'' + \lambda y = 0$, with $y(0) = y(\pi) = 0$, and the
Legendre polynomials $P_n(z)$ given in subsection 16.6.1 for n = 2, 3, 4, 5.)

17.4 (a) Show that the equation

\[ y'' + a\delta(x)y + \lambda y = 0, \]

with $y(\pm\pi) = 0$ and a real, has a set of eigenvalues $\lambda$ satisfying

\[ \tan(\pi\sqrt{\lambda}) = \frac{2\sqrt{\lambda}}{a}. \]

(b) Investigate the conditions under which negative eigenvalues, $\lambda = -\mu^2$ with $\mu$
real, are possible.

17.5 Express the hypergeometric equation

\[ (x^2 - x)y'' + [(1 + \alpha + \beta)x - \gamma]y' + \alpha\beta y = 0 \]

in Sturm-Liouville form, determining the conditions imposed on x and on the
parameters $\alpha$, $\beta$ and $\gamma$ by the boundary conditions and the allowed forms of
weight function.

17.6 (a) Find the solution of $(1 - x^2)y'' - 2xy' + by = f(x)$ valid in the range $-1 \le x \le 1$
and finite at $x = 0$, in terms of Legendre polynomials.

(b) If $b = 14$ and $f(x) = 5x^3$, find the explicit solution and verify it by direct
substitution.

17.7 Use the generating function for the Legendre polynomials $P_n(x)$ to show that

\[ \int_0^1 P_{2n+1}(x)\,dx = (-1)^n\frac{(2n)!}{2^{2n+1}\,n!\,(n + 1)!} \]

and that, except for the case $n = 0$,

\[ \int_0^1 P_{2n}(x)\,dx = 0. \]

17.8 The quantum mechanical wavefunction for a one-dimensional simple harmonic
oscillator in its nth energy level is of the form

\[ \psi(x) = \exp(-x^2/2)H_n(x), \]

where $H_n(x)$ is the nth Hermite polynomial. The generating function for the
polynomials (17.53) is

\[ G(x, h) = e^{2hx - h^2} = \sum_{n=0}^{\infty} \frac{H_n(x)}{n!}h^n. \]

(a) Find $H_i(x)$ for i = 1, 2, 3, 4.

(b) Evaluate by direct calculation

\[ \int_{-\infty}^{\infty} e^{-x^2} H_p(x)H_q(x)\,dx, \]

(i) for p = 2, q = 3; (ii) for p = 2, q = 4; (iii) for p = q = 3. Check your
answers against equation (17.52). (You will find it convenient to use

\[ \int_{-\infty}^{\infty} x^{2n}e^{-x^2}\,dx = \frac{(2n)!\sqrt{\pi}}{2^{2n}\,n!} \]

for integer $n \ge 0$.)

17.9 The Laguerre polynomials, which are required for the quantum mechanical
description of the hydrogen atom, can be defined by the generating function
(equation (17.58))

\[ G(x, h) = \frac{e^{-hx/(1-h)}}{1 - h} = \sum_{n=0}^{\infty} \frac{L_n(x)}{n!}h^n. \]

By differentiating the equation separately with respect to x and h, and
resubstituting for $G(x, h)$, prove that $L_n$ and $L_n'$ ($= dL_n(x)/dx$) satisfy the recurrence
relations

\[ L_n' - nL_{n-1}' + nL_{n-1} = 0, \]

\[ L_{n+1} - (2n + 1 - x)L_n + n^2 L_{n-1} = 0. \]

From these two equations and others derived from them, show that $L_n(x)$ satisfies
the Laguerre equation

\[ xL_n'' + (1 - x)L_n' + nL_n = 0. \]

17.10 Starting from the linearly independent functions 1, x, $x^2$, $x^3$, . . . , in the range
$0 \le x < \infty$, find the first three orthogonal functions $\phi_0$, $\phi_1$ and $\phi_2$, with respect
to the weight function $\rho(x) = e^{-x}$. By comparing your answers with the Laguerre
polynomials generated by the recurrence relation derived in exercise 17.9, deduce
the form of $\phi_3(x)$.

17.11 Consider the set of functions $\{f(x)\}$ of the real variable x, defined in the interval
$-\infty < x < \infty$, that $\to 0$ at least as quickly as $x^{-1}$ as $x \to \pm\infty$. For unit weight
function, determine whether each of the following linear operators is Hermitian
when acting upon $\{f(x)\}$:

\[ \text{(a) } \frac{d}{dx} + x; \qquad \text{(b) } -i\frac{d}{dx} + x^2; \qquad \text{(c) } ix\frac{d}{dx}; \qquad \text{(d) } i\frac{d^3}{dx^3}. \]

17.12 The Chebyshev polynomials $T_n(x)$ can be written as

\[ T_n(x) = \cos(n\cos^{-1}x). \]

(a) Verify that these functions do satisfy the Chebyshev equation.

(b) Use de Moivre’s theorem to show that an alternative expression is

\[ T_n(x) = \sum_{r\ \mathrm{even}}^{n} (-1)^{r/2}\,\frac{n!}{(n - r)!\,r!}\,x^{n-r}(1 - x^2)^{r/2}. \]

17.13 A particle moves in a parabolic potential in which its natural angular frequency
of oscillation is 1/2. At time $t = 0$ it passes through the origin with velocity v
and is suddenly subjected to an additional acceleration of +1 for $0 \le t \le \pi/2$,
and then −1 for $\pi/2 < t \le \pi$. At the end of this period it is at the origin again.
Apply the results of the worked example in section 17.6 to show that

\[ v = -\frac{8}{\pi}\sum_{m=0}^{\infty} \frac{1}{(4m + 2)^2 - \tfrac{1}{4}} \approx -0.81. \]

17.14 Find an eigenfunction expansion for the solution with boundary conditions
$y(0) = y(\pi) = 0$ of the inhomogeneous equation

\[ \frac{d^2y}{dx^2} + \kappa y = f(x), \]

where $\kappa$ is a constant and

\[ f(x) = \begin{cases} x, & 0 \le x \le \pi/2, \\ \pi - x, & \pi/2 < x \le \pi. \end{cases} \]

17.15 (a) Find those eigenfunctions $y_n(x)$ of the self-adjoint linear differential operator
$d^2/dx^2$ that satisfy the boundary conditions $y_n(0) = y_n(\pi) = 0$, and hence
construct its Green’s function $G(x, z)$.

(b) Construct the same Green’s function using the methods of subsection 15.2.5,
showing that it is

\[ G(x, z) = \begin{cases} x(z - \pi)/\pi, & 0 \le x \le z, \\ z(x - \pi)/\pi, & z \le x \le \pi. \end{cases} \]

(c) By expanding the function given in (b) in terms of the eigenfunctions $y_n(x)$,
verify that it is the same function as that derived in (a).

17.16 (a) The differential operator $\mathcal{L}$ is defined by

\[ \mathcal{L}y = -\frac{d}{dx}\left( e^x\frac{dy}{dx} \right) - \tfrac{1}{4}e^x y. \]

Determine the eigenvalues $\lambda_n$ of the problem

\[ \mathcal{L}y_n = \lambda_n e^x y_n, \qquad 0 < x < 1, \]

with boundary conditions

\[ y(0) = 0, \qquad \frac{dy}{dx} + \frac{y}{2} = 0 \quad \text{at } x = 1. \]

(b) Find the corresponding unnormalised $y_n$, and also a weight function $\rho(x)$ with
respect to which the $y_n$ are orthogonal. Hence, select a suitable normalisation
for the $y_n$.

(c) By making an eigenfunction expansion, solve the equation

\[ \mathcal{L}y = -e^{x/2}, \qquad 0 < x < 1, \]

subject to the same boundary conditions as previously.

17.17 Show that the linear operator

\[ \mathcal{L} \equiv \tfrac{1}{4}(1 + x^2)^2\frac{d^2}{dx^2} + \tfrac{1}{2}x(1 + x^2)\frac{d}{dx} + a, \]

acting upon functions defined in $-1 \le x \le 1$ and vanishing at the endpoints of
the interval, is Hermitian with respect to the weight function $(1 + x^2)^{-1}$.

By making the change of variable $x = \tan(\theta/2)$, find two even eigenfunctions,
$f_1(x)$ and $f_2(x)$, of the differential equation

\[ \mathcal{L}u = \lambda u. \]

17.18 By substituting $x = \exp t$ find the normalized eigenfunctions $y_n(x)$ and the
eigenvalues $\lambda_n$ of the operator $\mathcal{L}$ defined by

\[ \mathcal{L}y = x^2y'' + 2xy' + \tfrac{1}{4}y, \qquad 1 \le x \le e, \]

with $y(1) = y(e) = 0$. Find, as a series $\sum a_n y_n(x)$, the solution of $\mathcal{L}y = x^{-1/2}$.

17.19 Express the solution of Poisson’s equation in electrostatics,

\[ \nabla^2\phi(\mathbf{r}) = -\rho(\mathbf{r})/\epsilon_0, \]

where $\rho$ is the non-zero charge density over a finite part of space, in the form of
an integral and hence identify the Green’s function for the $\nabla^2$ operator.

17.20 In the quantum mechanical study of the scattering of a particle by a potential,
a Born-approximation solution can be obtained in terms of a function $y(\mathbf{r})$ that
satisfies an equation of the form

\[ (-\nabla^2 - K^2)y(\mathbf{r}) = F(\mathbf{r}). \]

Assuming that $y_{\mathbf{k}}(\mathbf{r}) = (2\pi)^{-3/2}\exp(i\mathbf{k}\cdot\mathbf{r})$ is a suitably normalised eigenfunction of
$-\nabla^2$ corresponding to eigenvalue $k^2$, find a suitable Green’s function $G_K(\mathbf{r}, \mathbf{r}')$.
By taking the direction of the vector $\mathbf{r} - \mathbf{r}'$ as the polar axis for a k-space
integration, show that $G_K(\mathbf{r}, \mathbf{r}')$ can be reduced to

\[ \frac{1}{4\pi^2|\mathbf{r} - \mathbf{r}'|}\int_{-\infty}^{\infty} \frac{w\sin w}{w^2 - w_0^2}\,dw, \]

where $w_0 = K|\mathbf{r} - \mathbf{r}'|$.

(This integral can be evaluated using a contour integration (chapter 20) to give
$(4\pi|\mathbf{r} - \mathbf{r}'|)^{-1}\exp(iK|\mathbf{r} - \mathbf{r}'|)$.)

17.9 Hints and answers 

17.1 Express the condition $\langle h|h\rangle \ge 0$ as a quadratic equation in $\lambda$ and then apply the
condition for no real roots, noting that $\langle f|g\rangle + \langle g|f\rangle$ is real. To put a limit on
$\int y\cos^2 kx\,dx$, set $f = y^{1/2}\cos kx$ and $g = y^{1/2}$ in the inequality.

17.2 (a) By twice integrating by parts the term containing p, show that
$\int_a^b y\,\mathcal{L}(z; \lambda)\,dx = \int_a^b z\,\mathcal{L}(y; \lambda)\,dx$.

(b) $y(x) = A\cos(\sqrt{\lambda}\,x)$ with $\lambda = n^2\pi^2/4$, and $F(x) = \lambda - 2 - \lambda x^2$.

17.3 Follow an argument similar to that in subsection 17.3.1, but integrate from $x_1$ to
$x_2$, rather than from a to b. Take $x_1$ and $x_2$ as two successive zeroes of $y_m(x)$ and
note that, if the sign of $y_m$ is $\alpha$ then the sign of $y_m'(x_1)$ is $\alpha$ whilst that of $y_m'(x_2)$
is $-\alpha$. Now assume that $y_n(x)$ does not change sign in the interval and has a
constant sign $\beta$; show that this leads to a contradiction between the signs of the
two sides of the identity.

17.4 (a) Different combinations of sinusoids are needed for negative and positive
ranges of x. (b) $\mu$ must satisfy $\tanh\mu\pi = 2\mu/a$, which requires $a > 2/\pi$.

17.5 $[x^{\gamma}(1 - x)^{\alpha+\beta-\gamma+1}y']' = \alpha\beta\,x^{\gamma-1}(1 - x)^{\alpha+\beta-\gamma}y$; $0 \le x \le 1$, $\alpha + \beta > \gamma > 1$.

17.6 (a) $y = \sum a_n P_n(x)$ with

\[ a_n = \frac{n + 1/2}{b - n(n + 1)}\int_{-1}^{1} f(z)P_n(z)\,dz; \]

(b) $5x^3 = 2P_3(x) + 3P_1(x)$, giving $a_1 = 1/4$ and $a_3 = 1$, leading to $y = 5(2x^3 - x)/4$.

17.8 (a) $2x$, $4x^2 - 2$, $8x^3 - 12x$, $16x^4 - 48x^2 + 12$; (b) (i) 0, (ii) 0, (iii) $48\sqrt{\pi}$.

17.10 $\phi_0(x) = 1$, $\phi_1(x) = x - 1$, $\phi_2(x) = (x^2 - 4x + 2)/2$; $n!\,\phi_n(x) = (-1)^n L_n(x)$;
$\phi_3(x) = (x^3 - 9x^2 + 18x - 6)/6$.

17.11 (a) No, $\int g f^{*\prime}\,dx \ne 0$; (b) yes; (c) no, $i\int f^* g\,dx \ne 0$; (d) yes.

17.14 The normalised eigenfunctions are $(2/\pi)^{1/2}\sin nx$, with n an integer.
$y(x) = (4/\pi)\sum_{n\ \mathrm{odd}}[(-1)^{(n-1)/2}\sin nx]/[n^2(\kappa - n^2)]$.

17.15 (a) The normalised eigenfunctions are $(2/\pi)^{1/2}\sin nx$, with n an integer.
$G(x, z) = (-2/\pi)\sum_{n=1}^{\infty}[\sin(nz)\sin(nx)]/n^2$.

17.16 (a) $\lambda_n = (n + 1/2)^2\pi^2$, n = 0, 1, 2, . . . .

(b) Since $y_n(1)y_n'(1) \ne 0$, the Sturm-Liouville boundary conditions are not
satisfied and the appropriate weight function has to be justified by inspection. The
normalised eigenfunctions are $\sqrt{2}\,e^{-x/2}\sin[(n + 1/2)\pi x]$, with $\rho(x) = e^x$.

(c) $y(x) = (-2/\pi^3)\sum_{n=0}^{\infty} e^{-x/2}\sin[(n + 1/2)\pi x]/(n + 1/2)^3$.

17.17 In terms of $\theta$, $\mathcal{L}$ is $d^2/d\theta^2 + a$ and has eigenfunctions $u(\theta) = \cos(\sqrt{a - \lambda}\,\theta)$, where
$\sqrt{a - \lambda} = 2n + 1$;
$f_1(x) = (1 - x^2)/(1 + x^2)$; $f_2(x) = 4[(1 - x^2)/(1 + x^2)]^3 - 3[(1 - x^2)/(1 + x^2)]$.

17.18 $y_n(x) = \sqrt{2}\,x^{-1/2}\sin(n\pi\ln x)$ with $\lambda_n = -n^2\pi^2$;

\[ a_n = \begin{cases} -(n\pi)^{-2}\int_1^e \sqrt{2}\,x^{-1}\sin(n\pi\ln x)\,dx = -\sqrt{8}\,(n\pi)^{-3} & \text{for n odd}, \\ 0 & \text{for n even}. \end{cases} \]

17.19 $G(\mathbf{r}, \mathbf{r}') = (4\pi|\mathbf{r} - \mathbf{r}'|)^{-1}$.


18


Partial differential equations: 
general and particular solutions 


In this chapter and the next the solution of differential equations of types 
typically encountered in the physical sciences and engineering is extended to 
situations involving more than one independent variable. A partial differential 
equation (PDE) is an equation relating an unknown function (the dependent 
variable) of two or more variables to its partial derivatives with respect to 
those variables. The most commonly occurring independent variables are those 
describing position and time, and so we will couch our discussion and examples 
in notation appropriate to them. 

As in other chapters we will focus our attention on the equations that arise 
most often in physical situations. We will restrict our discussion, therefore, to 
linear PDEs, i.e. those of first degree in the dependent variable. Furthermore, we 
will discuss primarily second-order equations. The solution of first-order PDEs 
will necessarily be involved in treating these, and some of the methods discussed 
can be extended without difficulty to third- and higher-order equations. We shall 
also see that many ideas developed for ordinary differential equations (ODEs) 
can be carried over directly into the study of PDEs. 

In this chapter we will concentrate on general solutions of PDEs in terms 
of arbitrary functions and the particular solutions that may be derived from 
them in the presence of boundary conditions. We also discuss the existence and 
uniqueness of the solutions to PDEs under given boundary conditions. 

In the next chapter the methods most commonly used in practice for obtaining 
solutions to PDEs subject to given boundary conditions will be considered. These 
methods include the separation of variables, integral transforms and Green’s 
functions. This division of material is rather arbitrary and really has been made 
only to emphasise the general usefulness of the latter methods. In particular, it 
will be readily apparent that some of the results of the present chapter are in 
fact solutions in the form of separated variables, but arrived at by a different 
approach.

18.1 Important partial differential equations 

Most of the important PDEs of physics are second-order and linear. In order to 
gain familiarity with their general form, some of the more important ones will 
now be briefly discussed. These equations apply to a wide variety of different 
physical systems. 

Since, in general, the PDEs listed below describe three-dimensional situations,
the independent variables are $\mathbf{r}$ and t, where $\mathbf{r}$ is the position vector and t is
time. The actual variables used to specify the position vector $\mathbf{r}$ are dictated by the
coordinate system in use. For example, in Cartesian coordinates the independent
variables of position are x, y and z, whereas in spherical polar coordinates they
are r, $\theta$ and $\phi$. The equations may be written in a coordinate-independent manner,
however, by the use of the Laplacian operator $\nabla^2$.


18.1.1 The wave equation 


The wave equation

\[ \nabla^2 u = \frac{1}{c^2}\frac{\partial^2 u}{\partial t^2} \tag{18.1} \]

describes as a function of position and time the displacement from equilibrium,
$u(\mathbf{r}, t)$, of a vibrating string or membrane or a vibrating solid, gas or liquid. The
equation also occurs in electromagnetism, where u may be a component of the
electric or magnetic field in an electromagnetic wave or the current or voltage
along a transmission line. The quantity c is the speed of propagation of the waves.


► Find the equation satisfied by small transverse displacements $u(x, t)$ of a uniform string of
mass per unit length $\rho$ held under a uniform tension T, assuming that the string is initially
located along the x-axis in a Cartesian coordinate system.

Figure 18.1 shows the forces acting on an elemental length $\Delta s$ of the string. If the tension
T in the string is uniform along its length then the net upward vertical force on the
element is

\[ \Delta F = T\sin\theta_2 - T\sin\theta_1. \]

Assuming that the angles $\theta_1$ and $\theta_2$ are both small, we may make the approximation
$\sin\theta \approx \tan\theta$. Since at any point on the string the slope $\tan\theta = \partial u/\partial x$, the force can be
written

\[ \Delta F = T\left[ \frac{\partial u(x + \Delta x, t)}{\partial x} - \frac{\partial u(x, t)}{\partial x} \right] \approx T\,\frac{\partial^2 u(x, t)}{\partial x^2}\,\Delta x, \]

where we have used the definition of the partial derivative to simplify the RHS.

This upward force may be equated, by Newton’s second law, to the product of the
mass of the element and its upward acceleration. The element has a mass $\rho\,\Delta s$, which is
approximately equal to $\rho\,\Delta x$ if the vibrations of the string are small, and so we have

\[ \rho\,\Delta x\,\frac{\partial^2 u(x, t)}{\partial t^2} = T\,\frac{\partial^2 u(x, t)}{\partial x^2}\,\Delta x. \]

Figure 18.1 The forces acting on an element of a string under uniform
tension T.

Dividing both sides by $\Delta x$ we obtain, for the vibrations of the string, the one-dimensional
wave equation

\[ \frac{\partial^2 u}{\partial x^2} = \frac{1}{c^2}\frac{\partial^2 u}{\partial t^2}, \]

where $c^2 = T/\rho$. ◄
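
A simple way to gain confidence in the result $c^2 = T/\rho$ is to test candidate functions against the one-dimensional wave equation by finite differences. The sketch below is illustrative only: the numerical values of the tension, density and wavenumber are hypothetical, and the test functions (a standing wave and a travelling Gaussian pulse) are examples chosen here rather than solutions derived in this section.

```python
import math

def wave_residual(u, x, t, c, h=1e-3):
    """Central-difference estimate of u_xx - u_tt / c^2 at the point (x, t)."""
    u_xx = (u(x + h, t) - 2.0 * u(x, t) + u(x - h, t)) / h ** 2
    u_tt = (u(x, t + h) - 2.0 * u(x, t) + u(x, t - h)) / h ** 2
    return u_xx - u_tt / c ** 2

tension, rho = 4.0, 1.0          # hypothetical values in arbitrary units
c = math.sqrt(tension / rho)     # wave speed from c^2 = T / rho
k = 1.5                          # hypothetical wavenumber

standing = lambda x, t: math.sin(k * x) * math.cos(k * c * t)
travelling = lambda x, t: math.exp(-(x - c * t) ** 2)
wrong_speed = lambda x, t: math.sin(k * x) * math.cos(2.0 * k * c * t)

print(wave_residual(standing, 0.7, 0.3, c))     # close to zero
print(wave_residual(travelling, 0.7, 0.3, c))   # close to zero
print(wave_residual(wrong_speed, 0.7, 0.3, c))  # clearly non-zero
```

The third function oscillates at twice the correct frequency, and its non-zero residual shows that the check does discriminate between functions that do and do not satisfy the equation.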


The longitudinal vibrations of an elastic rod obey a very similar equation to
that derived in the above example, namely

\[ \frac{\partial^2 u}{\partial x^2} = \frac{\rho}{E}\frac{\partial^2 u}{\partial t^2}; \]

here $\rho$ is the mass per unit volume and E is Young’s modulus.

The wave equation can be generalised slightly. For example, in the case of the
vibrating string, there could also be an external upward vertical force $f(x, t)$ per
unit length acting on the string at time t. The transverse vibrations would then
satisfy the equation

\[ T\frac{\partial^2 u}{\partial x^2} + f(x, t) = \rho\frac{\partial^2 u}{\partial t^2}, \]

which is clearly of the form ‘upward force per unit length = mass per unit length
× upward acceleration’.

Similar examples, but involving two or three spatial dimensions rather than one,
are provided by the equation governing the transverse vibrations of a stretched
membrane subject to an external vertical force density $f(x, y, t)$:

\[ T\left( \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} \right) + f(x, y, t) = \rho(x, y)\frac{\partial^2 u}{\partial t^2}, \]

where $\rho$ is the mass per unit area of the membrane and T is the tension.

18.1.2 The diffusion equation

The diffusion equation 

k -V 2 u=^ M (18.2) 

at 

describes the temperature u in a region containing no heat sources or sinks; it 
also applies to the diffusion of a chemical that has a concentration u(r, t). The 
constant k is called the diffusivity. The equation is clearly second-order in the 
three spatial variables, but first order in time. 


► Derive the equation satisfied by the temperature u(r,t) at time t for a material of uniform 
thermal conductivity k, specific heat capacity s and density p. Express the equation in 
Cartesian coordinates. 


Let us consider an arbitrary volume V lying within the solid and bounded by a surface S 
(this may coincide with the surface of the solid if so desired). At any point in the solid 
the rate of heat flow per unit area in any given direction $\hat{\mathbf{r}}$ is proportional to minus the
component of the temperature gradient in that direction and so is given by $(-k\nabla u)\cdot\hat{\mathbf{r}}$. The
total flux of heat out of the volume $V$ per unit time is given by

\[
-\frac{dQ}{dt} = \iint_S (-k\nabla u)\cdot\hat{\mathbf{n}}\,dS = \iiint_V \nabla\cdot(-k\nabla u)\,dV, \tag{18.3}
\]

where $Q$ is the total heat energy in $V$ at time $t$ and $\hat{\mathbf{n}}$ is the outward-pointing unit normal
to $S$; note that we have used the divergence theorem to convert the surface integral into
a volume integral.

We can also express Q as a volume integral over V, 


\[
Q = \iiint_V s\rho u\,dV,
\]

and its rate of change is then given by

\[
\frac{dQ}{dt} = \iiint_V s\rho\frac{\partial u}{\partial t}\,dV, \tag{18.4}
\]


where we have taken the derivative with respect to time inside the integral (see section 5.12). 

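Comparing (18.3) and (18.4), and remembering that the volume $V$ is arbitrary, we obtain
the three-dimensional diffusion equation

\[
\kappa\nabla^2 u = \frac{\partial u}{\partial t},
\]

where the diffusion coefficient $\kappa = k/(s\rho)$. To express this equation in Cartesian coordinates,
we simply write $\nabla^2$ in terms of $x$, $y$ and $z$ to obtain

\[
\kappa\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}\right) = \frac{\partial u}{\partial t}.
\] ◄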


The diffusion equation just derived can be generalised to

\[
k\nabla^2 u + f(\mathbf{r}, t) = s\rho\frac{\partial u}{\partial t}.
\]

The second term, $f(\mathbf{r}, t)$, represents a varying density of heat sources throughout
the material but is often not required in physical applications. In the most general
case, $k$, $s$ and $\rho$ may depend on position $\mathbf{r}$, in which case the first term becomes
$\nabla\cdot(k\nabla u)$. However, in the simplest application the heat flow is one-dimensional
with no heat sources, and the equation becomes (in Cartesian coordinates)

\[
\frac{\partial^2 u}{\partial x^2} = \frac{s\rho}{k}\frac{\partial u}{\partial t}.
\]


18.1.3 Laplace's equation 

Laplace’s equation,

\[
\nabla^2 u = 0, \tag{18.5}
\]

may be obtained by setting $\partial u/\partial t = 0$ in the diffusion equation (18.2), and
describes (for example) the steady-state temperature distribution in a solid in
which there are no heat sources, i.e. the temperature distribution after a long
time has elapsed.

Laplace’s equation also describes the gravitational potential in a region containing
no matter or the electrostatic potential in a charge-free region. Further, it
applies to the flow of an incompressible fluid with no sources, sinks or vortices;
in this case $u$ is the velocity potential, from which the velocity is given by $\mathbf{v} = \nabla u$.


18.1.4 Poisson’s equation

Poisson’s equation,

\[
\nabla^2 u = \rho(\mathbf{r}), \tag{18.6}
\]

describes the same physical situations as Laplace’s equation, but in regions
containing matter, charges or sources of heat or fluid. The function $\rho(\mathbf{r})$ is
called the source density and in physical applications usually contains some
multiplicative physical constants. For example, if $u$ is the electrostatic potential
in some region of space, in which case $\rho$ is the density of electric charge, then
$\nabla^2 u = -\rho(\mathbf{r})/\epsilon_0$, where $\epsilon_0$ is the permittivity of free space. Alternatively, $u$ might
represent the gravitational potential in some region where the matter density is
given by $\rho$; then $\nabla^2 u = 4\pi G\rho(\mathbf{r})$, where $G$ is the gravitational constant.


18.1.5 Schrödinger’s equation


The Schrödinger equation

\[
-\frac{\hbar^2}{2m}\nabla^2 u + V(\mathbf{r})u = i\hbar\frac{\partial u}{\partial t}, \tag{18.7}
\]

describes the quantum mechanical wavefunction $u(\mathbf{r}, t)$ of a non-relativistic particle
of mass $m$; $\hbar$ is Planck’s constant divided by $2\pi$. Like the diffusion equation it is
second order in the three spatial variables and first order in time.


18.2 General form of solution 


Before turning to the methods by which we may hope to solve PDEs such as 
those listed in the previous section, it is instructive, as for ODEs in chapter 14, to 
study how PDEs may be formed from a set of possible solutions. Such a study 
can provide an indication of how equations obtained not from possible solutions 
but from physical arguments might be solved. 

For definiteness let us suppose we have a set of functions involving two 
independent variables x and y. Without further specification this is of course a 
very wide set of functions, and we could not expect to find a useful equation that 
they all satisfy. However, let us consider a type of function $u_i(x, y)$ in which $x$ and
$y$ appear in a particular way, such that $u_i$ can be written as a function (however
complicated) of a single variable $p$, itself a simple function of $x$ and $y$.

Let us illustrate this by considering the three functions

\[
u_1(x,y) = x^4 + 4(x^2 y + y^2 + 1),
\]
\[
u_2(x,y) = \sin x^2 \cos 2y + \cos x^2 \sin 2y,
\]
\[
u_3(x,y) = \frac{x^2 + 2y + 2}{3x^2 + 6y + 5}.
\]

These are all fairly complicated functions of x and y and a single differential 
equation of which each one is a solution is not obvious. However, if we observe 
that in fact each can be expressed as a function of the variable $p = x^2 + 2y$ alone
(with no other $x$ or $y$ involved) then a great simplification takes place. Written
in terms of $p$ the above equations become

\[
u_1(x,y) = (x^2 + 2y)^2 + 4 = p^2 + 4 = f_1(p),
\]
\[
u_2(x,y) = \sin(x^2 + 2y) = \sin p = f_2(p),
\]
\[
u_3(x,y) = \frac{(x^2 + 2y) + 2}{3(x^2 + 2y) + 5} = \frac{p + 2}{3p + 5} = f_3(p).
\]




Let us now form, for each $u_i$, the partial derivatives $\partial u_i/\partial x$ and $\partial u_i/\partial y$. In each
case these are (writing both the form for general $p$ and the one appropriate to
our particular case, $p = x^2 + 2y$)

\[
\frac{\partial u_i}{\partial x} = \frac{df_i(p)}{dp}\frac{\partial p}{\partial x} = 2x f_i', \qquad
\frac{\partial u_i}{\partial y} = \frac{df_i(p)}{dp}\frac{\partial p}{\partial y} = 2 f_i',
\]

for $i = 1, 2, 3$. All reference to the form of $f_i$ can be eliminated from these
equations by cross-multiplication, obtaining


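\[
\frac{\partial p}{\partial y}\frac{\partial u_i}{\partial x} = \frac{\partial p}{\partial x}\frac{\partial u_i}{\partial y},
\]

or, for our specific form, $p = x^2 + 2y$,

\[
\frac{\partial u_i}{\partial x} = x\frac{\partial u_i}{\partial y}. \tag{18.8}
\]

It is thus apparent that not only are the three functions $u_1$, $u_2$ and $u_3$ solutions of the
PDE (18.8) but so also is any arbitrary function $f(p)$ of which the argument $p$ has
the form $x^2 + 2y$.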
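The claim that all three functions satisfy (18.8) can be checked numerically. The following is only a sketch: the helper name `residual`, the step size and the sample points are illustrative assumptions, and central differences stand in for the exact partial derivatives.

```python
import math

# The three example functions, each a function of p = x**2 + 2*y alone.
def u1(x, y):
    return x**4 + 4 * (x**2 * y + y**2 + 1)

def u2(x, y):
    return math.sin(x**2) * math.cos(2 * y) + math.cos(x**2) * math.sin(2 * y)

def u3(x, y):
    return (x**2 + 2 * y + 2) / (3 * x**2 + 6 * y + 5)

def residual(u, x, y, h=1e-5):
    """Central-difference estimate of du/dx - x*du/dy, the residual of (18.8)."""
    ux = (u(x + h, y) - u(x - h, y)) / (2 * h)
    uy = (u(x, y + h) - u(x, y - h)) / (2 * h)
    return ux - x * uy

# Largest residual over a few arbitrary sample points (hypothetical choices).
max_residual = max(
    abs(residual(u, x, y))
    for u in (u1, u2, u3)
    for x in (0.3, 1.1)
    for y in (0.2, 0.7)
)
print(max_residual)  # small, limited only by finite-difference error
```

Any other smooth function of $x^2 + 2y$ could be substituted for `u1`, `u2` or `u3` and should give a similarly small residual.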


18.3 General and particular solutions 

In the last section we found that the first-order PDE (18.8) has as a solution any 
function of the variable $x^2 + 2y$. This points the way for the solution of PDEs
of other orders, as follows. It is not generally true that an nth-order PDE can 
always be considered as resulting from the elimination of n arbitrary functions 
from its solution (as opposed to the elimination of n arbitrary constants for an 
nth-order ODE, see section 14.1). However, given specific PDEs we can try to 
solve them by seeking combinations of variables in terms of which the solutions 
may be expressed as arbitrary functions. Where this is possible we may expect n 
combinations to be involved in the solution. 

Naturally, the exact functional form of the solution for any particular situation 
must be determined by some set of boundary conditions. For instance, if the PDE 
contains two independent variables x and y then for complete determination of 
its solution the boundary conditions will take a form equivalent to specifying 
$u(x,y)$ along a suitable continuum of points in the $xy$-plane (usually along a line).

We now discuss the general and particular solutions of first- and second- 
order PDEs. In order to simplify the algebra, we will restrict our discussion 
to equations containing just two independent variables x and y. Nevertheless, 
the method presented below may be extended to equations containing several 
independent variables. 


18.3.1 First-order equations 

Although most of the PDEs encountered in physical contexts are second order 
(i.e. they contain $\partial^2 u/\partial x^2$ or $\partial^2 u/\partial x\,\partial y$, etc.), we now discuss first-order equations
to illustrate the general considerations involved in the form of the solution and 
in satisfying any boundary conditions on the solution. 

The most general first-order linear PDE (containing two independent variables)
is of the form

\[
A(x,y)\frac{\partial u}{\partial x} + B(x,y)\frac{\partial u}{\partial y} + C(x,y)u = R(x,y), \tag{18.9}
\]

where $A(x,y)$, $B(x,y)$, $C(x,y)$ and $R(x,y)$ are given functions. Clearly, if either
A(x,y) or B(x,y) is zero then the PDE may be solved straightforwardly as a 
first-order linear ODE (as discussed in chapter 14), the only modification being 
that the arbitrary constant of integration becomes an arbitrary function of x or y 
respectively. 


► Find the general solution $u(x,y)$ of

\[
x\frac{\partial u}{\partial x} + 3u = x^2.
\]

Dividing through by $x$ we obtain

\[
\frac{\partial u}{\partial x} + \frac{3u}{x} = x,
\]

which is a linear equation with integrating factor (see subsection 14.2.4)

\[
\exp\left(\int\frac{3}{x}\,dx\right) = \exp(3\ln x) = x^3.
\]

Multiplying through by this factor we find

\[
\frac{\partial}{\partial x}(x^3 u) = x^4,
\]

which, on integrating with respect to $x$, gives

\[
x^3 u = \frac{x^5}{5} + f(y),
\]

where $f(y)$ is an arbitrary function of $y$. Finally, dividing through by $x^3$, we obtain the
solution

\[
u(x,y) = \frac{x^2}{5} + \frac{f(y)}{x^3}.
\] ◄
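As a check, the solution just found can be substituted back into the original equation; the terms involving the arbitrary function $f(y)$ cancel, which is why $f$ remains undetermined:

```latex
\[
x\frac{\partial u}{\partial x} + 3u
  = x\left(\frac{2x}{5} - \frac{3f(y)}{x^4}\right)
    + 3\left(\frac{x^2}{5} + \frac{f(y)}{x^3}\right)
  = \frac{2x^2}{5} + \frac{3x^2}{5} = x^2 .
\]
```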


When the PDE contains partial derivatives with respect to both independent 
variables then, of course, we cannot employ the above procedure but must seek 
an alternative method. Let us for the moment restrict our attention to the special 
case in which C(x,y) = R(x,y) = 0 and, following the discussion of the previous 
section, look for solutions of the form u(x,y) = f(p) where p is some, at present 
unknown, combination of x and y. We then have 

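\[
\frac{\partial u}{\partial x} = \frac{df(p)}{dp}\frac{\partial p}{\partial x}, \qquad
\frac{\partial u}{\partial y} = \frac{df(p)}{dp}\frac{\partial p}{\partial y},
\]

which, when substituted into the PDE (18.9), give

\[
\left[A(x,y)\frac{\partial p}{\partial x} + B(x,y)\frac{\partial p}{\partial y}\right]\frac{df(p)}{dp} = 0.
\]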


This removes all reference to the actual form of the function f(p) since for 
non-trivial p we must have 


\[
A(x,y)\frac{\partial p}{\partial x} + B(x,y)\frac{\partial p}{\partial y} = 0. \tag{18.10}
\]

Let us now consider the necessary condition for $f(p)$ to remain constant as $x$
and $y$ vary; this is that $p$ itself remains constant. Thus for $f$ to remain constant
implies that $x$ and $y$ must vary in such a way that

\[
dp = \frac{\partial p}{\partial x}\,dx + \frac{\partial p}{\partial y}\,dy = 0. \tag{18.11}
\]

The forms of (18.10) and (18.11) are very alike, and become the same if we
require that

\[
\frac{dx}{A(x,y)} = \frac{dy}{B(x,y)}. \tag{18.12}
\]


By integrating this expression the form of p can be found. 


► For

\[
x\frac{\partial u}{\partial x} - 2y\frac{\partial u}{\partial y} = 0, \tag{18.13}
\]

find (i) the solution that takes the value $2y + 1$ on the line $x = 1$, and (ii) a solution that
has the value 4 at the point $(1,1)$.


If we seek a solution of the form $u(x,y) = f(p)$, we deduce from (18.12) that $u(x,y)$ will
be constant along lines of $(x, y)$ that satisfy

\[
\frac{dx}{x} = \frac{dy}{-2y},
\]

which on integrating gives $x = cy^{-1/2}$. Identifying the constant of integration $c$ with $p^{1/2}$
(to avoid fractional powers), we conclude that $p = x^2 y$. Thus the general solution of the
PDE (18.13) is

\[
u(x,y) = f(x^2 y),
\]

where $f$ is an arbitrary function.

We must now find the particular solutions that obey each of the imposed boundary
conditions. For boundary condition (i) a little thought shows that the particular solution
required is

\[
u(x,y) = 2(x^2 y) + 1 = 2x^2 y + 1. \tag{18.14}
\]

For boundary condition (ii) some obviously acceptable solutions are

\[
u(x,y) = x^2 y + 3, \qquad u(x,y) = 4x^2 y, \qquad u(x,y) = 4.
\]

Each is a valid solution (the freedom of choice of form arises from the fact that $u$
is specified at only one point $(1,1)$, and not along a continuum (say), as in boundary
condition (i)). All three are particular examples of the general solution, which may be
written, for example, as

\[
u(x,y) = x^2 y + 3 + g(x^2 y),
\]

where $g = g(x^2 y) = g(p)$ is an arbitrary function subject only to $g(1) = 0$. For this
example, the forms of $g$ corresponding to the particular solutions listed above are $g(p) = 0$,
$g(p) = 3p - 3$, $g(p) = 1 - p$. ◄

As mentioned above, in order to find a solution of the form $u(x,y) = f(p)$ we
require that the original PDE contains no term in $u$, but only terms containing
its partial derivatives. If a term in $u$ is present, so that $C(x,y) \neq 0$ in (18.9),
then the procedure needs some modification, since we cannot simply divide out 
the dependence on f(p) to obtain (18.10). In such cases we look instead for 
a solution of the form u(x,y) = h(x,y)f(p). We illustrate this method in the 
following example. 


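► Find the general solution of

\[
x\frac{\partial u}{\partial x} + 2\frac{\partial u}{\partial y} - 2u = 0. \tag{18.15}
\]

We seek a solution of the form $u(x,y) = h(x,y)f(p)$, with the consequence that

\[
\frac{\partial u}{\partial x} = \frac{\partial h}{\partial x}f(p) + h\frac{df(p)}{dp}\frac{\partial p}{\partial x}, \qquad
\frac{\partial u}{\partial y} = \frac{\partial h}{\partial y}f(p) + h\frac{df(p)}{dp}\frac{\partial p}{\partial y}.
\]

Substituting these expressions into the PDE (18.15) and rearranging, we obtain

\[
\left(x\frac{\partial h}{\partial x} + 2\frac{\partial h}{\partial y} - 2h\right)f(p)
 + \left(x\frac{\partial p}{\partial x} + 2\frac{\partial p}{\partial y}\right)h\frac{df(p)}{dp} = 0.
\]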


The first factor in parentheses is just the original PDE with $u$ replaced by $h$. Therefore, if
$h$ is any solution of the PDE, however simple, this term will vanish, to leave

\[
\left(x\frac{\partial p}{\partial x} + 2\frac{\partial p}{\partial y}\right)h\frac{df(p)}{dp} = 0,
\]

from which, as in the previous case, we obtain

\[
x\frac{\partial p}{\partial x} + 2\frac{\partial p}{\partial y} = 0.
\]

From (18.11) and (18.12) we see that $u(x,y)$ will be constant along lines of $(x, y)$ that
satisfy

\[
\frac{dx}{x} = \frac{dy}{2},
\]

which integrates to give $x = c\exp(y/2)$. Identifying the constant of integration $c$ with $p$ we
find $p = x\exp(-y/2)$. Thus the general solution of (18.15) is

\[
u(x,y) = h(x,y)\,f\!\left(x\exp(-\tfrac{1}{2}y)\right),
\]

where $f(p)$ is any arbitrary function of $p$ and $h(x,y)$ is any solution of (18.15).
If we take, for example, $h(x,y) = \exp y$, which clearly satisfies (18.15), then the general
solution is

\[
u(x,y) = (\exp y)\,f\!\left(x\exp(-\tfrac{1}{2}y)\right).
\]

Alternatively, $h(x,y) = x^2$ also satisfies (18.15) and so the general solution to the equation
can also be written

\[
u(x,y) = x^2 g\!\left(x\exp(-\tfrac{1}{2}y)\right),
\]

where $g$ is an arbitrary function of $p$; clearly $g(p) = f(p)/p^2$. ◄
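The two statements made without proof in this example can be verified in a line each. Both choices of $h$ satisfy (18.15), and the two forms of the general solution agree because $p^2 = x^2\exp(-y)$:

```latex
\[
h = \exp y:\quad x\cdot 0 + 2\exp y - 2\exp y = 0, \qquad
h = x^2:\quad x\cdot 2x + 2\cdot 0 - 2x^2 = 0,
\]
\[
(\exp y)\,f(p) = x^2\,\frac{\exp y}{x^2}\,f(p) = x^2\,\frac{f(p)}{p^2} = x^2 g(p).
\]
```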


18.3.2 Inhomogeneous equations and problems 

Let us discuss in a more general form the particular solutions of (18.13) found 
in the second example of the previous subsection. It is clear that, so far as this 
equation is concerned, if $u(x,y)$ is a solution then so is any multiple of $u(x,y)$ or
any linear sum of separate solutions $u_1(x,y) + u_2(x,y)$. However, when it comes
to fitting the boundary conditions this is not so.

For example, although $u(x,y)$ in (18.14) satisfies the PDE and the boundary
condition $u(1,y) = 2y + 1$, the function $u_1(x,y) = 4u(x,y) = 8x^2 y + 4$, whilst
satisfying the PDE, takes the value $8y + 4$ on the line $x = 1$ and so does not satisfy
the required boundary condition. Likewise the function $u_2(x,y) = u(x,y) + f_1(x^2 y)$,
for arbitrary $f_1$, satisfies (18.13) but takes the value $u_2(1,y) = 2y + 1 + f_1(y)$ on
the line $x = 1$, and so is not of the required form unless $f_1$ is identically zero.

Thus we see that when treating the superposition of solutions of PDEs two 
considerations arise, one concerning the equation itself and the other connected 
to the boundary conditions. The equation is said to be homogeneous if the fact
that $u(x,y)$ is a solution implies that $\lambda u(x, y)$, for any constant $\lambda$, is also a solution.
However, the problem is said to be homogeneous if, in addition, the boundary
conditions are such that if they are satisfied by $u(x, y)$ then they are also satisfied
by $\lambda u(x,y)$. The last requirement itself is referred to as that of homogeneous
boundary conditions.

For example, the PDE (18.13) is homogeneous but the general first-order
equation (18.9) would not be homogeneous unless $R(x,y) = 0$. Furthermore,
the boundary condition (i) imposed on the solution of (18.13) in the previous
subsection is not homogeneous though, in this case, the boundary condition

\[
u(x, y) = 0 \quad \text{on the line } y = 4x^{-2}
\]

would be, since $u(x,y) = \lambda(x^2 y - 4)$ satisfies this condition for any $\lambda$ and, being a
function of $x^2 y$, satisfies (18.13).

The reason for discussing the homogeneity of PDEs and their boundary condi- 
tions is that in linear PDEs there is a close parallel to the complementary-function 
and particular-integral property of ODEs. The general solution of an inhomogeneous
problem can be written as the sum of any particular solution of the
problem and the general solution of the corresponding homogeneous problem (as
for ODEs, we require that the particular solution is not already contained in the
general solution of the homogeneous problem). Thus, for example, the general
solution of

\[
\frac{\partial u}{\partial x} - x\frac{\partial u}{\partial y} + au = f(x,y), \tag{18.16}
\]

subject to, say, the boundary condition $u(0,y) = g(y)$, is given by


\[
u(x,y) = v(x,y) + w(x,y),
\]

where $v(x,y)$ is any solution (however simple) of (18.16) such that $v(0, y) = g(y)$
and $w(x,y)$ is the general solution of

\[
\frac{\partial w}{\partial x} - x\frac{\partial w}{\partial y} + aw = 0, \tag{18.17}
\]

with $w(0, y) = 0$. If the boundary conditions are sufficiently specified then the only
possible solution of (18.17) will be $w(x,y) = 0$ and $v(x,y)$ will be the complete
solution by itself.

Alternatively, we may begin by finding the general solution of the inhomoge- 
neous equation (18.16) without regard for any boundary conditions; it is just the 
sum of the general solution to the homogeneous equation and a particular inte- 
gral of (18.16), both without reference to the boundary conditions. The boundary 
conditions can then be used to find the appropriate particular solution from the 
general solution. 

We will not discuss at length general methods of obtaining particular integrals
of PDEs but merely note that some of those methods available for ordinary
differential equations can be suitably extended.†

† See, for example, Piaggio, Differential Equations (Bell, 1954), p. 175 et seq.


► Find the general solution of

\[
y\frac{\partial u}{\partial x} - x\frac{\partial u}{\partial y} = 3x. \tag{18.18}
\]

Hence find the most general particular solution (i) which satisfies $u(x, 0) = x^2$ and (ii)
which has the value $u(x,y) = 2$ at the point $(1,0)$.


This equation is inhomogeneous, and so let us first find the general solution of (18.18)
without regard for any boundary conditions. We begin by looking for the solution of the
corresponding homogeneous equation ((18.18) with the RHS equal to zero) of the form
$u(x,y) = f(p)$. Following the same procedure as that used in the solution of (18.13) we
find that $u(x,y)$ will be constant along lines of $(x, y)$ that satisfy

\[
\frac{dx}{y} = \frac{dy}{-x} \quad\Rightarrow\quad \frac{x^2}{2} + \frac{y^2}{2} = c.
\]

Identifying the constant of integration $c$ with $p/2$, we find that the general solution of the
homogeneous equation is $u(x, y) = f(x^2 + y^2)$ for arbitrary function $f$. Now by inspection
a particular integral of (18.18) is $u(x,y) = -3y$, and so the general solution to (18.18) is

\[
u(x,y) = f(x^2 + y^2) - 3y.
\]

Boundary condition (i) requires $u(x, 0) = f(x^2) = x^2$, i.e. $f(z) = z$, and so the particular
solution in this case is

\[
u(x,y) = x^2 + y^2 - 3y.
\]

Similarly, boundary condition (ii) requires $u(1,0) = f(1) = 2$. One possibility is $f(z) = 2z$,
and if we make this choice, then one way of writing the most general particular solution
is

\[
u(x,y) = 2x^2 + 2y^2 - 3y + g(x^2 + y^2),
\]

where $g$ is any arbitrary function for which $g(1) = 0$. Alternatively, a simpler choice would
be $f(z) = 2$, leading to

\[
u(x,y) = 2 - 3y + g(x^2 + y^2).
\] ◄
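A numerical spot check of this example may be helpful. The sketch below builds $u = f(x^2 + y^2) - 3y$ for a user-supplied $f$ and estimates the residual of (18.18) by central differences; the names `make_solution` and `pde_residual`, the step size and the test points are assumptions made purely for illustration.

```python
import math

def make_solution(f):
    """Return u(x, y) = f(x**2 + y**2) - 3*y for a given one-variable function f."""
    return lambda x, y: f(x**2 + y**2) - 3 * y

def pde_residual(u, x, y, h=1e-5):
    """Central-difference estimate of y*du/dx - x*du/dy - 3*x, the residual of (18.18)."""
    ux = (u(x + h, y) - u(x - h, y)) / (2 * h)
    uy = (u(x, y + h) - u(x, y - h)) / (2 * h)
    return y * ux - x * uy - 3 * x

# Boundary condition (i): f(z) = z gives u = x**2 + y**2 - 3*y.
u_i = make_solution(lambda z: z)

# An arbitrary smooth f (here sin) should also satisfy the PDE itself.
u_arbitrary = make_solution(math.sin)

print(pde_residual(u_i, 0.8, -0.4), pde_residual(u_arbitrary, 0.8, -0.4))
print(u_i(1.5, 0.0))  # equals 1.5**2, as boundary condition (i) requires
```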

Although we have discussed the solution of inhomogeneous problems only 
for first-order equations, the general considerations hold true for linear PDEs of 
higher order. 


18.3.3 Second-order equations 


As noted in section 18.1, second-order linear PDEs are of great importance in 
describing the behaviour of many physical systems. As in our discussion of first- 
order equations, for the moment we shall restrict our discussion to equations with 
just two independent variables; extensions to a greater number of independent 
variables are straightforward. 

The most general second-order linear PDE (containing two independent variables)
has the form

\[
A\frac{\partial^2 u}{\partial x^2} + B\frac{\partial^2 u}{\partial x\,\partial y} + C\frac{\partial^2 u}{\partial y^2}
 + D\frac{\partial u}{\partial x} + E\frac{\partial u}{\partial y} + Fu = R(x, y), \tag{18.19}
\]

where $A, B, \ldots, F$ and $R(x, y)$ are given functions of $x$ and $y$. Because of the nature
of the solutions to such equations, they are usually divided into three classes, a
division of which we will make further use in subsection 18.6.2. The equation
(18.19) is called hyperbolic if $B^2 > 4AC$, parabolic if $B^2 = 4AC$ and elliptic if
$B^2 < 4AC$. Clearly, if $A$, $B$ and $C$ are functions of $x$ and $y$ (rather than just
constants) then the equation might be of different types in different parts of the
$xy$-plane.
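The classification rule is simple enough to state as a small function. This is a minimal sketch for constant coefficients; the function name `classify` is an assumption, and the exact comparison with zero is only appropriate when the coefficients are known exactly.

```python
def classify(A, B, C):
    """Classify A*u_xx + B*u_xy + C*u_yy + (lower-order terms) = R
    by the sign of the discriminant B**2 - 4*A*C."""
    disc = B * B - 4 * A * C
    if disc > 0:
        return "hyperbolic"
    if disc == 0:
        return "parabolic"
    return "elliptic"

c = 2.0      # illustrative wave speed
kappa = 0.5  # illustrative diffusivity

# Wave equation u_xx - (1/c**2) u_tt = 0, with t playing the role of y.
print(classify(1.0, 0.0, -1.0 / c**2))
# Two-dimensional Laplace equation u_xx + u_yy = 0.
print(classify(1.0, 0.0, 1.0))
# One-dimensional diffusion equation kappa*u_xx - u_t = 0 (no second t-derivative).
print(classify(kappa, 0.0, 0.0))
```

For coefficients that depend on $x$ and $y$, the same function could be evaluated point by point to map out the regions of each type.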

Equation (18.19) obviously represents a very large class of PDEs, and it is 
usually impossible to find closed-form solutions to most of these equations. 
Therefore, for the moment we shall consider only homogeneous equations, with 
$R(x,y) = 0$, and make the further (greatly simplifying) restriction that, throughout
the remainder of this section, $A, B, \ldots, F$ are not functions of $x$ and $y$ but merely
constants.

We now tackle the problem of solving some types of second-order PDE with
constant coefficients by seeking solutions that are arbitrary functions of particular 
combinations of independent variables, just as we did for first-order equations. 

Following the discussion of the previous section, we can hope to find such 
solutions only if all the terms of the equation involve the same total number 
of differentiations, i.e. all terms are of the same order, although the number 
of differentiations with respect to the individual independent variables may be 
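different. This means that in (18.19) we require the constants $D$, $E$ and $F$ to be
identically zero (we have, of course, already assumed that $R(x,y)$ is zero), so that
we are now considering only equations of the form

\[
A\frac{\partial^2 u}{\partial x^2} + B\frac{\partial^2 u}{\partial x\,\partial y} + C\frac{\partial^2 u}{\partial y^2} = 0, \tag{18.20}
\]

where $A$, $B$ and $C$ are constants. We note that both the one-dimensional wave
equation,

\[
\frac{\partial^2 u}{\partial x^2} - \frac{1}{c^2}\frac{\partial^2 u}{\partial t^2} = 0,
\]

and the two-dimensional Laplace equation,

\[
\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0,
\]

are of this form, but that the diffusion equation,

\[
\kappa\frac{\partial^2 u}{\partial x^2} - \frac{\partial u}{\partial t} = 0,
\]

is not, since it contains a first-order derivative.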

Since all the terms in (18.20) involve two differentiations, by assuming a solution
of the form $u(x,y) = f(p)$, where $p$ is some unknown function of $x$ and $y$ (or $t$),
we may be able to obtain a common factor $d^2 f(p)/dp^2$ as the only appearance of
$f$ on the LHS. Then, because of the zero RHS, all reference to the form of $f$ can
be cancelled out.

We can gain some guidance on suitable forms for the combination $p = p(x,y)$
by considering $\partial u/\partial x$ when $u$ is given by $u(x,y) = f(p)$, for then

\[
\frac{\partial u}{\partial x} = \frac{df(p)}{dp}\frac{\partial p}{\partial x}.
\]

Clearly differentiation of this equation with respect to $x$ (or $y$) will not lead to a
single term on the RHS, containing $f$ only as $d^2 f(p)/dp^2$, unless the factor $\partial p/\partial x$
is a constant so that $\partial^2 p/\partial x^2$ and $\partial^2 p/\partial x\,\partial y$ are necessarily zero. This shows that
$p$ must be a linear function of $x$. In an exactly similar way $p$ must also be a linear
function of $y$, i.e. $p = ax + by$.

If we assume a solution of (18.20) of the form $u(x,y) = f(ax + by)$, and evaluate
the terms ready for substitution into (18.20), we obtain

\[
\frac{\partial u}{\partial x} = a\frac{df(p)}{dp}, \qquad \frac{\partial u}{\partial y} = b\frac{df(p)}{dp},
\]
\[
\frac{\partial^2 u}{\partial x^2} = a^2\frac{d^2 f(p)}{dp^2}, \qquad
\frac{\partial^2 u}{\partial x\,\partial y} = ab\frac{d^2 f(p)}{dp^2}, \qquad
\frac{\partial^2 u}{\partial y^2} = b^2\frac{d^2 f(p)}{dp^2},
\]

which on substitution give

\[
(Aa^2 + Bab + Cb^2)\frac{d^2 f(p)}{dp^2} = 0. \tag{18.21}
\]

This is the form we have been seeking, since now a solution independent of
the form of $f$ can be obtained if we require that $a$ and $b$ satisfy

\[
Aa^2 + Bab + Cb^2 = 0.
\]

From this quadratic, two values for the ratio of the two constants a and b are 
obtained, 

\[
b/a = [-B \pm (B^2 - 4AC)^{1/2}]/2C.
\]

If we denote these two ratios by $\lambda_1$ and $\lambda_2$ then any functions of the two variables

\[
p_1 = x + \lambda_1 y, \qquad p_2 = x + \lambda_2 y
\]

will be solutions of the original equation (18.20). The omission of the constant
factor $a$ from $p_1$ and $p_2$ is of no consequence since this can always be absorbed
into the particular form of any chosen function; only the relative weighting of $x$
and $y$ in $p$ is important.

Since $p_1$ and $p_2$ are in general different, we can thus write the general solution
of (18.20) as

\[
u(x,y) = f(x + \lambda_1 y) + g(x + \lambda_2 y), \tag{18.22}
\]

where $f$ and $g$ are arbitrary functions.

Finally, we note that the alternative solution $d^2 f(p)/dp^2 = 0$ to (18.21) leads
only to the trivial solution $u(x,y) = kx + ly + m$, for which all second derivatives
are individually zero.
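The two ratios $\lambda_1$ and $\lambda_2$ are just the roots of $A + B\lambda + C\lambda^2 = 0$, so they can be computed directly. The sketch below uses complex arithmetic so that the elliptic case is handled as well; the function name `characteristic_ratios` and the ordering of the returned pair are assumptions of this illustration, and $C \neq 0$ is required, as in the formula above.

```python
import cmath

def characteristic_ratios(A, B, C):
    """Return the two roots of A + B*lam + C*lam**2 = 0 as complex numbers.

    These are the ratios b/a = [-B +/- sqrt(B**2 - 4*A*C)] / (2*C);
    the '+' root is returned first. Requires C != 0.
    """
    if C == 0:
        raise ValueError("C must be non-zero for this formula")
    root = cmath.sqrt(B * B - 4 * A * C)
    return (-B + root) / (2 * C), (-B - root) / (2 * C)

c = 2.0  # illustrative wave speed

# Wave equation: A = 1, B = 0, C = -1/c**2, giving ratios -c and +c.
wave = characteristic_ratios(1.0, 0.0, -1.0 / c**2)
# Laplace equation: A = 1, B = 0, C = 1, giving ratios +i and -i.
laplace = characteristic_ratios(1.0, 0.0, 1.0)
print(wave, laplace)
```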



► Find the general solution of the one-dimensional wave equation

\[
\frac{\partial^2 u}{\partial x^2} - \frac{1}{c^2}\frac{\partial^2 u}{\partial t^2} = 0.
\]

This equation is (18.20) with $A = 1$, $B = 0$ and $C = -1/c^2$, and so the values of $\lambda_1$ and $\lambda_2$
are the solutions of

\[
1 - \frac{\lambda^2}{c^2} = 0,
\]

namely $\lambda_1 = -c$ and $\lambda_2 = c$. This means that arbitrary functions of the quantities

\[
p_1 = x - ct, \qquad p_2 = x + ct
\]

will be satisfactory solutions of the equation and that the general solution will be

\[
u(x, t) = f(x - ct) + g(x + ct), \tag{18.23}
\]

where $f$ and $g$ are arbitrary functions. This solution is discussed further in section 18.4. ◄

The method used to obtain the general solution of the wave equation may also 
be applied straightforwardly to Laplace’s equation. 


► Find the general solution of the two-dimensional Laplace equation

\[
\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0. \tag{18.24}
\]

Following the established procedure, we look for a solution that is a function $f(p)$ of
$p = x + \lambda y$, where from (18.24) $\lambda$ satisfies

\[
1 + \lambda^2 = 0.
\]

This requires that $\lambda = \pm i$, and satisfactory variables $p$ are $p = x \pm iy$. The general solution
required is therefore, in terms of arbitrary functions $f$ and $g$,

\[
u(x,y) = f(x + iy) + g(x - iy).
\] ◄


It will be apparent from the last two examples that the nature of the appropriate 
linear combination of $x$ and $y$ depends upon whether $B^2 > 4AC$ or $B^2 < 4AC$.
This is exactly the same criterion as determines whether the PDE is hyperbolic
or elliptic. Hence as a general result, hyperbolic and elliptic equations of the
form (18.20), given the restriction that the constants $A$, $B$ and $C$ are real, have as
solutions functions whose arguments have the form $x + \alpha y$ and $x + i\beta y$ respectively,
where $\alpha$ and $\beta$ themselves are real.

The one case not covered by this result is that in which $B^2 = 4AC$, i.e. a
parabolic equation. In this case $\lambda_1$ and $\lambda_2$ are not different and only one suitable
combination of $x$ and $y$ results, namely

\[
u(x,y) = f(x - (B/2C)y).
\]

To find the second part of the general solution we try, in analogy with the
corresponding situation for ordinary differential equations, a solution of the form

\[
u(x,y) = h(x,y)\,g(x - (B/2C)y).
\]


Substituting this into (18.20) and using $A = B^2/4C$ results in

\[
\left(A\frac{\partial^2 h}{\partial x^2} + B\frac{\partial^2 h}{\partial x\,\partial y} + C\frac{\partial^2 h}{\partial y^2}\right)g = 0.
\]

Therefore we require $h(x,y)$ to be any solution of the original PDE. There are
several simple solutions of this equation, but as only one is required we take the
simplest non-trivial one, $h(x,y) = x$, to give the general solution of the parabolic
equation

\[
u(x,y) = f(x - (B/2C)y) + x\,g(x - (B/2C)y). \tag{18.25}
\]

We could, of course, have taken $h(x,y) = y$, but this only leads to a solution that
is already contained in (18.25).


► Solve

\[
\frac{\partial^2 u}{\partial x^2} + 2\frac{\partial^2 u}{\partial x\,\partial y} + \frac{\partial^2 u}{\partial y^2} = 0,
\]

subject to the boundary conditions $u(0,y) = 0$ and $u(x, 1) = x^2$.

From our general result, functions of $p = x + \lambda y$ will be solutions provided

\[
1 + 2\lambda + \lambda^2 = 0,
\]

i.e. $\lambda = -1$ and the equation is parabolic. The general solution is therefore

\[
u(x,y) = f(x - y) + x\,g(x - y).
\]

The boundary condition $u(0,y) = 0$ implies $f(p) = 0$, whilst $u(x, 1) = x^2$ yields

\[
x\,g(x - 1) = x^2,
\]

which gives $g(p) = p + 1$. Therefore the particular solution required is

\[
u(x, y) = x(p + 1) = x(x - y + 1).
\] ◄
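The particular solution $u = x(x - y + 1)$ can be checked against both the PDE and the boundary conditions. The following sketch estimates the second derivatives by central differences (which are exact, apart from rounding, for a quadratic polynomial); the helper name `second_derivatives` and the step size are illustrative assumptions.

```python
def u(x, y):
    """Particular solution found above."""
    return x * (x - y + 1)

def second_derivatives(u, x, y, h=1e-3):
    """Central-difference estimates of (u_xx, u_xy, u_yy) at (x, y)."""
    uxx = (u(x + h, y) - 2 * u(x, y) + u(x - h, y)) / h**2
    uyy = (u(x, y + h) - 2 * u(x, y) + u(x, y - h)) / h**2
    uxy = (u(x + h, y + h) - u(x + h, y - h)
           - u(x - h, y + h) + u(x - h, y - h)) / (4 * h**2)
    return uxx, uxy, uyy

uxx, uxy, uyy = second_derivatives(u, 0.7, -1.3)
pde_residual = uxx + 2 * uxy + uyy  # left-hand side of the PDE

print(pde_residual)           # close to zero
print(u(0.0, 5.0))            # boundary condition u(0, y) = 0
print(u(3.0, 1.0), 3.0**2)    # boundary condition u(x, 1) = x**2
```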


To reinforce the material discussed above we will now give alternative deriva- 
tions of the general solutions (18.22) and (18.25) by expressing the original PDE 
in terms of new variables before solving it. The actual solution will then become 
almost trivial; but, of course, it will be recognised that suitable new variables 
could hardly have been guessed if it were not for the work already done. This 
does not detract from the validity of the derivation to be described, only from 
the likelihood that it would be discovered by inspection. 

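We start again with (18.20) and change to new variables

\[
\zeta = x + \lambda_1 y, \qquad \eta = x + \lambda_2 y.
\]

With this change of variables, we have from the chain rule that

\[
\frac{\partial}{\partial x} = \frac{\partial}{\partial\zeta} + \frac{\partial}{\partial\eta}, \qquad
\frac{\partial}{\partial y} = \lambda_1\frac{\partial}{\partial\zeta} + \lambda_2\frac{\partial}{\partial\eta}.
\]

Using these and the fact that

\[
A + B\lambda_i + C\lambda_i^2 = 0 \quad \text{for } i = 1, 2,
\]

equation (18.20) becomes

\[
[2A + B(\lambda_1 + \lambda_2) + 2C\lambda_1\lambda_2]\frac{\partial^2 u}{\partial\zeta\,\partial\eta} = 0.
\]

Then, providing the factor in brackets does not vanish, for which the required
condition is easily shown to be $B^2 \neq 4AC$, we obtain

\[
\frac{\partial^2 u}{\partial\zeta\,\partial\eta} = 0,
\]

which has the successive integrals

\[
\frac{\partial u}{\partial\eta} = F(\eta), \qquad u(\zeta,\eta) = f(\eta) + g(\zeta).
\]

This solution is just the same as (18.22),

\[
u(x,y) = f(x + \lambda_2 y) + g(x + \lambda_1 y).
\]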
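The condition for the factor in brackets to be non-zero follows from the sum and product of the roots of $A + B\lambda + C\lambda^2 = 0$ (assuming, as throughout, that $C \neq 0$):

```latex
\[
\lambda_1 + \lambda_2 = -\frac{B}{C}, \qquad \lambda_1\lambda_2 = \frac{A}{C}
\quad\Longrightarrow\quad
2A + B(\lambda_1 + \lambda_2) + 2C\lambda_1\lambda_2
  = 2A - \frac{B^2}{C} + 2A = \frac{4AC - B^2}{C},
\]
```

which vanishes only if $B^2 = 4AC$.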

If the equation is parabolic (i.e. $B^2 = 4AC$), we instead use the new variables

\[
\zeta = x + \lambda y, \qquad \eta = x,
\]

and recalling that $\lambda = -(B/2C)$ we can reduce (18.20) to

\[
A\frac{\partial^2 u}{\partial\eta^2} = 0.
\]

Two straightforward integrations give as the general solution

\[
u(\zeta,\eta) = \eta\,g(\zeta) + f(\zeta),
\]

which in terms of $x$ and $y$ has exactly the form of (18.25),

\[
u(x, y) = x\,g(x + \lambda y) + f(x + \lambda y).
\]

Finally, as hinted at in subsection 18.3.2 with reference to first-order linear 
PDEs, some of the methods used to find particular integrals of linear ODEs 
can be suitably modified to find particular integrals of PDEs of higher order. In 
simple cases, however, an appropriate solution may often be found by inspection. 


► Find the general solution of

\[
\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 6(x + y).
\]

Following our previous methods and results, the complementary function is

\[
u(x,y) = f(x + iy) + g(x - iy),
\]

and only a particular integral remains to be found. By inspection a particular integral of
the equation is $u(x,y) = x^3 + y^3$, and so the general solution can be written

\[
u(x,y) = f(x + iy) + g(x - iy) + x^3 + y^3.
\] ◄


18.4 The wave equation

We have already found that the general solution of the one-dimensional wave 
equation is 

\[
u(x, t) = f(x - ct) + g(x + ct), \tag{18.26}
\]

where $f$ and $g$ are arbitrary functions. However, the equation is of such general
importance that further discussion will not be out of place.

Let us imagine that $u(x, t) = f(x - ct)$ represents the displacement of a string at
time $t$ and position $x$. It is clear that all positions $x$ and times $t$ for which $x - ct =$
constant will have the same instantaneous displacement. But $x - ct =$ constant
is exactly the relation between the time and position of an observer travelling
with speed $c$ along the positive $x$-direction. Consequently this moving observer
sees a constant displacement of the string, whereas to a stationary observer, the
initial profile $u(x, 0)$ moves with speed $c$ along the $x$-axis as if it were a rigid
system. Thus $f(x - ct)$ represents a wave form of constant shape travelling along
the positive $x$-axis with speed $c$, the actual form of the wave depending upon
the function $f$. Similarly, the term $g(x + ct)$ is a constant wave form travelling
with speed $c$ in the negative $x$-direction. The general solution (18.23) represents
a superposition of these.

If the functions $f$ and $g$ are the same then the complete solution (18.23)
represents identical progressive waves going in opposite directions. This may
result in a wave pattern whose profile does not progress, described as a standing
wave. As a simple example, suppose both $f(p)$ and $g(p)$ have the form†

\[
f(p) = g(p) = A\cos(kp + \epsilon).
\]

Then (18.23) can be written as

\[
u(x, t) = A[\cos(kx - kct + \epsilon) + \cos(kx + kct + \epsilon)]
        = 2A\cos(kct)\cos(kx + \epsilon).
\]

The important thing to notice is that the shape of the wave pattern, given by the
factor in $x$, is the same at all times but that its amplitude $2A\cos(kct)$ depends
upon time. At some points $x$ that satisfy

\[
\cos(kx + \epsilon) = 0
\]

there is no displacement at any time; such points are called nodes.
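The equivalence of the two forms of the standing wave, and the existence of nodes, can be illustrated numerically. In this sketch the amplitude, wave number, speed and phase are arbitrary illustrative values, and the function names are assumptions rather than anything taken from the text.

```python
import math

# Illustrative parameter values (assumed, not from the text).
A, k, c, eps = 1.5, 2.0, 3.0, 0.4

def travelling_sum(x, t):
    """Sum of the two oppositely directed progressive waves, f(x - ct) + g(x + ct)."""
    return A * (math.cos(k * x - k * c * t + eps) + math.cos(k * x + k * c * t + eps))

def standing(x, t):
    """Equivalent standing-wave form 2A cos(kct) cos(kx + eps)."""
    return 2 * A * math.cos(k * c * t) * math.cos(k * x + eps)

# One node: a point where cos(k*x + eps) = 0.
x_node = (math.pi / 2 - eps) / k

print(travelling_sum(0.7, 0.3), standing(0.7, 0.3))            # the two forms agree
print([travelling_sum(x_node, t) for t in (0.0, 0.1, 0.25)])    # all close to zero
```

Further nodes lie at intervals of $\pi/k$, i.e. half a wavelength, on either side of `x_node`.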

† In the usual notation, $k$ is the wave number ($= 2\pi/\text{wavelength}$) and $kc = \omega$, the angular frequency
of the wave.

So far we have not imposed any boundary conditions on the solution (18.26).
The problem of finding a solution to the wave equation that satisfies given boundary
conditions is normally treated using the method of separation of variables
discussed in the next chapter. Nevertheless, we now consider D’Alembert’s solution
$u(x,t)$ of the wave equation subject to initial conditions (boundary conditions) in
the following general form:


initial displacement, m ( x , 0 ) = cf>(x ); 


initial velocity, 


8u(x, 0) 
dt 


= y(x). 


The functions < p(x ) and t p(x) are given and describe the displacement and velocity 
of each part of the string at the (arbitrary) time t = 0. 

It is clear that what we need are the particular forms of the functions $f$ and $g$
in (18.26) that lead to the required values at $t = 0$. This means that

$$\phi(x) = u(x, 0) = f(x - 0) + g(x + 0), \tag{18.27}$$

$$\psi(x) = \frac{\partial u(x, 0)}{\partial t} = -cf'(x - 0) + cg'(x + 0), \tag{18.28}$$

where it should be noted that $f'(x - 0)$ stands for $df(p)/dp$ evaluated, after the
differentiation, at $p = x - c \times 0$; likewise for $g'(x + 0)$.

Looking on the above two left-hand sides as functions of $p = x \pm ct$, but
everywhere evaluated at $t = 0$, we may integrate (18.28) between an arbitrary
(and irrelevant) lower limit $p_0$ and an indefinite upper limit $p$ to obtain

$$\frac{1}{c}\int_{p_0}^{p} \psi(q)\,dq + K = -f(p) + g(p),$$

the constant of integration $K$ depending on $p_0$. Comparing this equation with
(18.27), with $x$ replaced by $p$, we can establish the forms of the functions $f$ and
$g$ as

$$f(p) = \frac{\phi(p)}{2} - \frac{1}{2c}\int_{p_0}^{p} \psi(q)\,dq - \frac{K}{2}, \tag{18.29}$$

$$g(p) = \frac{\phi(p)}{2} + \frac{1}{2c}\int_{p_0}^{p} \psi(q)\,dq + \frac{K}{2}. \tag{18.30}$$

Adding (18.29) with $p = x - ct$ to (18.30) with $p = x + ct$ gives as the solution to
the original problem

$$u(x, t) = \frac{1}{2}[\phi(x - ct) + \phi(x + ct)] + \frac{1}{2c}\int_{x-ct}^{x+ct} \psi(q)\,dq, \tag{18.31}$$

in which we notice that all dependence on $p_0$ has disappeared.
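
As an illustration, (18.31) can be evaluated directly for given initial data. The
sketch below is one possible way of doing so; the function names `integrate` and
`dalembert`, the use of Simpson's rule and the test profiles are all assumptions
made here, not part of the text. The two cases chosen have closed forms that follow
from (18.31) by elementary integration.

```python
import math

def integrate(func, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    if a == b:
        return 0.0
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0

def dalembert(phi, psi, x, t, c):
    """Evaluate (18.31): [phi(x-ct) + phi(x+ct)]/2 + (1/2c) * integral of psi."""
    left, right = x - c * t, x + c * t
    return 0.5 * (phi(left) + phi(right)) + integrate(psi, left, right) / (2.0 * c)

c = 2.0
# Released from rest with a sinusoidal profile: expect sin(x) cos(ct).
u_rest = dalembert(math.sin, lambda q: 0.0, 0.7, 0.3, c)
# Undisplaced string given a cosine velocity profile: expect cos(x) sin(ct) / c.
u_kick = dalembert(lambda q: 0.0, math.cos, 0.7, 0.3, c)
```

The first case reproduces the standing wave found earlier; the second shows the
contribution of the integral term alone.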

Each of the terms in (18.31) has a fairly straightforward physical interpretation.
In each case the factor 1/2 represents the fact that only half a displacement profile
that starts at any particular point on the string travels towards any other position
$x$, the other half travelling away from it. The first term $\tfrac{1}{2}\phi(x - ct)$ arises from
the initial displacement at a distance $ct$ to the left of $x$; this travels forward
arriving at $x$ at time $t$. Similarly, the second contribution is due to the initial
displacement at a distance $ct$ to the right of $x$. The interpretation of the final
term is a little less obvious. It can be viewed as representing the accumulated
transverse displacement at position $x$ due to the passage past $x$ of all parts of
the initial motion whose effects can reach $x$ within a time $t$, both backward and
forward travelling.

The extension to the three-dimensional wave equation of solutions of the type
we have so far encountered presents no serious difficulty. In Cartesian coordinates
the three-dimensional wave equation is

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}
= \frac{1}{c^2}\frac{\partial^2 u}{\partial t^2}. \tag{18.32}$$

In close analogy with the one-dimensional case we try solutions that are functions
of linear combinations of all four variables,

$$p = lx + my + nz + \mu t.$$

It is clear that a solution $u(x,y,z,t) = f(p)$ will be acceptable provided that

$$\left(l^2 + m^2 + n^2 - \frac{\mu^2}{c^2}\right)\frac{d^2 f(p)}{dp^2} = 0.$$

Thus, as in the one-dimensional case, $f$ can be arbitrary provided that

$$l^2 + m^2 + n^2 = \mu^2/c^2.$$

Using an obvious normalisation, we take $\mu = \pm c$ and $l$, $m$, $n$ as three numbers
such that

$$l^2 + m^2 + n^2 = 1.$$

In other words $(l, m, n)$ are the Cartesian components of a unit vector $\hat{\mathbf{n}}$ that
points along the direction of propagation of the wave. The quantity $p$ can be
written in terms of vectors as the scalar expression $p = \hat{\mathbf{n}} \cdot \mathbf{r} \pm ct$, and the general
solution of (18.32) is then

$$u(x,y,z,t) = u(\mathbf{r}, t) = f(\hat{\mathbf{n}} \cdot \mathbf{r} - ct) + g(\hat{\mathbf{n}} \cdot \mathbf{r} + ct), \tag{18.33}$$

where $\hat{\mathbf{n}}$ is any unit vector. It would perhaps be more transparent to write $\hat{\mathbf{n}}$
explicitly as one of the arguments of $u$.
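
A plane-wave solution of the form (18.33) can be tested numerically against (18.32).
The sketch below is illustrative only: the Gaussian profile, the direction vector,
the sample point and the helper names are arbitrary assumptions. It uses central
differences to compare the Laplacian of $f(\hat{\mathbf{n}} \cdot \mathbf{r} - ct)$ with its
second time derivative divided by $c^2$.

```python
import math

def plane_wave(f, n, c):
    """Return u(x, y, z, t) = f(n . r - c t) after normalising n to unit length."""
    norm = math.sqrt(sum(comp * comp for comp in n))
    l, m, q = (comp / norm for comp in n)    # so that l^2 + m^2 + q^2 = 1
    return lambda x, y, z, t: f(l * x + m * y + q * z - c * t)

def second_difference(u, point, axis, h=1e-3):
    """Central-difference estimate of the second derivative in one argument."""
    up, down = list(point), list(point)
    up[axis] += h
    down[axis] -= h
    return (u(*up) - 2.0 * u(*point) + u(*down)) / (h * h)

c = 1.5
u = plane_wave(lambda p: math.exp(-p * p), n=(1.0, 2.0, 2.0), c=c)
point = (0.3, -0.2, 0.5, 0.4)                # (x, y, z, t)

laplacian = sum(second_difference(u, point, axis) for axis in range(3))
time_term = second_difference(u, point, 3) / c**2
residual = laplacian - time_term             # should vanish up to discretisation error
```

The residual is small compared with the Laplacian itself, whatever unit vector is
chosen for the direction of propagation.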


18.5 The diffusion equation 


One important class of second-order PDEs, which we have not yet considered
in detail, is that in which the second derivative with respect to one variable
appears, but only the first derivative with respect to another (usually time). This
is exemplified by the one-dimensional diffusion equation

$$\kappa\frac{\partial^2 u(x, t)}{\partial x^2} = \frac{\partial u}{\partial t}, \tag{18.34}$$

in which $\kappa$ is a constant with the dimensions length$^2$ $\times$ time$^{-1}$. The physical
constants that go to make up $\kappa$ in a particular case depend upon the nature of
the process (e.g. solute diffusion, heat flow, etc.) and the material being described.

With (18.34) we cannot hope to repeat successfully the method of subsection
18.3.3, since now $u(x, t)$ is differentiated a different number of times on the two
sides of the equation; any attempted solution in the form $u(x,t) = f(p)$ with
$p = ax + bt$ will lead only to an equation in which the form of $f$ cannot be
cancelled out. Clearly we must try other methods.

Solutions may be obtained by using the standard method of separation of
variables discussed in the next chapter. Alternatively, a simple solution is also
given if both sides of (18.34), as it stands, are separately set equal to a constant
$\alpha$ (say), so that

$$\frac{\partial^2 u}{\partial x^2} = \frac{\alpha}{\kappa}, \qquad \frac{\partial u}{\partial t} = \alpha.$$

These equations have the general solutions

$$u(x, t) = \frac{\alpha}{2\kappa}x^2 + xg(t) + h(t) \quad\text{and}\quad u(x, t) = \alpha t + m(x)$$

respectively, and may be made compatible with each other if $g(t)$ is taken as
constant, $g(t) = g$ (where $g$ could be zero), $h(t) = \alpha t$ and $m(x) = (\alpha/2\kappa)x^2 + gx$.
An acceptable solution is thus

$$u(x, t) = \frac{\alpha}{2\kappa}x^2 + gx + \alpha t + \text{constant}. \tag{18.35}$$

Let us now return to seeking solutions of equations by combining the
independent variables in particular ways. Having seen that a linear combination of
$x$ and $t$ will be of no value, we must search for other possible combinations. It
has been noted already that $\kappa$ has the dimensions length$^2$ $\times$ time$^{-1}$ and so the
combination of variables

$$\eta = \frac{x^2}{\kappa t}$$

will be dimensionless. Let us see if we can satisfy (18.34) with a solution of the
form $u(x, t) = f(\eta)$. Evaluating the necessary derivatives we have

$$\frac{\partial u}{\partial x} = \frac{df(\eta)}{d\eta}\frac{\partial \eta}{\partial x} = \frac{2x}{\kappa t}\frac{df(\eta)}{d\eta},$$

$$\frac{\partial^2 u}{\partial x^2} = \frac{2}{\kappa t}\frac{df(\eta)}{d\eta} + \left(\frac{2x}{\kappa t}\right)^2\frac{d^2 f(\eta)}{d\eta^2},$$

$$\frac{\partial u}{\partial t} = -\frac{x^2}{\kappa t^2}\frac{df(\eta)}{d\eta}.$$

Substituting these expressions into (18.34) we find that the new equation can be
written entirely in terms of $\eta$,

$$4\eta\frac{d^2 f(\eta)}{d\eta^2} + (2 + \eta)\frac{df(\eta)}{d\eta} = 0.$$

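This is a straightforward ODE, which can be solved (using a minimum of
explanation) as follows. Writing $f'(\eta) = df(\eta)/d\eta$, etc., we have

$$\begin{aligned}
\frac{f''(\eta)}{f'(\eta)} &= -\frac{1}{2\eta} - \frac{1}{4} \\
\Rightarrow\quad \ln[\eta^{1/2}f'(\eta)] &= -\frac{\eta}{4} + c \\
\Rightarrow\quad f'(\eta) &= \frac{A}{\eta^{1/2}}\exp\left(-\frac{\eta}{4}\right) \\
\Rightarrow\quad f(\eta) &= A\int_{\eta_0}^{\eta} \mu^{-1/2}\exp\left(-\frac{\mu}{4}\right)d\mu.
\end{aligned}$$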

If we now write this in terms of a slightly different variable

$$\zeta = \frac{\eta^{1/2}}{2} = \frac{x}{2(\kappa t)^{1/2}},$$

then $d\zeta = \tfrac{1}{4}\eta^{-1/2}\,d\eta$, and the solution to (18.34) is given by

$$u(x,t) = f(\eta) = g(\zeta) = B\int_{\zeta_0}^{\zeta} \exp(-v^2)\,dv. \tag{18.36}$$

Here $B$ is a constant and it should be noticed that $x$ and $t$ appear on the RHS
only in the indefinite upper limit $\zeta$, and then only in the combination $xt^{-1/2}$. If
$\zeta_0$ is chosen as zero then $u(x, t)$ is, to within a constant factor,† the error function
$\mathrm{erf}[x/2(\kappa t)^{1/2}]$, which is tabulated in many reference books. Only non-negative
values of $x$ and $t$ are to be considered here, so that $\zeta \geq \zeta_0$.

Let us try to determine what kind of (say) temperature distribution and flow
this represents. For definiteness we take $\zeta_0 = 0$. Firstly, since $u(x,t)$ in (18.36)
depends only upon the product $xt^{-1/2}$, it is clear that all points $x$ at times $t$ such
that $xt^{-1/2}$ has the same value have the same temperature. Put another way, at
any specific time $t$ the region having a particular temperature has moved along
the positive $x$-axis a distance proportional to the square root of $t$. This is a typical
diffusion process.

Notice that, on the one hand, at $t = 0$, the variable $\zeta \to \infty$ and $u$ becomes
quite independent of $x$ (except perhaps at $x = 0$); the solution then represents a
uniform spatial temperature distribution. On the other hand, at $x = 0$, $u(x, t)$ is
identically zero for all $t$.
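
These properties of the similarity solution can be checked numerically. The sketch
below assumes $\zeta_0 = 0$ and the normalisation $B = 2\pi^{-1/2}$, so that $u$ is exactly
the error function available as `math.erf`; the value of $\kappa$, the sample points and
the helper names are arbitrary illustrative choices.

```python
import math

def u(x, t, kappa):
    """Similarity solution (18.36) with zeta_0 = 0 and B = 2/sqrt(pi)."""
    return math.erf(x / (2.0 * math.sqrt(kappa * t)))

def residual(x, t, kappa, h=1e-3):
    """kappa * u_xx - u_t estimated with central differences."""
    u_xx = (u(x + h, t, kappa) - 2.0 * u(x, t, kappa) + u(x - h, t, kappa)) / h**2
    u_t = (u(x, t + h, kappa) - u(x, t - h, kappa)) / (2.0 * h)
    return kappa * u_xx - u_t

kappa = 0.8
# The diffusion equation should be satisfied at every sampled point.
worst = max(abs(residual(x, t, kappa)) for x in (0.5, 1.0, 2.0) for t in (0.5, 1.0, 2.0))
# Points with equal x / sqrt(t) share the same temperature.
same_level = abs(u(1.0, 1.0, kappa) - u(2.0, 4.0, kappa))
```

Doubling $x$ while quadrupling $t$ leaves the temperature unchanged, which is the
square-root spreading described above.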


† Take $B = 2\pi^{-1/2}$ to give the usual error function normalised such that $\mathrm{erf}(\infty) = 1$. See the
Appendix.

►An infrared laser delivers a pulse of (heat) energy $E$ to a point $P$ on a large insulated
sheet of thickness $b$, thermal conductivity $k$, specific heat $s$ and density $\rho$. The sheet is
initially at a uniform temperature. If $u(r,t)$ is the excess temperature a time $t$ later, at a
point that is a distance $r$ ($\gg b$) from $P$, then show that a suitable expression for $u$ is

$$u(r, t) = \frac{\alpha}{t}\exp\left(-\frac{r^2}{2\beta t}\right), \tag{18.37}$$

where $\alpha$ and $\beta$ are constants. (Note that we use $r$ instead of $\rho$ to denote the radial coordinate
in plane polars so as to avoid confusion with the density.)

Further, (i) show that $\beta = 2k/(s\rho)$; (ii) show that the excess heat energy in the sheet
is independent of $t$, and hence evaluate $\alpha$; and (iii) show that the total heat flow past any
circle of radius $r$ is $E$.


The equation to be solved is the heat diffusion equation

$$k\nabla^2 u(\mathbf{r},t) = s\rho\frac{\partial u(\mathbf{r},t)}{\partial t}.$$

Since we only require the solution for $r \gg b$ we can treat the problem as two-dimensional
with obvious circular symmetry. Thus only the $r$-derivative term in the expression for $\nabla^2 u$
is non-zero, giving

$$\frac{k}{r}\frac{\partial}{\partial r}\left(r\frac{\partial u}{\partial r}\right) = s\rho\frac{\partial u}{\partial t}, \tag{18.38}$$

where now $u(\mathbf{r},t) = u(r,t)$.

(i) Substituting the given expression (18.37) into (18.38) we obtain

$$\frac{2k\alpha}{\beta t^2}\left(\frac{r^2}{2\beta t} - 1\right)\exp\left(-\frac{r^2}{2\beta t}\right)
= \frac{s\rho\alpha}{t^2}\left(\frac{r^2}{2\beta t} - 1\right)\exp\left(-\frac{r^2}{2\beta t}\right),$$

from which we find that (18.37) is a solution, provided $\beta = 2k/(s\rho)$.
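(ii) The excess heat in the system at any time $t$ is

$$b\rho s\int_0^\infty u(r, t)\,2\pi r\,dr
= 2\pi b\rho s\alpha\int_0^\infty \frac{r}{t}\exp\left(-\frac{r^2}{2\beta t}\right)dr
= 2\pi b\rho s\alpha\beta.$$

The excess heat is therefore independent of $t$ and must be equal to the total heat
input $E$, implying that

$$\alpha = \frac{E}{2\pi b\rho s\beta} = \frac{E}{4\pi bk}.$$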

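(iii) The total heat flow past a circle of radius $r$ is

$$\begin{aligned}
-2\pi rbk\int_0^\infty \frac{\partial u(r,t)}{\partial r}\,dt
&= -2\pi rbk\int_0^\infty \frac{E}{4\pi bkt}\left(\frac{-r}{\beta t}\right)\exp\left(-\frac{r^2}{2\beta t}\right)dt \\
&= E\left[\exp\left(-\frac{r^2}{2\beta t}\right)\right]_0^\infty = E \quad\text{for all } r.
\end{aligned}$$

As we would expect, all the heat energy $E$ deposited by the laser will eventually flow past
a circle of any given radius $r$. ◄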
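
Part (ii) of the example lends itself to a quick numerical check. The sketch below
uses made-up material constants (any positive values would serve), with $\alpha$ and
$\beta$ set to the values found above, and a simple midpoint rule truncated at a radius
beyond which the integrand is negligible; the function names are illustrative.

```python
import math

def excess_temperature(r, t, E, b, k, s, rho):
    """(18.37) with alpha = E / (4 pi b k) and beta = 2k / (s rho)."""
    alpha = E / (4.0 * math.pi * b * k)
    beta = 2.0 * k / (s * rho)
    return (alpha / t) * math.exp(-r * r / (2.0 * beta * t))

def excess_heat(t, E, b, k, s, rho, n=20000):
    """Midpoint-rule estimate of b rho s * integral of u(r, t) 2 pi r dr."""
    beta = 2.0 * k / (s * rho)
    r_max = 12.0 * math.sqrt(beta * t)       # integrand is negligible beyond this radius
    dr = r_max / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        total += excess_temperature(r, t, E, b, k, s, rho) * 2.0 * math.pi * r * dr
    return b * rho * s * total

# Illustrative (hypothetical) values only.
params = dict(E=5.0, b=0.01, k=2.0, s=900.0, rho=2700.0)
heat_early = excess_heat(0.1, **params)
heat_late = excess_heat(10.0, **params)
```

Both estimates should agree with the deposited energy $E$, independently of the
time at which they are evaluated.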
18.6 Characteristics and the existence of solutions

So far in this chapter we have discussed how to find general solutions to various 
types of first- and second-order linear PDE. Moreover, given a set of boundary 
conditions we have shown how to find the particular solution (or class of solutions) 
that satisfies them. For first-order equations, for example, we found that if the 
value of u(x,y) is specified along some curve in the xy-plane then the solution to 
the PDE is in general unique, but that if u(x, y) is specified at only a single point 
then the solution is not unique: there exists a class of particular solutions all of 
which satisfy the boundary condition. In this section and the next we make more 
rigorous the notion of the types of boundary condition that cause a PDE to have 
a unique solution, a class of solutions, or no solution at all. 


18.6.1 First-order equations 


Let us consider the general first-order PDE (18.9) but now write it as

$$A(x,y)\frac{\partial u}{\partial x} + B(x,y)\frac{\partial u}{\partial y} = F(x,y,u). \tag{18.39}$$

Suppose we wish to solve this PDE subject to the boundary condition that
$u(x,y) = \phi(s)$ is specified along some curve $C$ in the $xy$-plane that is described
parametrically by the equations $x = x(s)$ and $y = y(s)$, where $s$ is the arc length
along $C$. The variation of $u$ along $C$ is therefore given by

$$\frac{du}{ds} = \frac{\partial u}{\partial x}\frac{dx}{ds} + \frac{\partial u}{\partial y}\frac{dy}{ds} = \frac{d\phi}{ds}. \tag{18.40}$$

We may then solve the two (inhomogeneous) simultaneous linear equations
(18.39) and (18.40) for $\partial u/\partial x$ and $\partial u/\partial y$, unless the determinant of the coefficients
vanishes (see section 8.18), i.e. unless

$$\begin{vmatrix} dx/ds & dy/ds \\ A & B \end{vmatrix} = 0.$$

At each point in the $xy$-plane this equation determines a set of curves called
characteristic curves (or just characteristics), which thus satisfy

$$B\frac{dx}{ds} - A\frac{dy}{ds} = 0,$$

or, multiplying through by $ds/dx$ and dividing through by $A$,

$$\frac{dy}{dx} = \frac{B(x,y)}{A(x,y)}. \tag{18.41}$$

However, we have already met (18.41) in subsection 18.3.1 on first-order PDEs,
where solutions of the form $u(x, y) = f(p)$, where $p$ is some combination of $x$ and $y$,
were discussed. Comparing (18.41) with (18.12) we see that the characteristics are
merely those curves along which $p$ is constant.

Since the partial derivatives $\partial u/\partial x$ and $\partial u/\partial y$ may be evaluated provided the
boundary curve $C$ does not lie along a characteristic, defining $u(x,y) = \phi(s)$
along $C$ is sufficient to specify the solution to the original problem (equation
plus boundary conditions) near the curve $C$, in terms of a Taylor expansion
about $C$. Therefore the characteristics can be considered as the curves along
which information about the solution $u(x, y)$ ‘propagates’. This is best understood
by using an example.


►Find the general solution of

$$x\frac{\partial u}{\partial x} - 2y\frac{\partial u}{\partial y} = 0 \tag{18.42}$$

that takes the value $2y + 1$ on the line $x = 1$ between $y = 0$ and $y = 1$.

We solved this problem in subsection 18.3.1 for the case where $u(x,y)$ takes the value
$2y + 1$ along the entire line $x = 1$. We found then that the general solution to the equation
(ignoring boundary conditions) is of the form

$$u(x,y) = f(p) = f(x^2y),$$

for some arbitrary function $f$. Hence the characteristics of (18.42) are given by $x^2y = c$
where $c$ is a constant; some of these curves are plotted in figure 18.2 for various values of
$c$. Furthermore, we found that the particular solution for which $u(1,y) = 2y + 1$ for all $y$
was given by

$$u(x,y) = 2x^2y + 1.$$

In the present case the value of $x^2y$ is fixed by the boundary conditions only between
$y = 0$ and $y = 1$. However, since the characteristics are curves along which $x^2y$, and hence
$f(x^2y)$, remains constant, the solution is determined everywhere along any characteristic
that intersects the line segment denoting the boundary conditions. Thus $u(x,y) = 2x^2y + 1$
is the particular solution that holds in the shaded region in figure 18.2 (corresponding to
$0 \leq c \leq 1$).

Outside this region, however, the solution is not precisely specified, and any function of
the form

$$u(x,y) = 2x^2y + 1 + g(x^2y)$$

will satisfy both the equation and the boundary condition, provided $g(p) = 0$ for
$0 \leq p \leq 1$. ◄
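
The non-uniqueness outside the shaded region can be made concrete. In the sketch
below the function `bump` is a hypothetical choice of $g(p)$ that vanishes for
$0 \leq p \leq 1$ (many others would do), and the PDE $x\,\partial u/\partial x - 2y\,\partial u/\partial y = 0$
is tested by central differences at two arbitrarily chosen points.

```python
def particular(x, y):
    """Particular solution 2 x^2 y + 1 fixed by the boundary data on x = 1."""
    return 2.0 * x * x * y + 1.0

def bump(p):
    """An example g(p) that vanishes for 0 <= p <= 1 (a hypothetical choice)."""
    if p > 1.0:
        return (p - 1.0) ** 3
    if p < 0.0:
        return p ** 3
    return 0.0

def alternative(x, y):
    """Another solution, equal to `particular` wherever 0 <= x^2 y <= 1."""
    return particular(x, y) + bump(x * x * y)

def pde_residual(u, x, y, h=1e-5):
    """x u_x - 2 y u_y by central differences; zero for any function of x^2 y."""
    u_x = (u(x + h, y) - u(x - h, y)) / (2.0 * h)
    u_y = (u(x, y + h) - u(x, y - h)) / (2.0 * h)
    return x * u_x - 2.0 * y * u_y

# The alternative solution still takes the value 2y + 1 on x = 1 for 0 <= y <= 1.
boundary_ok = all(abs(alternative(1.0, 0.1 * j) - (2.0 * 0.1 * j + 1.0)) < 1e-12
                  for j in range(11))
residual_inside = pde_residual(alternative, 0.8, 0.9)     # x^2 y = 0.576
residual_outside = pde_residual(alternative, 1.5, 2.0)    # x^2 y = 4.5
differs_outside = alternative(1.5, 2.0) - particular(1.5, 2.0)
```

Both functions satisfy the equation and the boundary condition, yet they differ at
points lying on characteristics that do not cross the boundary segment.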


In the above example the boundary curve was not itself a characteristic and 
furthermore it crossed each characteristic once only. For a general boundary curve 
C this may not be the case. Firstly, if C is itself a characteristic (or is just a single 
point) then information about the solution cannot ‘propagate’ away from C, and 
so the solution remains unspecified everywhere except on C. 

The second possibility is that $C$ (although not a characteristic itself) crosses
some characteristics more than once, as in figure 18.3. In this case specifying the
value of $u(x,y)$ along the curve $PQ$ determines the solution along all the
characteristics that intersect it. Therefore, also specifying $u(x, y)$ along $QR$ can
overdetermine the problem solution and generally results in there being no solution.

Figure 18.2 The characteristics of equation (18.42). The shaded region shows
where the solution to the equation is defined, given the imposed boundary
condition at $x = 1$ between $y = 0$ and $y = 1$, shown as a bold vertical line.

Figure 18.3 A boundary curve $C$ that crosses characteristics more than once.


18.6.2 Second-order equations 

The concept of characteristics can be extended naturally to second- (and higher-)
order equations. In this case let us write the general second-order linear PDE
(18.19) as

$$A(x,y)\frac{\partial^2 u}{\partial x^2} + B(x,y)\frac{\partial^2 u}{\partial x\,\partial y} + C(x,y)\frac{\partial^2 u}{\partial y^2}
= F\left(x, y, u, \frac{\partial u}{\partial x}, \frac{\partial u}{\partial y}\right). \tag{18.43}$$

Figure 18.4 A boundary curve $C$ and its tangent and unit normal at a given
point.


For second-order equations we might expect that relevant boundary conditions 
would involve specifying u, or some of its first derivatives, or both, along a 
suitable set of boundaries bordering or enclosing the region over which a solution 
is sought. Three common types of boundary condition occur and are associated 
with the names of Dirichlet, Neumann and Cauchy. They are as follows. 


(i) Dirichlet: The value of $u$ is specified at each point of the boundary.

(ii) Neumann: The value of $\partial u/\partial n$, the normal derivative of $u$, is specified at
each point of the boundary. Note that $\partial u/\partial n = \nabla u \cdot \hat{\mathbf{n}}$, where $\hat{\mathbf{n}}$ is the
normal to the boundary at each point.

(iii) Cauchy: Both $u$ and $\partial u/\partial n$ are specified at each point of the boundary.


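Let us consider for the moment the solution of (18.43) subject to the Cauchy
boundary conditions, i.e. $u$ and $\partial u/\partial n$ are specified along some boundary curve
$C$ in the $xy$-plane defined by the parametric equations $x = x(s)$, $y = y(s)$, $s$ being
the arc length along $C$ (see figure 18.4). Let us suppose that along $C$ we have
$u(x,y) = \phi(s)$ and $\partial u/\partial n = \psi(s)$. At any point on $C$ the vector $d\mathbf{r} = dx\,\mathbf{i} + dy\,\mathbf{j}$ is
a tangent to the curve and $\hat{\mathbf{n}}\,ds = dy\,\mathbf{i} - dx\,\mathbf{j}$ is a vector normal to the curve. Thus
on $C$ we have

$$\frac{\partial u}{\partial s} \equiv \nabla u \cdot \frac{d\mathbf{r}}{ds}
= \frac{\partial u}{\partial x}\frac{dx}{ds} + \frac{\partial u}{\partial y}\frac{dy}{ds} = \frac{d\phi(s)}{ds},$$

$$\frac{\partial u}{\partial n} \equiv \nabla u \cdot \hat{\mathbf{n}}
= \frac{\partial u}{\partial x}\frac{dy}{ds} - \frac{\partial u}{\partial y}\frac{dx}{ds} = \psi(s).$$

These two equations may then be solved straightforwardly for the first partial
derivatives $\partial u/\partial x$ and $\partial u/\partial y$ along $C$. Using the chain rule to write

$$\frac{d}{ds} = \frac{dx}{ds}\frac{\partial}{\partial x} + \frac{dy}{ds}\frac{\partial}{\partial y},$$

we may differentiate the two first derivatives $\partial u/\partial x$ and $\partial u/\partial y$ along the boundary
to obtain the pair of equations

$$\frac{d}{ds}\left(\frac{\partial u}{\partial x}\right) = \frac{dx}{ds}\frac{\partial^2 u}{\partial x^2} + \frac{dy}{ds}\frac{\partial^2 u}{\partial x\,\partial y},$$

$$\frac{d}{ds}\left(\frac{\partial u}{\partial y}\right) = \frac{dx}{ds}\frac{\partial^2 u}{\partial x\,\partial y} + \frac{dy}{ds}\frac{\partial^2 u}{\partial y^2}.$$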

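We may now solve these two equations, together with the original PDE (18.43),
for the second partial derivatives of $u$, except where the determinant of their
coefficients equals zero,

$$\begin{vmatrix}
A & B & C \\
\dfrac{dx}{ds} & \dfrac{dy}{ds} & 0 \\
0 & \dfrac{dx}{ds} & \dfrac{dy}{ds}
\end{vmatrix} = 0.$$

Expanding out the determinant,

$$A\left(\frac{dy}{ds}\right)^2 - B\left(\frac{dx}{ds}\right)\left(\frac{dy}{ds}\right) + C\left(\frac{dx}{ds}\right)^2 = 0.$$

Multiplying through by $(ds/dx)^2$ we obtain

$$A\left(\frac{dy}{dx}\right)^2 - B\frac{dy}{dx} + C = 0, \tag{18.44}$$

which is the ODE for the curves in the $xy$-plane along which the second partial
derivatives of $u$ cannot be found.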

As for the first-order case, the curves satisfying (18.44) are called characteristics
of the original PDE. These characteristics have tangents at each point given by
(when $A \neq 0$)

$$\frac{dy}{dx} = \frac{B \pm \sqrt{B^2 - 4AC}}{2A}. \tag{18.45}$$

Clearly, when the original PDE is hyperbolic ($B^2 > 4AC$), equation (18.45)
defines two families of real curves in the $xy$-plane; when the equation is parabolic
($B^2 = 4AC$) it defines one family of real curves; and when the equation is elliptic
($B^2 < 4AC$) it defines two families of complex curves. Furthermore, when $A$,
$B$ and $C$ are constants, rather than functions of $x$ and $y$, the equations of the
characteristics will be of the form $x + \lambda y =$ constant, which is reminiscent of the
form of solution discussed in subsection 18.3.3.
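
For constant coefficients the classification and the characteristic slopes given by
(18.45) are easily computed. The following sketch is illustrative only; the function
names are assumptions, `cmath` is used so that the complex slopes of the elliptic
case are returned rather than raising an error, and $A \neq 0$ is assumed throughout.

```python
import cmath

def classify(A, B, C):
    """Classify A u_xx + B u_xy + C u_yy = F by the sign of B^2 - 4AC."""
    disc = B * B - 4.0 * A * C
    if disc > 0:
        return "hyperbolic"
    if disc == 0:
        return "parabolic"
    return "elliptic"

def characteristic_slopes(A, B, C):
    """Roots dy/dx of A (dy/dx)^2 - B (dy/dx) + C = 0, assuming A != 0."""
    root = cmath.sqrt(B * B - 4.0 * A * C)
    return (B + root) / (2.0 * A), (B - root) / (2.0 * A)

c = 2.0
wave = (1.0, 0.0, -1.0 / c**2)   # u_xx - u_tt / c^2 = 0, with t in the role of y
laplace = (1.0, 0.0, 1.0)        # u_xx + u_yy = 0
diffusion = (1.0, 0.0, 0.0)      # second-order part of kappa u_xx = u_t

kinds = [classify(*eq) for eq in (wave, laplace, diffusion)]
wave_slopes = characteristic_slopes(*wave)        # dt/dx = +1/c and -1/c
laplace_slopes = characteristic_slopes(*laplace)  # purely imaginary
```

For the wave equation the two real slopes $dt/dx = \pm 1/c$ correspond to the lines
$x \mp ct =$ constant.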
Figure 18.5 The characteristics for the one-dimensional wave equation. The
shaded region indicates the region over which the solution is determined by
specifying Cauchy boundary conditions at $t = 0$ on the line segment $x = 0$ to
$x = L$.


►Find the characteristics of the one-dimensional wave equation

$$\frac{\partial^2 u}{\partial x^2} - \frac{1}{c^2}\frac{\partial^2 u}{\partial t^2} = 0.$$

This is a hyperbolic equation with $A = 1$, $B = 0$ and $C = -1/c^2$. Therefore from (18.44)
the characteristics are given by

$$\left(\frac{dx}{dt}\right)^2 = c^2,$$

and so the characteristics are the straight lines $x - ct =$ constant and $x + ct =$ constant. ◄

The characteristics of second-order PDEs can be considered as the curves along
which partial information about the solution $u(x,y)$ ‘propagates’. Consider a point
in the space that has the independent variables as its coordinates; if either or
both of the two characteristics which pass through the point do not intersect
the curve along which the boundary conditions are specified then the solution will
not be determined at that point. In particular, if the equation is hyperbolic, so
that we obtain two families of real characteristics in the $xy$-plane, then Cauchy
boundary conditions propagate partial information concerning the solution along
the characteristics, belonging to each family, that intersect the boundary curve $C$.
The solution $u$ is then specified in the region common to these two families of
characteristics. For instance, the characteristics of the hyperbolic one-dimensional
wave equation in the last example are shown in figure 18.5. By specifying Cauchy
boundary conditions $u$ and $\partial u/\partial t$ on the line segment $t = 0$, $x = 0$ to $L$, the
solution is specified in the shaded region.

Equation type    Boundary    Conditions
hyperbolic       open        Cauchy
parabolic        open        Dirichlet or Neumann
elliptic         closed      Dirichlet or Neumann

Table 18.1 The appropriate boundary conditions for different types of partial
differential equation.
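
The shaded region of figure 18.5 can be described by a simple test. The sketch
below rests on two assumptions made here: only $t \geq 0$ is considered, and a point
counts as determined when both characteristics through it, traced back to $t = 0$,
land on the segment $0 \leq x \leq L$. The function name and the numerical values are
illustrative.

```python
def is_determined(x, t, c, L):
    """True if both characteristics through (x, t), t >= 0, meet 0 <= x <= L at t = 0."""
    if t < 0:
        raise ValueError("this sketch only treats t >= 0")
    foot_forward = x - c * t     # where the line x - ct = constant meets t = 0
    foot_backward = x + c * t    # where the line x + ct = constant meets t = 0
    return 0.0 <= foot_forward and foot_backward <= L

c, L = 2.0, 10.0
apex_time = L / (2.0 * c)        # latest time at which any point is still determined

inside = is_determined(5.0, 1.0, c, L)                      # feet at x = 3 and x = 7
outside = is_determined(1.0, 1.0, c, L)                     # forward foot at x = -1
at_apex = is_determined(L / 2.0, apex_time, c, L)           # feet at x = 0 and x = L
beyond_apex = is_determined(L / 2.0, apex_time + 0.1, c, L)
```

Under these assumptions the determined region is the triangle with base the
boundary segment and apex at $x = L/2$, $t = L/(2c)$.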

As in the case of first-order PDEs, however, problems can arise. For example, 
if for a hyperbolic equation the boundary curve intersects any characteristic 
more than once then Cauchy conditions along C can overdetermine the problem, 
resulting in there being no solution. In this case either the boundary curve C 
must be altered, or the boundary conditions on the offending parts of C must be 
relaxed to Dirichlet or Neumann conditions. 

The general considerations involved in deciding which boundary conditions are
appropriate for a particular problem are complex, and we do not discuss them
any further here.† We merely note that whether the various types of boundary
condition are appropriate (in that they give a solution that is unique, sometimes
to within a constant, and is well defined) depends upon the type of second-order
equation under consideration and on whether the region of solution is bounded
by a closed or an open curve (or a surface if there are more than two independent
variables). Note that part of a closed boundary may be at infinity if conditions
are imposed on $u$ or $\partial u/\partial n$ there.

† For a discussion the reader is referred, for example, to Morse and Feshbach, Methods of Theoretical
Physics, Part I (McGraw-Hill, 1953) chapter 6.

It may be shown that the appropriate boundary-condition and equation-type
pairings are as given in table 18.1.

For example, Laplace’s equation $\nabla^2 u = 0$ is elliptic and thus requires either
Dirichlet or Neumann boundary conditions on a closed boundary which, as we
have already noted, may be at infinity if the behaviour of $u$ is specified there
(most often $u$ or $\partial u/\partial n \to 0$ at infinity).


18.7 Uniqueness of solutions

Although we have merely stated the appropriate boundary types and conditions
for which, in the general case, a PDE has a unique, well-defined solution,
sometimes to within an additive constant, it is often important to be able to prove
that a unique solution is obtained.

As an extremely important example let us consider Poisson’s equation in three
dimensions,

$$\nabla^2 u(\mathbf{r}) = \rho(\mathbf{r}), \tag{18.46}$$

with either Dirichlet or Neumann conditions on a closed boundary appropriate
to such an elliptic equation; for brevity, in (18.46), we have absorbed any physical
constants into $\rho$. We aim to show that, to within an unimportant constant, the
solution of (18.46) is unique if either the potential $u$ or its normal derivative
$\partial u/\partial n$ is specified on all surfaces bounding a given region of space (including, if
necessary, a hypothetical spherical surface of indefinitely large radius on which $u$
or $\partial u/\partial n$ is prescribed to have an arbitrarily small value). Stated more formally
this is as follows.

Uniqueness theorem. If $u$ is real and its first and second partial derivatives are
continuous in a region $V$ and on its boundary $S$, and $\nabla^2 u = \rho$ in $V$ and either
$u = f$ or $\partial u/\partial n = g$ on $S$, where $\rho$, $f$ and $g$ are prescribed functions, then $u$ is
unique (at least to within an additive constant).


►Prove the uniqueness theorem for Poisson’s equation.

Let us suppose on the contrary that two solutions $u_1(\mathbf{r})$ and $u_2(\mathbf{r})$ both satisfy the conditions
given above, and denote their difference by the function $w = u_1 - u_2$. We then have

$$\nabla^2 w = \nabla^2 u_1 - \nabla^2 u_2 = \rho - \rho = 0,$$

so that $w$ satisfies Laplace’s equation in $V$. Furthermore, since either $u_1 = f = u_2$ or
$\partial u_1/\partial n = g = \partial u_2/\partial n$ on $S$, we must have either $w = 0$ or $\partial w/\partial n = 0$ on $S$.

If we now use Green’s first theorem, (11.19), for the case where both scalar functions
are taken as $w$ we have

$$\int_V [w\nabla^2 w + (\nabla w) \cdot (\nabla w)]\,dV = \int_S w\frac{\partial w}{\partial n}\,dS.$$

However, either condition, $w = 0$ or $\partial w/\partial n = 0$, makes the RHS vanish whilst the first
term on the LHS vanishes since $\nabla^2 w = 0$ in $V$. Thus we are left with

$$\int_V |\nabla w|^2\,dV = 0.$$

Since $|\nabla w|^2$ can never be negative, this can only be satisfied if

$$\nabla w = \mathbf{0},$$

i.e. if $w$, and hence $u_1 - u_2$, is a constant in $V$.

If Dirichlet conditions are given then $u_1 \equiv u_2$ on (some part of) $S$ and hence $u_1 = u_2$
everywhere in $V$. For Neumann conditions, however, $u_1$ and $u_2$ can differ throughout $V$
by an arbitrary (but unimportant) constant. ◄
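
A discrete analogue of this result can be observed numerically, although it is no
substitute for the proof. The sketch below is an illustration under stated
assumptions: a two-dimensional square grid, the five-point discretisation of
Laplace's equation, Dirichlet data taken from the harmonic function $x^2 - y^2$, and
Gauss-Seidel relaxation. Two very different interior starting guesses are relaxed
with the same boundary values.

```python
def relax(grid, sweeps=2000):
    """Gauss-Seidel sweeps for the discrete Laplace equation; edge values stay fixed."""
    n = len(grid)
    for _ in range(sweeps):
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                grid[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                     + grid[i][j - 1] + grid[i][j + 1])
    return grid

def make_grid(n, interior_guess):
    """n x n grid with Dirichlet data x^2 - y^2 on the edge and a chosen interior guess."""
    grid = [[interior_guess] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i in (0, n - 1) or j in (0, n - 1):
                x, y = i / (n - 1), j / (n - 1)
                grid[i][j] = x * x - y * y
    return grid

n = 12
first = relax(make_grid(n, interior_guess=0.0))
second = relax(make_grid(n, interior_guess=5.0))

# Largest difference between the two relaxed solutions.
max_gap = max(abs(first[i][j] - second[i][j]) for i in range(n) for j in range(n))
# Comparison with the known harmonic function at one interior point.
exact_gap = abs(first[5][3] - ((5 / 11) ** 2 - (3 / 11) ** 2))
```

Both runs settle on the same interior values, which coincide with $x^2 - y^2$ because
the five-point stencil is exact for quadratic functions.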


The importance of this uniqueness theorem lies in the fact that if a solution to 
Poisson’s (or Laplace’s) equation that fits the given set of Dirichlet or Neumann 
conditions can be found by any means whatever, then that solution is the correct 
one, since only one exists. This result is the mathematical justification for the 
method of images, which is discussed more fully in the next chapter. 
We also note that often the same general method, used in the above example
for proving the uniqueness theorem for Poisson’s equation, can be employed to 
prove the uniqueness (or otherwise) of solutions to other equations and boundary 
conditions. 


18.8 Exercises

18.1 Determine whether the following can be written as functions of $p = x^2 + 2y$ only,
and hence whether they are solutions of (18.8):

(a) $x^2(x^2 - 4) + 4y(x^2 - 2) + 4(y^2 - 1)$;
(b) $x^4 + 2x^2y + y^2$;
(c) $[x^4 + 4x^2y + 4y^2 + 4]/[2x^4 + x^2(8y + 1) + 8y^2 + 2y]$.

18.2 Find partial differential equations satisfied by the following functions $u(x,y)$ for
all arbitrary functions $f$ and all arbitrary constants $a$ and $b$:

(a) $u(x,y) = f(x^2 - y^2)$;
(b) $u(x,y) = (x - a)^2 + (y - b)^2$;
(c) $u(x,y) = y^n f(y/x)$;
(d) $u(x,y) = f(x + ay)$.

18.3 Solve the following partial differential equations for $u(x,y)$ with the boundary
conditions given:

(a) $x\dfrac{\partial u}{\partial x} + xy = u$, $\quad u = 2y$ on the line $x = 1$;
(b) $1 + x\dfrac{\partial u}{\partial y} = xu$, $\quad u(x, 0) = x$.

18.4 Find the most general solutions $u(x,y)$ of the following equations consistent with
the boundary conditions stated:

(a) $y\dfrac{\partial u}{\partial x} - x\dfrac{\partial u}{\partial y} = 0$, $\quad u(x, 0) = 1 + \sin x$;
(b) $i\dfrac{\partial u}{\partial x} = 3\dfrac{\partial u}{\partial y}$, $\quad u = (4 + 3i)x^2$ on the line $x = y$;
(c) $\sin x \sin y\,\dfrac{\partial u}{\partial x} + \cos x \cos y\,\dfrac{\partial u}{\partial y} = 0$, $\quad u = \cos 2y$ on $x + y = \pi/2$;
(d) $\dfrac{\partial u}{\partial x} + 2x\dfrac{\partial u}{\partial y} = 0$, $\quad u = 2$ on the parabola $y = x^2$.

18.5 Find solutions of

$$\frac{1}{x}\frac{\partial u}{\partial x} + \frac{1}{y}\frac{\partial u}{\partial y} = 0$$

for which (a) $u(0,y) = y$, (b) $u(1, 1) = 1$.

18.6 Find the most general solutions $u(x,y)$ of the following equations consistent with
the boundary conditions stated:

(a) $y\dfrac{\partial u}{\partial x} - x\dfrac{\partial u}{\partial y} = 3x$, $\quad u = x^2$ on the line $y = 0$;
(b) $y\dfrac{\partial u}{\partial x} - x\dfrac{\partial u}{\partial y} = 3x$, $\quad u(1, 0) = 2$;
(c) $y^2\dfrac{\partial u}{\partial x} + x^2\dfrac{\partial u}{\partial y} = x^2y^2(x^3 + y^3)$, $\quad$ no boundary conditions.

18.7 Solve

$$\sin x\,\frac{\partial u}{\partial x} + \cos x\,\frac{\partial u}{\partial y} = \cos x$$

subject to (a) $u(\pi/2, y) = 0$, (b) $u(\pi/2, y) = y(y + 1)$.

18.8 A function $u(x,y)$ satisfies

$$2\frac{\partial u}{\partial x} + 3\frac{\partial u}{\partial y} = 10,$$

and takes the value 3 on the line $y = 4x$. Evaluate $u(2, 4)$.

18.9 If $u(x,y)$ satisfies

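$$\frac{\partial^2 u}{\partial x^2} - 3\frac{\partial^2 u}{\partial x\,\partial y} + 2\frac{\partial^2 u}{\partial y^2} = 0$$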


and $u = -x^2$ and $\partial u/\partial y = 0$ for $y = 0$ and all $x$, find the value of $u(0, 1)$.

18.10 (a) Solve the previous question if the boundary condition is $u = \partial u/\partial y = 1$
when $y = 0$ for all $x$.

(b) In which region of the $xy$-plane would $u$ be determined if the boundary
condition were $u = \partial u/\partial y = 1$ when $y = 0$ for all $x > 0$?

18.11 In those cases in which it is possible to do so, evaluate $u(2, 2)$, where $u(x,y)$ is
the solution of

$$2y\frac{\partial u}{\partial x} - x\frac{\partial u}{\partial y} = 2xy(2y^2 - x^2)$$

that satisfies the (separate) boundary conditions given below.

(a) $u(x, 1) = x^2$ for all $x$.
(b) $u(x, 1) = x^2$ for $x > 0$.
(c) $u(x, 1) = x^2$ for $0 < x < 3$.
(d) $u(x, 0) = x$ for $x > 0$.
(e) $u(x, 0) = x$ for all $x$.
(f) $u(1, \sqrt{10}) = 5$.
(g) $u(\sqrt{10}, 1) = 5$.

18.12 Solve

$$6\frac{\partial^2 u}{\partial x^2} - 5\frac{\partial^2 u}{\partial x\,\partial y} + \frac{\partial^2 u}{\partial y^2} = 14,$$

subject to $u = 2x + 1$ and $\partial u/\partial y = 4 - 6x$, both on the line $y = 0$.

18.13 By changing the independent variables in the previous question to

$$\xi = x + 2y \quad\text{and}\quad \eta = x + 3y,$$

show that it must be possible to write $14(x^2 + 5xy + 6y^2)$ in the form

$$f_1(x + 2y) + f_2(x + 3y) - (x^2 + y^2),$$

and determine the forms of $f_1(z)$ and $f_2(z)$.

18.14 Solve

$$\frac{\partial^2 u}{\partial x\,\partial y} + 3\frac{\partial^2 u}{\partial y^2} = x(2y + 3x).$$

18.15 Find the most general solution of $\partial^2 u/\partial x^2 + \partial^2 u/\partial y^2 = x^2y^2$.

18.16 An infinitely long string on which waves travel at speed $c$ has an initial
displacement

$$y(x) = \begin{cases} \sin(\pi x/a), & -a \leq x \leq a, \\ 0, & |x| > a. \end{cases}$$

It is released from rest at time $t = 0$, and its subsequent displacement is described
by $y(x, t)$.

By expressing the initial displacement as one explicit function incorporating
Heaviside step functions, find an expression for $y(x, t)$ at a general time $t > 0$. In
particular, determine the displacement as a function of time (a) at $x = 0$, (b) at
$x = a$, and (c) at $x = a/2$.

18.17 The non-relativistic Schrödinger equation (18.7) is similar to the diffusion
equation in having different orders of derivatives in its various terms; this precludes
solutions that are arbitrary functions of particular linear combinations of
variables. However, since exponential functions do not change their forms under
differentiation, solutions in the form of exponential functions of combinations of
the variables may still be possible.

Consider the Schrödinger equation for the case of a constant potential, i.e. for
a free particle, and show that it has solutions of the form $A\exp(lx + my + nz + \lambda t)$
where the only requirement is that

$$-\frac{\hbar^2}{2m}(l^2 + m^2 + n^2) = i\hbar\lambda.$$

In particular, identify the equation and wavefunction obtained by taking $\lambda$ as
$-iE/\hbar$, and $l$, $m$ and $n$ as $ip_x/\hbar$, $ip_y/\hbar$ and $ip_z/\hbar$ respectively, where $E$ is the
energy and $\mathbf{p}$ the momentum of the particle; these identifications are essentially
the content of the de Broglie and Einstein relationships.

18.18 Like the Schrödinger equation of the previous question, the equation describing
the transverse vibrations of a rod,

$$a^4\frac{\partial^4 u}{\partial x^4} + \frac{\partial^2 u}{\partial t^2} = 0,$$

has different orders of derivatives in its various terms. Show, however, that it has
solutions of exponential form $u(x, t) = A\exp(\lambda x + i\omega t)$ provided that the relation
$a^4\lambda^4 = \omega^2$ is satisfied.

Use a linear combination of such allowed solutions, expressed as the sum of
sinusoids and hyperbolic sinusoids of $\lambda x$, to describe the transverse vibrations of
a rod of length $L$ clamped at both ends. At a clamped point both $u$ and $\partial u/\partial x$
must vanish; show that this implies that $\cos(\lambda L)\cosh(\lambda L) = 1$, thus determining
the frequencies $\omega$ at which the rod can vibrate.

18.19 An incompressible fluid of density $\rho$ and negligible viscosity flows with velocity
$v$ along a thin straight tube, perfectly light and flexible, of cross-section $A$ and
held under tension $T$. Assume that small transverse displacements $u$ of the tube
are governed by

8 2 u 8 2 u 

W + 2v d^d~t + 




(a) Show that the general solution consists of a superposition of two waveforms 
travelling with different speeds. 

(b) The tube initially has a small transverse displacement u = a cos kx and is 
suddenly released from rest. Find its subsequent motion. 

18.20 A sheet of material of thickness $w$, specific heat capacity $c$ and thermal
conductivity $k$ is isolated in a vacuum, but its two sides are exposed to fluxes of
radiant heat of strengths $J_1$ and $J_2$. Ignoring short-term transients, show that
the temperature difference between its two surfaces is steady at $(J_2 - J_1)w/2k$,
whilst their average temperature increases at a rate $(J_2 + J_1)/cw$.

18.21 In an electrical cable of resistance $R$ and capacitance $C$ per unit length,
voltage signals obey the equation
$\partial^2 V/\partial x^2 = RC\,\partial V/\partial t$. This has solutions of the
form given in (18.36) and also of the form $V = Ax + D$.

(a) Find a combination of these that represents the situation after a steady
voltage $V_0$ is applied at $x = 0$ at time $t = 0$.

(b) Obtain a solution describing the propagation of the voltage signal resulting
from application of the signal $V = V_0$ for $0 < t < T$, $V = 0$ otherwise, to
the end $x = 0$ of an infinite cable.

(c) Show that for $t \gg T$ the maximum signal occurs at a value of $x$ proportional
to $t^{1/2}$ and has a magnitude proportional to $t^{-1}$.

18.22 The daily and annual variations of temperature at the surface of the earth may
be represented by sine-wave oscillations with equal amplitudes and periods of
1 day and 365 days respectively. Assume that for (angular) frequency $\omega$ the
temperature at depth $x$ in the earth is given by
$u(x,t) = A\sin(\omega t + \mu x)\exp(-\lambda x)$, where $\lambda$ and $\mu$ are
constants.

(a) Use the diffusion equation to find the values of $\lambda$ and $\mu$.

(b) Find the ratio of the depths below the surface at which the amplitudes have 
dropped to 1 /20 of their surface values. 

(c) At what time of year is the soil coldest at the greater of these depths, 
assuming that the smoothed annual variation in temperature at the surface 
has a minimum on February 1st? 

18.23 Consider each of the following situations in a qualitative way and determine
the equation type, the nature of the boundary curve and the type of boundary
conditions involved:

(a) a conducting bar given an initial temperature distribution and then thermally
isolated;

(b) two long conducting concentric cylinders on each of which the voltage
distribution is specified;

(c) two long conducting concentric cylinders on each of which the charge
distribution is specified;

(d) a semi-infinite string the end of which is made to move in a prescribed way.

18.24 This example gives a formal demonstration that the type of a second-order PDE
(elliptic, parabolic or hyperbolic) cannot be changed by a new choice of independent
variable. The algebra is somewhat lengthy, but straightforward.

If a change of variable $\xi = \xi(x,y)$, $\eta = \eta(x,y)$ is made in (18.19), so
that it reads

\[ A'\frac{\partial^2 u}{\partial\xi^2} + B'\frac{\partial^2 u}{\partial\xi\,\partial\eta}
   + C'\frac{\partial^2 u}{\partial\eta^2} + D'\frac{\partial u}{\partial\xi}
   + E'\frac{\partial u}{\partial\eta} + Fu = R'(\xi,\eta), \]

show that

\[ B'^2 - 4A'C' = (B^2 - 4AC)\left[\frac{\partial(\xi,\eta)}{\partial(x,y)}\right]^2. \]

Hence deduce the conclusion stated above.

18.25 The Klein-Gordon equation (which is satisfied by the quantum-mechanical
wavefunction $\Phi(\mathbf{r})$ of a relativistic spinless particle of non-zero mass
$m$) is

\[ \nabla^2\Phi - m^2\Phi = 0. \]

Show that the solution for the scalar field $\Phi(\mathbf{r})$ in any volume $V$
bounded by a surface $S$ is unique if either Dirichlet or Neumann boundary conditions
are specified on $S$.


18.9 Hints and answers 

18.1 (a) Yes, $p^2 - 4p - 4$; (b) no, $(p - y)^2$; (c) yes, $(p^2 + 4)/(2p^2 + p)$.

18.2 (a) $y(\partial u/\partial x) + x(\partial u/\partial y) = 0$;
(b) $(\partial u/\partial x)^2 + (\partial u/\partial y)^2 = 4u$;
(c) $x(\partial u/\partial x) + y(\partial u/\partial y) = nu$;
(d) $(\partial u/\partial y)(\partial^2 u/\partial x^2) = (\partial u/\partial x)(\partial^2 u/\partial x\,\partial y)$,
or with $x$ and $y$ reversed.

18.3 Each equation is effectively an ordinary differential equation but with a function
of the non-integrated variable as the constant of integration;

(a) $u = xy(2 - \ln x)$; (b) $u = x^{-1}(1 - e^y) + xe^y$.

18.4 (a) p = x 2 + y 2 , u = sin(x 2 + y 2 ) 1/2 + 1 ; (b) p = 3.x + iy, u = (3x + iy) 1/2 /2; 

(c) p = sin x cos y, u = 2 sin x cos y — 1; (d) p = y — x 2 , u = y — x 2 + 2. 

18.5 (a) $(y^2 - x^2)^{1/2}$; (b) $1 + f(y^2 - x^2)$ where $f(0) = 0$.

18.6 (a) $p = x^2 + y^2$, particular integral $u = -3y$, $u = x^2 + y^2 - 3y$;

(b) $u = x^2 + y^2 - 3y + 1 + g(x^2 + y^2)$ where $g(1) = 0$;

(c) $(x^6 + y^6)/6 + g(x^3 - y^3)$.

18.7 $u = y + f(y - \ln(\sin x))$; (a) $u = \ln(\sin x)$; (b) $u = y + [y - \ln(\sin x)]^2$.

18.8 $u = f(3x - 2y) + 2(x + y)$; $f(p) = 3 + 2p$; $u = 8x - 2y + 3$ and $u(2,4) = 11$.

18.9 General solution is $u(x,y) = f(x + y) + g(x + y/2)$. Show that $2p = -g'(p)/2$,
and hence $g(p) = k - 2p^2$, whilst $f(p) = p^2 - k$, leading to
$u(x,y) = -x^2 + y^2/2$; $u(0,1) = 1/2$.

18.10 (a) $u(x,y) = 2(x + y) - 2(x + y/2) + 1 = y + 1$; $u(0,1) = 2$; (b) in the sector
$-\pi/4 < \theta < \pi/2 + \phi$, where $\tan\phi = 1/2$ and $\theta$ is measured from
the positive $x$-axis.

18.11 $p = x^2 + 2y^2$; $u(x,y) = f(p) + x^2y^2/2$.

(a) $u(x,y) = (x^2 + 2y^2 + x^2y^2 - 2)/2$; $u(2,2) = 13$. The line $y = 1$ cuts each
characteristic in zero or two distinct points, but this causes no difficulty with
the given boundary conditions.

(b) As in (a).

(c) The solution is defined over the space between the ellipses $p = 2$ and $p = 11$;
$(2,2)$ lies on $p = 12$, and so $u(2,2)$ is undetermined.

(d) $u(x,y) = (x^2 + 2y^2)^{1/2} + x^2y^2/2$; $u(2,2) = 8 + \sqrt{12}$.

(e) The line $y = 0$ cuts each characteristic in two distinct points. No
differentiable form of $f(p)$ gives $f(\pm a) = \pm a$ respectively, and so there is
no solution.

(f) The solution is only specified on $p = 21$, and so $u(2,2)$ is undetermined.

(g) The solution is specified on $p = 12$, and so $u(2,2) = 5 + \tfrac12(4)(4) = 13$.

18.12 $u(x,y) = f(x + 2y) + g(x + 3y) + x^2 + y^2$, leading to
$u = 1 + 2x + 4y - 6xy - 8y^2$.

18.13 The equation becomes $\partial^2 f/\partial\xi\,\partial\eta = -14$, with solution
$f(\xi,\eta) = f(\xi) + g(\eta) - 14\xi\eta$, which can be compared with the answer
from the previous question; $f_1(z) = 10z^2$ and $f_2(z) = 5z^2$.

18.14 $u = f(y - 3x) + g(x) + x^2y^2/2$.

18.15 $u(x,y) = f(x + iy) + g(x - iy) + (1/12)x^4(y^2 - (1/15)x^2)$. In the last term,
$x$ and $y$ may be interchanged.

18.16
\[ y(x,t) = \tfrac12\sin[\pi(x - ct)/a]\,[H(x - ct + a) - H(x - ct - a)]
          + \tfrac12\sin[\pi(x + ct)/a]\,[H(x + ct + a) - H(x + ct - a)]. \]

(a) zero at all times; (b) $\tfrac12\sin(\pi ct/a)$ for $0 < t < 2a/c$, and 0 otherwise;

(c) $\cos(\pi ct/a)$ for $0 < t < a/2c$, $\tfrac12\cos(\pi ct/a)$ for
$a/2c < t < 3a/2c$, and 0 otherwise.

18.17 $E = p^2/(2m)$, the relationship between energy and momentum for a
non-relativistic particle; $u(\mathbf{r},t) = A\exp[i(\mathbf{p}\cdot\mathbf{r} - Et)/\hbar]$,
a plane wave of wave number $k = p/\hbar$ and angular frequency $\omega = E/\hbar$
travelling in the direction $\mathbf{p}/p$.

18.18 $\lambda = \pm\omega^{1/2}/a$ or $\pm i\omega^{1/2}/a$;
$u(x,t) = \exp(i\omega t)[A\sin\lambda x + B\cos\lambda x + C\sinh\lambda x + D\cosh\lambda x]$,
with $C = -A$ and $D = -B$. The conditions at $x = L$ and consistency establish the
quoted result.

18.19 (a) $c = v \pm \alpha$ where $\alpha^2 = T/\rho A$;

(b) $u(x,t) = a\cos[k(x - vt)]\cos(k\alpha t) - (va/\alpha)\sin[k(x - vt)]\sin(k\alpha t)$.

18.20 Use the first form of solution given in (18.35).

18.21 (a) $V_0\left[1 - (2/\sqrt{\pi})\int_0^{\frac12 x(CR/t)^{1/2}}\exp(-\nu^2)\,d\nu\right]$;
(b) consider as $V_0$ applied at $t = 0$ and continued and $-V_0$ applied at $t = T$
and continued;

\[ V(x,t) = \frac{2V_0}{\sqrt{\pi}}
   \int_{\frac12 x[CR/t]^{1/2}}^{\frac12 x[CR/(t-T)]^{1/2}} \exp(-\nu^2)\,d\nu; \]

(c) For $t \gg T$, maximum at $x = [2t/(CR)]^{1/2}$ with value

\[ \frac{V_0 T\exp(-1/2)}{(2\pi)^{1/2}\,t}. \]

18.22 (a) $\lambda = -\mu = [\omega/(2\kappa)]^{1/2}$, where $\kappa$ is the diffusion
constant; (b) $x_a = (365)^{1/2}x_d$; (c) only the annual variation is significant at
this depth and has a phase $\mu_a x_a = \ln 20$ behind the surface. Thus the coldest
day is 1 February + $(365\ln 20)/(2\pi)$ days $\approx$ 23 July.

18.23 (a) Parabolic, open, Dirichlet $u(x,0)$ given, Neumann
$\partial u/\partial x = 0$ at $x = \pm L/2$ for all $t$;

(b) elliptic, closed, Dirichlet;

(c) elliptic, closed, Neumann $\partial u/\partial n = \sigma/\epsilon_0$;

(d) hyperbolic, open, Cauchy.

18.24
\[ A' = A\left(\frac{\partial\xi}{\partial x}\right)^2
      + B\frac{\partial\xi}{\partial x}\frac{\partial\xi}{\partial y}
      + C\left(\frac{\partial\xi}{\partial y}\right)^2, \]
\[ B' = 2A\frac{\partial\xi}{\partial x}\frac{\partial\eta}{\partial x}
      + B\left(\frac{\partial\xi}{\partial x}\frac{\partial\eta}{\partial y}
      + \frac{\partial\xi}{\partial y}\frac{\partial\eta}{\partial x}\right)
      + 2C\frac{\partial\xi}{\partial y}\frac{\partial\eta}{\partial y}. \]

18.25 Follow an argument similar to that in section 18.7 and argue that the additional
term $\int m^2|w|^2\,dV$ must be zero, and hence that $w = 0$ everywhere.

19


Partial differential equations: separation of variables and other methods


In the previous chapter we demonstrated the methods by which general solutions 
of some partial differential equations (PDEs) may be obtained in terms of 
arbitrary functions. In particular, solutions containing the independent variables 
in definite combinations were sought, thus reducing the effective number of them. 

In the present chapter we begin by taking the opposite approach, namely that 
of trying to keep the independent variables as separate as possible, using the 
method of separation of variables. We then consider integral transform methods 
by which one of the independent variables may be eliminated, at least from 
differential coefficients. Finally, we discuss the use of Green’s functions in solving 
inhomogeneous problems. 


19.1 Separation of variables: the general method 

Suppose we seek a solution $u(x,y,z,t)$ to some PDE (expressed in Cartesian
coordinates). Let us attempt to obtain one that has the product form†

\[ u(x,y,z,t) = X(x)Y(y)Z(z)T(t). \tag{19.1} \]

A solution that has this form is said to be separable in $x$, $y$, $z$ and $t$, and
seeking solutions of this form is called the method of separation of variables.

As simple examples we may observe that, of the functions

(i) $xyz^2\sin bt$, (ii) $xy + zt$, (iii) $(x^2 + y^2)z\cos\omega t$,

(i) is completely separable, (ii) is inseparable in that no single variable can be
separated out from it and written as a multiplicative factor, whilst (iii) is separable
in $z$ and $t$ but not in $x$ and $y$.


† It should be noted that the conventional use here of upper-case (capital) letters to
denote the functions of the corresponding lower-case variable is intended to enable an
easy correspondence between a function and its argument to be made.


When seeking PDE solutions of the form (19.1), we are requiring not that there
is no connection at all between the functions $X$, $Y$, $Z$ and $T$ (for example,
certain parameters may appear in two or more of them), but only that $X$ does not
depend upon $y$, $z$, $t$, that $Y$ does not depend on $x$, $z$, $t$, and so on.

For a general PDE it is likely that a separable solution is impossible, but
certainly some common and important equations do have useful solutions of
this form, and we will illustrate the method of solution by studying the
three-dimensional wave equation

\[ \nabla^2 u(\mathbf{r}) = \frac{1}{c^2}\frac{\partial^2 u(\mathbf{r})}{\partial t^2}. \tag{19.2} \]


We will work in Cartesian coordinates for the present and assume a solution
of the form (19.1); the solutions in alternative coordinate systems, e.g. spherical
or cylindrical polars, are considered in section 19.3. Expressed in Cartesian
coordinates (19.2) takes the form

\[ \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}
   + \frac{\partial^2 u}{\partial z^2} = \frac{1}{c^2}\frac{\partial^2 u}{\partial t^2}; \tag{19.3} \]

substituting (19.1) gives

\[ \frac{d^2X}{dx^2}YZT + X\frac{d^2Y}{dy^2}ZT + XY\frac{d^2Z}{dz^2}T
   = \frac{1}{c^2}XYZ\frac{d^2T}{dt^2}, \]

which can also be written as

\[ X''YZT + XY''ZT + XYZ''T = \frac{1}{c^2}XYZT'', \tag{19.4} \]

where in each case the primes refer to the ordinary derivative with respect to the 
independent variable upon which the function depends. This emphasises the fact 
that each of the functions X, Y, Z and T has only one independent variable and 
thus its only derivative is its total derivative. For the same reason, in each term 
in (19.4) three of the four functions are unaltered by the partial differentiation 
and behave exactly as constant multipliers. 

If we now divide (19.4) throughout by $u = XYZT$ we obtain

\[ \frac{X''}{X} + \frac{Y''}{Y} + \frac{Z''}{Z} = \frac{1}{c^2}\frac{T''}{T}. \tag{19.5} \]

This form shows the particular characteristic that is the basis of the method of
separation of variables, namely that of the four terms the first is a function of $x$
only, the second of $y$ only, the third of $z$ only and the RHS a function of $t$ only,
and yet there is an equation connecting them. This can only be so for all $x$, $y$, $z$
and $t$ if each of the terms does not in fact, despite appearances, depend upon the
corresponding independent variable but is equal to a constant, the four constants
being such that (19.5) is satisfied.

Since there is only one equation to be satisfied and four constants involved,
there is considerable freedom in the values they may take. For the purposes of
our illustrative example let us make the choice of $-l^2$, $-m^2$, $-n^2$ for the
first three constants. The constant associated with $c^{-2}T''/T$ must then have the
value $-\mu^2 = -(l^2 + m^2 + n^2)$.

Having recognised that each term of (19.5) is individually equal to a constant
(or parameter), we can now replace (19.5) by four separate ordinary differential
equations (ODEs),

\[ \frac{X''}{X} = -l^2, \qquad \frac{Y''}{Y} = -m^2, \qquad \frac{Z''}{Z} = -n^2, \qquad
   \frac{1}{c^2}\frac{T''}{T} = -\mu^2. \tag{19.6} \]


The important point to notice is not the simplicity of the equations (19.6) (the 
corresponding ones for a general PDE are usually far from simple) but that, by 
the device of assuming a separable solution, a partial differential equation (19.3), 
containing derivatives with respect to the four independent variables all in one 
equation, has been reduced to four separate ordinary differential equations (19.6). 
The ordinary equations are connected through four constant parameters that 
satisfy an algebraic relation. These constants are called separation constants. 

The general solutions of the equations (19.6) can be deduced straightforwardly
and are

\[ \begin{aligned}
X(x) &= A\exp(ilx) + B\exp(-ilx), \\
Y(y) &= C\exp(imy) + D\exp(-imy), \\
Z(z) &= E\exp(inz) + F\exp(-inz), \\
T(t) &= G\exp(ic\mu t) + H\exp(-ic\mu t),
\end{aligned} \tag{19.7} \]

where $A, B, \ldots, H$ are constants, which may be determined if boundary conditions
are imposed on the solution. Depending on the geometry of the problem and
any boundary conditions, it is sometimes more appropriate to write the solutions
(19.7) in the alternative form

\[ \begin{aligned}
X(x) &= A'\cos lx + B'\sin lx, \\
Y(y) &= C'\cos my + D'\sin my, \\
Z(z) &= E'\cos nz + F'\sin nz, \\
T(t) &= G'\cos(c\mu t) + H'\sin(c\mu t),
\end{aligned} \tag{19.8} \]

for some different set of constants $A', B', \ldots, H'$. Clearly the choice of how
best to represent the solution depends on the problem being considered.

As an example, suppose that we take as particular solutions the four functions

\[ X(x) = \exp(ilx), \qquad Y(y) = \exp(imy), \qquad Z(z) = \exp(inz), \qquad
   T(t) = \exp(-ic\mu t). \]

This gives a particular solution of the original PDE (19.3)

\[ u(x,y,z,t) = \exp(ilx)\exp(imy)\exp(inz)\exp(-ic\mu t)
             = \exp[i(lx + my + nz - c\mu t)], \]

which is a special case of the solution (18.33) obtained in the previous chapter
and represents a plane wave of unit amplitude propagating in a direction given
by the vector with components $l$, $m$, $n$ in a Cartesian coordinate system. In the
conventional notation of wave theory, $l$, $m$ and $n$ are the components of the
wave-number vector $\mathbf{k}$, whose magnitude is given by $k = 2\pi/\lambda$, where
$\lambda$ is the wavelength of the wave; $c\mu$ is the angular frequency $\omega$ of
the wave. This gives the equation in the form

\[ u(x,y,z,t) = \exp[i(k_x x + k_y y + k_z z - \omega t)]
             = \exp[i(\mathbf{k}\cdot\mathbf{r} - \omega t)], \]

and makes the exponent dimensionless.
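The plane-wave result above can be checked numerically. The following is a minimal sketch, not part of the original text: the function names (`plane_wave`, `second_difference`, `wave_residual`) and the sample values are illustrative assumptions. It evaluates the separated solution with $\mu^2 = l^2 + m^2 + n^2$ and uses central differences to confirm that the residual of the wave equation (19.3) is small.

```python
import cmath


def plane_wave(x, y, z, t, l, m, n, c):
    """Product X(x)Y(y)Z(z)T(t) = exp[i(lx + my + nz - c*mu*t)],
    with mu fixed by the separation-constant relation mu^2 = l^2 + m^2 + n^2."""
    mu = (l * l + m * m + n * n) ** 0.5
    return cmath.exp(1j * (l * x + m * y + n * z - c * mu * t))


def second_difference(f, args, index, h=1e-3):
    """Central-difference estimate of the second partial derivative of f
    with respect to the argument at position `index`."""
    up, down = list(args), list(args)
    up[index] += h
    down[index] -= h
    return (f(*up) - 2 * f(*args) + f(*down)) / (h * h)


def wave_residual(point, l, m, n, c):
    """Return laplacian(u) - (1/c^2) u_tt at point = (x, y, z, t)."""
    f = lambda x, y, z, t: plane_wave(x, y, z, t, l, m, n, c)
    laplacian = sum(second_difference(f, point, i) for i in range(3))
    u_tt = second_difference(f, point, 3)
    return laplacian - u_tt / c ** 2


if __name__ == "__main__":
    # Arbitrary sample point and separation constants (illustrative values).
    print(abs(wave_residual((0.3, -0.2, 0.5, 0.1), 1.0, 2.0, -0.5, 3.0)))
```

The residual is not exactly zero because the derivatives are approximated by finite differences; it shrinks as the step `h` is reduced, until rounding error takes over.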

The method of separation of variables can be applied to many commonly 
occurring PDEs encountered in physical applications. 


►Use the method of separation of variables to obtain for the one-dimensional diffusion
equation

\[ \kappa\frac{\partial^2 u}{\partial x^2} = \frac{\partial u}{\partial t}, \tag{19.9} \]

a solution that tends to zero as $t \to \infty$ for all $x$.


Here we have only two independent variables $x$ and $t$ and we therefore assume a
solution of the form

\[ u(x,t) = X(x)T(t). \]

Substituting this expression into (19.9) and dividing through by $u = XT$ (and also by
$\kappa$) we obtain

\[ \frac{X''}{X} = \frac{T'}{\kappa T}. \]

Now, arguing exactly as above that the LHS is a function of $x$ only and the RHS is a
function of $t$ only, we conclude that each side must equal a constant, which,
anticipating the result and noting the imposed boundary condition, we will take as
$-\lambda^2$. This gives us two ordinary equations,

\[ X'' + \lambda^2 X = 0, \tag{19.10} \]
\[ T' + \lambda^2\kappa T = 0, \tag{19.11} \]

which have the solutions

\[ X(x) = A\cos\lambda x + B\sin\lambda x, \qquad T(t) = C\exp(-\lambda^2\kappa t). \]

Combining these to give the assumed solution $u = XT$ yields (absorbing the constant $C$
into $A$ and $B$)

\[ u(x,t) = (A\cos\lambda x + B\sin\lambda x)\exp(-\lambda^2\kappa t). \tag{19.12} \]

In order to satisfy the boundary condition $u \to 0$ as $t \to \infty$, $\lambda^2\kappa$
must be $> 0$. Since $\kappa$ is real and $> 0$, this implies that $\lambda$ is a real
non-zero number and that the solution is sinusoidal in $x$ and is not a disguised
hyperbolic function; this was our reason for choosing the separation constant as
$-\lambda^2$. ◄
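As a rough numerical illustration of (19.12), the sketch below (names and parameter values are assumptions made for the example, not taken from the text) evaluates the separated solution, estimates the residual $\kappa u_{xx} - u_t$ by finite differences, and shows the decay to zero at large $t$ for real $\lambda$.

```python
import math


def separated_diffusion(x, t, lam, kappa, A=1.0, B=0.0):
    """Separated solution (A cos(lam x) + B sin(lam x)) exp(-lam^2 kappa t)."""
    spatial = A * math.cos(lam * x) + B * math.sin(lam * x)
    return spatial * math.exp(-lam ** 2 * kappa * t)


def diffusion_residual(x, t, lam, kappa, A=1.0, B=0.0, h=1e-4):
    """Finite-difference estimate of kappa * u_xx - u_t at (x, t)."""
    u = lambda xx, tt: separated_diffusion(xx, tt, lam, kappa, A, B)
    u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h ** 2
    u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
    return kappa * u_xx - u_t


if __name__ == "__main__":
    # Residual should be close to zero; the late-time value should be negligible.
    print(diffusion_residual(0.7, 0.2, lam=2.0, kappa=0.5, A=1.0, B=0.3))
    print(separated_diffusion(0.7, 50.0, lam=2.0, kappa=0.5, A=1.0, B=0.3))
```

Had the separation constant been taken as $+\lambda^2$ with real $\lambda$, the time factor would grow rather than decay, which is the reason given above for the sign choice.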

As a final example we consider Laplace’s equation in Cartesian coordinates; 
this may be treated in a similar manner. 


►Use the method of separation of variables to obtain a solution for the two-dimensional
Laplace equation,

\[ \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0. \tag{19.13} \]

If we assume a solution of the form $u(x,y) = X(x)Y(y)$ then, following the above
method, and taking the separation constant as $\lambda^2$, we find

\[ X'' = \lambda^2 X, \qquad Y'' = -\lambda^2 Y. \]

Taking $\lambda^2$ as $> 0$, the general solution becomes

\[ u(x,y) = (A\cosh\lambda x + B\sinh\lambda x)(C\cos\lambda y + D\sin\lambda y). \tag{19.14} \]

An alternative form, in which the exponentials are written explicitly, may be useful for
other geometries or boundary conditions:

\[ u(x,y) = [A\exp\lambda x + B\exp(-\lambda x)](C\cos\lambda y + D\sin\lambda y), \tag{19.15} \]

with different constants $A$ and $B$.

If $\lambda^2 < 0$ then the roles of $x$ and $y$ interchange. The particular combination
of sinusoidal and hyperbolic functions and the values of $\lambda$ allowed will be
determined by the geometrical properties of any specific problem, together with any
prescribed or necessary boundary conditions. ◄

We note here that a particular case of the solution (19.14) links up with the
‘combination’ result $u(x,y) = f(x + iy)$ of the previous chapter (equations (18.24)
and following), namely that if $A = B$ and $D = iC$ then the solution is the same
as $f(p) = AC\exp\lambda p$ with $p = x + iy$.
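The link between (19.14) and $f(x + iy)$ is easy to confirm numerically. The following sketch uses a hypothetical helper `laplace_separated` and arbitrary sample constants; it checks that with $A = B$ and $D = iC$ the separated product collapses to $AC\exp[\lambda(x + iy)]$.

```python
import cmath


def laplace_separated(x, y, lam, A, B, C, D):
    """Separated solution (19.14); complex constants are allowed."""
    x_part = A * cmath.cosh(lam * x) + B * cmath.sinh(lam * x)
    y_part = C * cmath.cos(lam * y) + D * cmath.sin(lam * y)
    return x_part * y_part


if __name__ == "__main__":
    lam, A, C = 1.5, 2.0, 3.0
    x, y = 0.4, -0.7
    separated = laplace_separated(x, y, lam, A, A, C, 1j * C)  # A = B, D = iC
    combined = A * C * cmath.exp(lam * (x + 1j * y))           # f(p), p = x + iy
    print(abs(separated - combined))
```

The agreement follows from $\cosh\lambda x + \sinh\lambda x = \exp\lambda x$ and $\cos\lambda y + i\sin\lambda y = \exp(i\lambda y)$.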


19.2 Superposition of separated solutions 

It will be noticed in the previous two examples that there is considerable freedom
in the values of the separation constant $\lambda$, the only essential requirement
being that $\lambda$ has the same value in both parts of the solution, i.e. the part
depending on $x$ and the part depending on $y$ (or $t$). This is a general feature for
solutions in separated form, which, if the original PDE has $n$ independent variables,
will contain $n - 1$ separation constants. All that is required in general is that we
associate the correct function of one independent variable with the appropriate
functions of the others, the correct function being the one with the same values
of the separation constants.

If the original PDE is linear (as are the Laplace, Schrödinger, diffusion and
wave equations) then mathematically acceptable solutions can be formed by
superposing solutions corresponding to different allowed values of the separation
constants. To take a two-variable example: if

\[ u_{\lambda_1}(x,y) = X_{\lambda_1}(x)Y_{\lambda_1}(y) \]

is a solution of a linear PDE obtained by giving the separation constant the value
$\lambda_1$ then the superposition

\[ u(x,y) = a_1 X_{\lambda_1}(x)Y_{\lambda_1}(y) + a_2 X_{\lambda_2}(x)Y_{\lambda_2}(y) + \cdots
          = \sum_i a_i X_{\lambda_i}(x)Y_{\lambda_i}(y) \tag{19.16} \]

is also a solution for any constants $a_i$, provided that the $\lambda_i$ are the
allowed values of the separation constant $\lambda$ given the imposed boundary
conditions. Note that if the boundary conditions allow any of the separation constants
to be zero then the form of the general solution is normally different and must be
deduced by returning to the separated ordinary differential equations. We will
encounter this behaviour in section 19.3.

The value of the superposition approach is that a boundary condition, say that
$u(x,y)$ takes a particular form $f(x)$ when $y = 0$, might be met by choosing the
constants $a_i$ such that

\[ f(x) = \sum_i a_i X_{\lambda_i}(x)Y_{\lambda_i}(0). \]

In general, this will be possible provided that the functions $X_{\lambda_i}(x)$ form
a complete set, as do the sinusoidal functions of Fourier series or the spherical
harmonics that we shall discuss in subsection 19.3.2.


►A semi-infinite rectangular metal plate occupies the region $0 \le x \le \infty$ and
$0 \le y \le b$ in the $xy$-plane. The temperature at the far end of the plate and
along its two long sides is fixed at 0 °C. If the temperature of the plate at $x = 0$
is also fixed and is given by $f(y)$, find the steady-state temperature distribution
$u(x,y)$ of the plate. Hence find the temperature distribution if $f(y) = u_0$, where
$u_0$ is a constant.


The physical situation is illustrated in figure 19.1. With the notation we have used
several times before, the two-dimensional heat diffusion equation satisfied by the
temperature $u(x,y,t)$ is

\[ \kappa\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right)
   = \frac{\partial u}{\partial t}, \]

with $\kappa = k/(s\rho)$. In this case, however, we are asked to find the steady-state
temperature, which corresponds to $\partial u/\partial t = 0$, and so we are led to
consider the (two-dimensional) Laplace equation

\[ \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0. \]

We saw that assuming a separable solution of the form $u(x,y) = X(x)Y(y)$ led to
solutions such as (19.14) or (19.15), or equivalent forms with $x$ and $y$ interchanged.
In the current problem we have to satisfy the boundary conditions
$u(x,0) = 0 = u(x,b)$ and so a solution that is sinusoidal in $y$ seems appropriate.
Furthermore, since we require $u(\infty,y) = 0$ it is best to write the $x$-dependence
of the solution explicitly in terms of exponentials rather than of hyperbolic
functions. We therefore write the separable solution in the form (19.15) as

\[ u(x,y) = [A\exp\lambda x + B\exp(-\lambda x)](C\cos\lambda y + D\sin\lambda y). \]

Figure 19.1 A semi-infinite metal plate whose edges are kept at fixed temperatures.

Applying the boundary conditions, we see firstly that $u(\infty,y) = 0$ implies $A = 0$
if we take $\lambda > 0$. Secondly, since $u(x,0) = 0$ we may set $C = 0$, which, if we
absorb the constant $D$ into $B$, leaves us with

\[ u(x,y) = B\exp(-\lambda x)\sin\lambda y. \]

But, using the condition $u(x,b) = 0$, we require $\sin\lambda b = 0$ and so the
constant $\lambda$ is constrained to equal $n\pi/b$, where $n$ is any positive integer.

Using the principle of superposition (19.16), the general solution satisfying the given
boundary conditions can therefore be written

\[ u(x,y) = \sum_{n=1}^{\infty} B_n\exp(-n\pi x/b)\sin(n\pi y/b), \tag{19.17} \]

for some constants $B_n$. Notice that in the sum in (19.17) we have omitted negative
values of $n$ since they would lead to exponential terms that diverge as
$x \to \infty$. The $n = 0$ term is also omitted since it is identically zero. Using the
remaining boundary condition $u(0,y) = f(y)$ we see that the constants $B_n$ must
satisfy

\[ f(y) = \sum_{n=1}^{\infty} B_n\sin(n\pi y/b). \tag{19.18} \]

This is clearly a Fourier sine series expansion of $f(y)$ (see chapter 12). For (19.18)
to hold, however, the continuation of $f(y)$ outside the region $0 \le y \le b$ must be
an odd periodic function with period $2b$ (see figure 19.2). We also see from
figure 19.2 that if the original function $f(y)$ does not equal zero at either of
$y = 0$ and $y = b$ then its continuation has a discontinuity at the corresponding
point(s); nevertheless, as discussed in chapter 12, the Fourier series will converge
to the mid-points of these jumps and hence tend to zero in this case. If, however, the
top and bottom edges of the plate were held not at 0 °C but at some other non-zero
temperature, then, in general, the final solution would possess discontinuities at the
corners $x = 0$, $y = 0$ and $x = 0$, $y = b$.

Bearing in mind these technicalities, the coefficients $B_n$ in (19.18) are given by

\[ B_n = \frac{2}{b}\int_0^b f(y)\sin\left(\frac{n\pi y}{b}\right)dy. \tag{19.19} \]

Figure 19.2 The continuation of $f(y)$ for a Fourier sine series.

Therefore, if $f(y) = u_0$ (i.e. the temperature of the side at $x = 0$ is constant
along its length), (19.19) becomes

\[ B_n = \frac{2}{b}\int_0^b u_0\sin\left(\frac{n\pi y}{b}\right)dy
       = \left[-\frac{2u_0}{b}\frac{b}{n\pi}\cos\left(\frac{n\pi y}{b}\right)\right]_0^b
       = -\frac{2u_0}{n\pi}\left[(-1)^n - 1\right]
       = \begin{cases} 4u_0/n\pi & \text{for $n$ odd,} \\ 0 & \text{for $n$ even.} \end{cases} \]

Therefore the required solution is

\[ u(x,y) = \sum_{n\ \mathrm{odd}} \frac{4u_0}{n\pi}
            \exp\left(-\frac{n\pi x}{b}\right)\sin\left(\frac{n\pi y}{b}\right). \]
◄
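The series just obtained can be summed numerically. The sketch below is illustrative only: the function names and sample values are assumptions, and the closed form used for comparison is not derived in the text. It follows from summing $\sum_{n\ \mathrm{odd}} z^n/n = \tanh^{-1} z$ with $z = \exp[-\pi(x - iy)/b]$ and taking the imaginary part, which gives $u = (2u_0/\pi)\arctan[\sin(\pi y/b)/\sinh(\pi x/b)]$.

```python
import math


def plate_series(x, y, b, u0, n_terms=200):
    """Partial sum over odd n of (4 u0 / n pi) exp(-n pi x / b) sin(n pi y / b)."""
    total = 0.0
    for j in range(n_terms):
        n = 2 * j + 1
        total += (4.0 * u0 / (n * math.pi)) \
            * math.exp(-n * math.pi * x / b) * math.sin(n * math.pi * y / b)
    return total


def plate_closed_form(x, y, b, u0):
    """Closed-form sum of the same series, used here only as a cross-check."""
    return (2.0 * u0 / math.pi) * math.atan2(math.sin(math.pi * y / b),
                                             math.sinh(math.pi * x / b))


if __name__ == "__main__":
    b, u0 = 1.0, 10.0
    print(plate_series(0.3, 0.4, b, u0), plate_closed_form(0.3, 0.4, b, u0))
    # On the edge x = 0 the closed form returns u0 for 0 < y < b.
    print(plate_closed_form(0.0, 0.4, b, u0))
```

For $x > 0$ the terms decay exponentially, so few are needed; at $x = 0$ the series converges only slowly, like any Fourier series of a discontinuous function.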


In the above example the boundary conditions meant that one term in each 
part of the separable solution could be immediately discarded, making the prob- 
lem much easier to solve. Sometimes, however, a little ingenuity is required in 
writing the separable solution in such a way that certain parts can be neglected 
immediately. 


►Suppose that the semi-infinite rectangular metal plate in the previous example is
replaced by one that in the $x$-direction has finite length $a$. The temperature of the
right-hand edge is fixed at 0 °C and all other boundary conditions remain as before.
Find the steady-state temperature in the plate.


As in the previous example, the boundary conditions $u(x,0) = 0 = u(x,b)$ suggest a
solution that is sinusoidal in $y$. In this case, however, we require $u = 0$ on
$x = a$ (rather than at infinity) and so a solution in which the $x$-dependence is
written in terms of hyperbolic functions, such as (19.14), rather than exponentials is
more appropriate. Moreover, since the constants in front of the hyperbolic functions
are, at this stage, arbitrary, we may write the separable solution in the most
convenient way that ensures that the condition $u(a,y) = 0$ is straightforwardly
satisfied. We therefore write

\[ u(x,y) = [A\cosh\lambda(a - x) + B\sinh\lambda(a - x)](C\cos\lambda y + D\sin\lambda y). \]

Now the condition $u(a,y) = 0$ is easily satisfied by setting $A = 0$. As before the
conditions $u(x,0) = 0 = u(x,b)$ imply $C = 0$ and $\lambda = n\pi/b$ for integer $n$.
Superposing the solutions for different $n$ we then obtain

\[ u(x,y) = \sum_{n=1}^{\infty} B_n\sinh[n\pi(a - x)/b]\sin(n\pi y/b), \tag{19.20} \]

for some constants $B_n$. We have omitted negative values of $n$ in the sum (19.20)
since the relevant terms are already included in those obtained for positive $n$. Again
the $n = 0$ term is identically zero. Using the final boundary condition
$u(0,y) = f(y)$ as above we find that the constants $B_n$ must satisfy

\[ f(y) = \sum_{n=1}^{\infty} B_n\sinh(n\pi a/b)\sin(n\pi y/b), \]

and, remembering the caveats discussed in the previous example, the $B_n$ are therefore
given by

\[ B_n = \frac{2}{b\sinh(n\pi a/b)}\int_0^b f(y)\sin(n\pi y/b)\,dy. \tag{19.21} \]

For the case where $f(y) = u_0$, following the working of the previous example gives
(19.21) as

\[ B_n = \frac{4u_0}{n\pi\sinh(n\pi a/b)} \ \text{for $n$ odd}, \qquad
   B_n = 0 \ \text{for $n$ even}. \tag{19.22} \]

The required solution is thus

\[ u(x,y) = \sum_{n\ \mathrm{odd}} \frac{4u_0}{n\pi\sinh(n\pi a/b)}
            \sinh[n\pi(a - x)/b]\sin(n\pi y/b). \]

We note that, as required, in the limit $a \to \infty$ this solution tends to the
solution of the previous example. ◄
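The limit $a \to \infty$ noted above can be illustrated numerically. In the sketch below (all names are hypothetical), the ratio $\sinh[n\pi(a - x)/b]/\sinh(n\pi a/b)$ is rewritten in terms of decaying exponentials; this is an implementation choice made here to avoid overflow of `sinh` for large arguments, not something prescribed by the text.

```python
import math


def sinh_ratio(n, x, a, b):
    """sinh[n pi (a - x)/b] / sinh(n pi a/b), written with decaying
    exponentials so that large n or large a cannot overflow."""
    k = n * math.pi / b
    return math.exp(-k * x) * (1.0 - math.exp(-2.0 * k * (a - x))) \
        / (1.0 - math.exp(-2.0 * k * a))


def finite_plate(x, y, a, b, u0, n_terms=200):
    """Partial sum of the finite-plate solution for f(y) = u0 (odd n only)."""
    total = 0.0
    for j in range(n_terms):
        n = 2 * j + 1
        total += (4.0 * u0 / (n * math.pi)) * sinh_ratio(n, x, a, b) \
            * math.sin(n * math.pi * y / b)
    return total


def semi_infinite_plate(x, y, b, u0, n_terms=200):
    """Partial sum of the semi-infinite-plate solution of the previous example."""
    total = 0.0
    for j in range(n_terms):
        n = 2 * j + 1
        total += (4.0 * u0 / (n * math.pi)) * math.exp(-n * math.pi * x / b) \
            * math.sin(n * math.pi * y / b)
    return total


if __name__ == "__main__":
    b, u0 = 1.0, 10.0
    print(finite_plate(2.0, 0.4, 2.0, b, u0))  # on the edge x = a
    print(finite_plate(0.3, 0.4, 50.0, b, u0), semi_infinite_plate(0.3, 0.4, b, u0))
```

With $a = 50b$ the two sums agree to rounding error, since the correction factor differs from unity only by terms of order $\exp(-2n\pi a/b)$.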

Often the principle of superposition can be used to write the solution to
problems with more complicated boundary conditions as the sum of solutions to
problems that each satisfy only some part of the boundary condition but when
added together satisfy all the conditions.


►Find the steady-state temperature in the (finite) rectangular plate of the previous
example, subject to the boundary conditions $u(x,b) = 0$, $u(a,y) = 0$ and
$u(0,y) = f(y)$ as before, but now in addition $u(x,0) = g(x)$.


Figure 19.3(c) shows the imposed boundary conditions for the metal plate. Although we
could find a solution to this problem using the methods presented above, we can arrive
at the answer almost immediately by using the principle of superposition and the result
of the previous example.

Let us suppose the required solution $u(x,y)$ is made up of two parts:

\[ u(x,y) = v(x,y) + w(x,y), \]

where $v(x,y)$ is the solution satisfying the boundary conditions shown in
figure 19.3(a), whilst $w(x,y)$ is the solution satisfying the boundary conditions in
figure 19.3(b).

Figure 19.3 Superposition of boundary conditions for a metal plate.

It is clear that $v(x,y)$ is simply given by the solution to the previous example,

\[ v(x,y) = \sum_{n=1}^{\infty} B_n\sinh\left[\frac{n\pi(a - x)}{b}\right]
            \sin\left(\frac{n\pi y}{b}\right), \]

where $B_n$ is given by (19.21). Moreover, by symmetry, $w(x,y)$ must be of the same
form as $v(x,y)$ but with $x$ and $a$ interchanged with $y$ and $b$ respectively, and
with $f(y)$ in (19.21) replaced by $g(x)$. Therefore the required solution can be
written down immediately without further calculation as

\[ u(x,y) = \sum_{n=1}^{\infty} B_n\sinh\left[\frac{n\pi(a - x)}{b}\right]
            \sin\left(\frac{n\pi y}{b}\right)
          + \sum_{n=1}^{\infty} C_n\sinh\left[\frac{n\pi(b - y)}{a}\right]
            \sin\left(\frac{n\pi x}{a}\right), \]

the $B_n$ being given by (19.21) and $C_n$ by

\[ C_n = \frac{2}{a\sinh(n\pi b/a)}\int_0^a g(x)\sin(n\pi x/a)\,dx. \]

Clearly, this method may be extended to cases in which three or four sides of the plate
have non-zero boundary conditions. ◄
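A small numerical sketch of this superposition is given below. Everything in it is an illustrative assumption rather than part of the text: the helper names, the midpoint-rule quadrature used for the coefficient integrals, and the particular boundary functions chosen in the usage example.

```python
import math


def sine_coefficient(func, length, n, samples=2000):
    """(2/length) * integral from 0 to length of func(s) sin(n pi s/length) ds,
    approximated with the midpoint rule."""
    h = length / samples
    total = 0.0
    for j in range(samples):
        s = (j + 0.5) * h
        total += func(s) * math.sin(n * math.pi * s / length)
    return 2.0 * total * h / length


def plate_two_sides(x, y, a, b, f, g, n_max=20):
    """u = v + w with u(0, y) = f(y), u(x, 0) = g(x) and u = 0 on the other edges."""
    u = 0.0
    for n in range(1, n_max + 1):
        B_n = sine_coefficient(f, b, n) / math.sinh(n * math.pi * a / b)  # (19.21)
        C_n = sine_coefficient(g, a, n) / math.sinh(n * math.pi * b / a)
        u += B_n * math.sinh(n * math.pi * (a - x) / b) * math.sin(n * math.pi * y / b)
        u += C_n * math.sinh(n * math.pi * (b - y) / a) * math.sin(n * math.pi * x / a)
    return u


if __name__ == "__main__":
    a, b = 2.0, 1.0
    f = lambda y: math.sin(math.pi * y / b)              # assumed edge temperature at x = 0
    g = lambda x: 3.0 * math.sin(2.0 * math.pi * x / a)  # assumed edge temperature at y = 0
    print(plate_two_sides(0.0, 0.3, a, b, f, g), f(0.3))
    print(plate_two_sides(0.5, 0.0, a, b, f, g), g(0.5))
```

On $x = 0$ the second sum vanishes term by term, and on $y = 0$ the first does, so each part supplies exactly one of the non-zero boundary conditions. For large `n_max` or extreme aspect ratios the `sinh` calls could overflow; a ratio form would be safer there.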


As a final example of the usefulness of the principle of superposition we now 
consider a problem that illustrates how to deal with inhomogeneous boundary 
conditions by a suitable change of variables. 
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►A bar of length L is initially at a temperature of 0 °C. One end of the bar (x = 0) is held 
at 0°C and the other is supplied with heat at a constant rate per unit area of H. Find the 
temperature distribution within the bar after a time t. 


With our usual notation, the heat diffusion equation satisfied by the temperature u(x, t) is 


d 2 u 8u 
K dx 2 8t ’ 


with k = k/(sp), where k is the thermal conductivity of the bar, s is its specific heat 
capacity and p its density. 

The boundary conditions can be written as 


u(x, 0) = 0, u(0, f) = 0, 


8u{L, t) _ H 
8x k ’ 


the last of which is inhomogeneous. In general, inhomogeneous boundary conditions can 
cause difficulties and it is usual to attempt a transformation of the problem into an 
equivalent homogeneous one. To this end, let us assume that the solution to our problem 
takes the form 

u(x, t) = v(x, t ) + w(x), 

where the function w(x) is to be suitably determined. In terms of v and w the problem 
becomes 

(d 2 v d 2 w\_8v 
\ 8x 2 + dx 2 ) 8t ' 
v(x, 0) + w(x) = 0, 
u(0, t) + w(0) = 0, 

8v(L,t) dw(L) H 

dx dx k 


There are several ways of choosing w(x) so as to make the new problem straightforward. 
Using some physical insight, however, it is clear that ultimately (at t = oo), when all 
transients have died away, the end x = L will attain a temperature u 0 such that ku 0 /L = FI 
and there will be a constant temperature gradient u(x,co) = uqx/L. We therefore choose 


w(x) = 


Hx 

k 


Since the second derivative of w(x) is zero, v satisfies the diffusion equation and the 
boundary conditions on v are now 


v(x, 0) 


Hx 

~Y’ 


r(0, t ) = 0, 


8v(L, t) 
dx 


which are homogeneous in x. 

From (19.12) a separated solution for the one-dimensional diffusion equation is

\[
v(x, t) = (A \cos \lambda x + B \sin \lambda x) \exp(-\lambda^2 \kappa t),
\]

corresponding to a separation constant $-\lambda^2$. If we restrict $\lambda$ to be real then all these
solutions are transient ones decaying to zero as $t \to \infty$. These are just what is needed
for adding to $w(x)$ to give the correct solution as $t \to \infty$. In order to satisfy $v(0, t) = 0$,
however, we require $A = 0$. Furthermore, since

\[
\frac{\partial v}{\partial x} = B \exp(-\lambda^2 \kappa t)\, \lambda \cos \lambda x,
\]

in order to satisfy $\partial v(L, t)/\partial x = 0$ we require $\cos \lambda L = 0$, and so $\lambda$ is restricted to take the
values

\[
\lambda = \frac{n\pi}{2L},
\]

where $n$ is an odd positive integer, i.e. $n = 1, 3, 5, \ldots$.

Figure 19.4 The appropriate continuation for a Fourier series containing
only sine terms.

Thus, to satisfy the boundary condition $v(x, 0) = -Hx/k$, we must have

\[
\sum_{n\ \mathrm{odd}} B_n \sin\left(\frac{n\pi x}{2L}\right) = -\frac{Hx}{k}
\]

in the range $x = 0$ to $x = L$. In this case we must be more careful about the continuation
of the function $-Hx/k$ for which the Fourier sine series is needed. We want a series that
is odd in $x$ (sine terms only) and continuous at $x = 0$ and $x = L$ (no discontinuities, since
the series must converge at the end-points). This leads to a continuation of the function
as shown in figure 19.4, with a period of $L' = 4L$. Following the discussion of section 12.3,
since this continuation is odd about $x = 0$ and even about $x = L'/4 = L$ it can indeed be
expressed as a Fourier sine series containing only odd-numbered terms.

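The corresponding Fourier series coefficients are found to be

\[
B_n = \frac{-8HL}{k\pi^2}\,\frac{(-1)^{(n-1)/2}}{n^2} \qquad \text{for } n \text{ odd},
\]

and thus the final formula for $u(x, t)$ is

\[
u(x, t) = \frac{Hx}{k} - \frac{8HL}{k\pi^2} \sum_{n\ \mathrm{odd}} \frac{(-1)^{(n-1)/2}}{n^2}
\sin\left(\frac{n\pi x}{2L}\right) \exp\left(-\frac{\kappa n^2 \pi^2 t}{4L^2}\right),
\]

giving the temperature for all positions $0 \le x \le L$ and for all times $t \ge 0$. ◄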
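
As a numerical check on the series solution for the heated bar, the following Python sketch sums it directly. It is an illustration only and not part of the original derivation: the function name `bar_temperature`, the sample values $H = k = \kappa = L = 1$ and the truncation at 200 terms are all assumptions made for convenience.

```python
import math

def bar_temperature(x, t, H=1.0, k=1.0, kappa=1.0, L=1.0, n_terms=200):
    """Truncated series for u(x, t); only odd n contribute (sample parameters)."""
    total = H * x / k                      # steady-state part w(x) = Hx/k
    for j in range(n_terms):
        n = 2 * j + 1                      # n = 1, 3, 5, ...
        lam = n * math.pi / (2.0 * L)      # allowed separation constants
        # (-1)^((n-1)/2) equals (-1)^j when n = 2j + 1
        b_n = -8.0 * H * L * (-1) ** j / (k * math.pi ** 2 * n ** 2)
        total += b_n * math.sin(lam * x) * math.exp(-lam ** 2 * kappa * t)
    return total
```

With these sample values the sum should be close to zero everywhere at $t = 0$ and should approach the steady profile $Hx/k$ at large $t$. Because the coefficients fall off only as $1/n^2$, the truncated sum at $t = 0$ is accurate to roughly $10^{-3}$ rather than to machine precision.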


We note that in all the above examples the boundary conditions restricted the
separation constant(s) to an infinite number of discrete values, usually integers.
If, however, the boundary conditions allow the separation constant(s) $\lambda$ to take
a continuum of values then the summation in (19.16) is replaced by an integral
over $\lambda$. This is discussed further in connection with integral transform methods
in section 19.4.

19.3 Separation of variables in polar coordinates

So far we have considered the solution of PDEs only in Cartesian coordinates, 
but many systems in two and three dimensions are more naturally expressed 
in some form of polar coordinates, in which full advantage can be taken of 
any inherent symmetries. For example, the potential associated with an isolated
point charge has a very simple expression, $q/(4\pi\epsilon_0 r)$, when polar coordinates are
used, but involves all three coordinates and square roots when Cartesians are
employed. For these reasons we now turn to the separation of variables in plane
polar, cylindrical polar and spherical polar coordinates. 

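Most of the PDEs we have considered so far have involved the operator $\nabla^2$, e.g.
the wave equation, the diffusion equation, Schrödinger’s equation and Poisson’s
equation (and of course Laplace’s equation). It is therefore appropriate that we
recall the expressions for $\nabla^2$ when expressed in polar coordinate systems. From
chapter 10, in plane polars, cylindrical polars and spherical polars respectively
we have

\[
\nabla^2 = \frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial}{\partial\rho}\right)
 + \frac{1}{\rho^2}\frac{\partial^2}{\partial\phi^2}, \tag{19.23}
\]
\[
\nabla^2 = \frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial}{\partial\rho}\right)
 + \frac{1}{\rho^2}\frac{\partial^2}{\partial\phi^2} + \frac{\partial^2}{\partial z^2}, \tag{19.24}
\]
\[
\nabla^2 = \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right)
 + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right)
 + \frac{1}{r^2\sin^2\theta}\frac{\partial^2}{\partial\phi^2}. \tag{19.25}
\]

Of course the first of these may be obtained from the second by taking $z$ to be
identically zero.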


19.3.1 Laplace’s equation in polar coordinates 

The simplest of the equations containing $\nabla^2$ is Laplace’s equation,

\[
\nabla^2 u(\mathbf{r}) = 0. \tag{19.26}
\]

Since it contains most of the essential features of the other more complicated 
equations we will consider its solution first. 

Laplace’s equation in plane polars 

Suppose that we need to find a solution of (19.26) that has a prescribed behaviour
on the circle $\rho = a$ (e.g. if we are finding the shape taken up by a circular drumskin
when its rim is slightly deformed from being planar). Then we may seek solutions
of (19.26) that are separable in $\rho$ and $\phi$ (measured from some arbitrary radius
as $\phi = 0$) and hope to accommodate the boundary condition by examining the
solution for $\rho = a$.

Thus, writing $u(\rho, \phi) = P(\rho)\Phi(\phi)$ and using the expression (19.23), Laplace’s
equation (19.26) becomes

\[
\frac{\Phi}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial P}{\partial\rho}\right)
 + \frac{P}{\rho^2}\frac{\partial^2\Phi}{\partial\phi^2} = 0.
\]

Now, employing the same device as previously, that of dividing through by
$u = P\Phi$ and multiplying through by $\rho^2$, results in the separated equation

\[
\frac{\rho}{P}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial P}{\partial\rho}\right)
 + \frac{1}{\Phi}\frac{\partial^2\Phi}{\partial\phi^2} = 0.
\]

Following our earlier argument, since the first term on the LHS is a function of
$\rho$ only, whilst the second term depends only on $\phi$, we obtain the two ordinary
equations

\[
\frac{\rho}{P}\frac{d}{d\rho}\left(\rho\frac{dP}{d\rho}\right) = n^2, \tag{19.27}
\]
\[
\frac{1}{\Phi}\frac{d^2\Phi}{d\phi^2} = -n^2, \tag{19.28}
\]

where we have taken the separation constant to have the form $n^2$ for later
convenience; for the present $n$ is a general (complex) number.

Let us first consider the case in which $n \neq 0$. The second equation, (19.28), then
has the general solution

\[
\Phi(\phi) = A \exp(in\phi) + B \exp(-in\phi). \tag{19.29}
\]

Equation (19.27), on the other hand, is the homogeneous equation

\[
\rho^2 P'' + \rho P' - n^2 P = 0,
\]

which must be solved either by trying a power solution in $\rho$ or by making the
substitution $\rho = \exp t$ as described in subsection 15.2.1 and so reducing it to an
equation with constant coefficients. Carrying out this procedure we find

\[
P(\rho) = C\rho^n + D\rho^{-n}. \tag{19.30}
\]

Returning to the solution (19.29) of the azimuthal equation (19.28), we can
see that if $\Phi$, and hence $u$, is to be single-valued and so not change when $\phi$
increases by $2\pi$ then $n$ must be an integer. Mathematically, other values of $n$ are
permissible, but for the description of real physical situations it is clear that this
limitation must be imposed. Having thus restricted the possible values of $n$ in one
part of the solution, the same limitations must be carried over into the radial part
(19.30). Thus we may write a particular solution of the two-dimensional Laplace
equation as

\[
u(\rho, \phi) = (A \cos n\phi + B \sin n\phi)(C\rho^n + D\rho^{-n}),
\]


where $A$, $B$, $C$, $D$ are arbitrary constants and $n$ is any integer.

We have not yet, however, considered the solution when $n = 0$. In this case,
the solutions of the separated ordinary equations (19.28) and (19.27) respectively
are easily shown to be

\[
\Phi(\phi) = A\phi + B, \qquad P(\rho) = C \ln\rho + D.
\]

But, in order that $u = P\Phi$ is single-valued, we require $A = 0$ and so the solution
for $n = 0$ is simply (absorbing $B$ into $C$ and $D$)

\[
u(\rho, \phi) = C \ln\rho + D.
\]

Superposing the solutions for the different allowed values of $n$, we can write
the general solution to Laplace’s equation in plane polars as

\[
u(\rho, \phi) = (C_0 \ln\rho + D_0) + \sum_{n=1}^{\infty} (A_n \cos n\phi + B_n \sin n\phi)(C_n\rho^n + D_n\rho^{-n}), \tag{19.31}
\]

where $n$ can take only integer values. Negative values of $n$ have been omitted
from the sum since they are already included in the terms obtained for positive
$n$. We note that, since $\ln\rho$ is singular at $\rho = 0$, whenever we solve Laplace’s
equation in a region containing the origin, $C_0$ must be identically zero.


►A circular drumskin has a supporting rim at $\rho = a$. If the rim is twisted so that it
is displaced vertically by a small amount $\epsilon(\sin\phi + 2\sin 2\phi)$, where $\phi$ is the azimuthal
angle with respect to a given radius, find the resulting displacement $u(\rho, \phi)$ over the entire
drumskin.

The transverse displacement of a circular drumskin is usually described by the
two-dimensional wave equation. In this case, however, there is no time dependence and so
$u(\rho, \phi)$ solves the two-dimensional Laplace equation, subject to the imposed boundary
condition.

Referring to (19.31), since we wish to find a solution that is finite everywhere inside
$\rho = a$, we require $C_0 = 0$ and $D_n = 0$ for all $n > 0$. Now the boundary condition at the
rim requires

\[
u(a, \phi) = D_0 + \sum_{n=1}^{\infty} C_n a^n (A_n \cos n\phi + B_n \sin n\phi) = \epsilon(\sin\phi + 2\sin 2\phi).
\]

Firstly we see that we require $D_0 = 0$ and $A_n = 0$ for all $n$. Furthermore, we must
have $C_1 B_1 a = \epsilon$, $C_2 B_2 a^2 = 2\epsilon$ and $B_n = 0$ for $n > 2$. Hence the appropriate shape for the
drumskin (valid over the whole skin, not just the rim) is

\[
u(\rho, \phi) = \frac{\epsilon\rho}{a}\sin\phi + \frac{2\epsilon\rho^2}{a^2}\sin 2\phi
 = \frac{\epsilon\rho}{a}\left(\sin\phi + \frac{2\rho}{a}\sin 2\phi\right).
\]
◄
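
The drumskin result can be checked numerically. The sketch below is an illustration under stated assumptions, not the book's own method: it evaluates the closed-form shape $u = (\epsilon\rho/a)[\sin\phi + (2\rho/a)\sin 2\phi]$ and estimates the plane-polar Laplacian (19.23) by central differences. The names `drumskin` and `polar_laplacian`, the sample values $a = 1$, $\epsilon = 0.01$ and the step size `h` are all hypothetical choices.

```python
import math

def drumskin(rho, phi, a=1.0, eps=0.01):
    """Static displacement of the worked example (a and eps are sample values)."""
    return (eps * rho / a) * (math.sin(phi) + (2.0 * rho / a) * math.sin(2.0 * phi))

def polar_laplacian(u, rho, phi, h=1e-3):
    """Central-difference estimate of u_rr + u_r / rho + u_phiphi / rho^2."""
    d_rho = (u(rho + h, phi) - u(rho - h, phi)) / (2.0 * h)
    d2_rho = (u(rho + h, phi) - 2.0 * u(rho, phi) + u(rho - h, phi)) / h ** 2
    d2_phi = (u(rho, phi + h) - 2.0 * u(rho, phi) + u(rho, phi - h)) / h ** 2
    return d2_rho + d_rho / rho + d2_phi / rho ** 2
```

Evaluating `polar_laplacian(drumskin, rho, phi)` at interior points should return values that vanish to within the finite-difference error, while `drumskin(a, phi)` should reproduce the imposed rim displacement.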
Laplace’s equation in cylindrical polars

Passing to three dimensions, we now consider the solution of Laplace’s equation 
in cylindrical polar coordinates, 


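\[
\frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial u}{\partial\rho}\right)
 + \frac{1}{\rho^2}\frac{\partial^2 u}{\partial\phi^2} + \frac{\partial^2 u}{\partial z^2} = 0. \tag{19.32}
\]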


We note here that, even when considering a cylindrical physical system, if there 
is no dependence of the physical variables on z (i.e. along the length of the 
cylinder) then the problem may be treated using two-dimensional plane polars, 
as discussed above. 

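For the more general case, however, we proceed as previously by trying a
solution of the form

\[
u(\rho, \phi, z) = P(\rho)\Phi(\phi)Z(z),
\]

which on substitution into (19.32) and division through by $u = P\Phi Z$ gives

\[
\frac{1}{P\rho}\frac{d}{d\rho}\left(\rho\frac{dP}{d\rho}\right)
 + \frac{1}{\Phi\rho^2}\frac{d^2\Phi}{d\phi^2} + \frac{1}{Z}\frac{d^2 Z}{dz^2} = 0.
\]

The last term depends only on $z$ and the first and second (taken together) only
on $\rho$ and $\phi$. Taking the separation constant to be $k^2$, we find

\[
\frac{1}{Z}\frac{d^2 Z}{dz^2} = k^2,
\]
\[
\frac{1}{P\rho}\frac{d}{d\rho}\left(\rho\frac{dP}{d\rho}\right)
 + \frac{1}{\Phi\rho^2}\frac{d^2\Phi}{d\phi^2} + k^2 = 0.
\]

The first of these equations has the straightforward solution

\[
Z(z) = E \exp(-kz) + F \exp kz.
\]

Multiplying the second equation through by $\rho^2$, we obtain

\[
\frac{\rho}{P}\frac{d}{d\rho}\left(\rho\frac{dP}{d\rho}\right)
 + \frac{1}{\Phi}\frac{d^2\Phi}{d\phi^2} + k^2\rho^2 = 0,
\]

in which the second term depends only on $\phi$ and the other terms only on $\rho$.
Taking the second separation constant to be $m^2$, we find

\[
\frac{1}{\Phi}\frac{d^2\Phi}{d\phi^2} = -m^2, \tag{19.33}
\]
\[
\rho\frac{d}{d\rho}\left(\rho\frac{dP}{d\rho}\right) + (k^2\rho^2 - m^2)P = 0. \tag{19.34}
\]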

The equation in the azimuthal angle $\phi$ has the very familiar solution

\[
\Phi(\phi) = C \cos m\phi + D \sin m\phi.
\]

As in the two-dimensional case, single-valuedness of $u$ requires that $m$ is an
integer. However, in the particular case $m = 0$ the solution is

\[
\Phi(\phi) = C\phi + D.
\]

This form is appropriate to a solution with axial symmetry ($C = 0$) or one that is
multivalued, but manageably so, such as the magnetic scalar potential associated
with a current $I$ (in which case $C = I/(2\pi)$ and $D$ is arbitrary).

Finally the $\rho$-equation (19.34) may be transformed into Bessel’s equation of
order $m$ by writing $\mu = k\rho$. This has the solution

\[
P(\rho) = A J_m(k\rho) + B Y_m(k\rho).
\]

The properties of these functions were investigated in chapter 16 and will not
be pursued here. We merely note that $Y_m(k\rho)$ is singular at $\rho = 0$, and so when
seeking solutions to Laplace’s equation in cylindrical coordinates within some
region containing the $\rho = 0$ axis, we require $B = 0$.

The complete separated-variable solution in cylindrical polars of Laplace’s
equation $\nabla^2 u = 0$ is thus

\[
u(\rho, \phi, z) = [A J_m(k\rho) + B Y_m(k\rho)][C \cos m\phi + D \sin m\phi][E \exp(-kz) + F \exp kz]. \tag{19.35}
\]

Of course we may use the principle of superposition to build up more general
solutions by adding together solutions of the form (19.35) for all allowed values
of the separation constants $k$ and $m$.


►A semi-infinite solid cylinder of radius $a$ has its curved surface held at 0°C and its base
held at a temperature $T_0$. Find the steady-state temperature distribution in the cylinder.

The physical situation is shown in figure 19.5. The steady-state temperature distribution
$u(\rho, \phi, z)$ must satisfy Laplace’s equation subject to the imposed boundary conditions. Let
us take the cylinder to have its base in the $z = 0$ plane and to extend along the positive
$z$-axis. From (19.35), in order that $u$ is finite everywhere in the cylinder we immediately
require $B = 0$ and $F = 0$. Furthermore, since the boundary conditions, and hence the
temperature distribution, are axially symmetric we require $m = 0$, and so the general
solution must be a superposition of solutions of the form $J_0(k\rho)\exp(-kz)$ for all allowed
values of the separation constant $k$.

The boundary condition $u(a, \phi, z) = 0$ restricts the allowed values of $k$ since we must
have $J_0(ka) = 0$. The zeroes of Bessel functions are given in most books of mathematical
tables, and we find that, to two decimal places,

\[
J_0(x) = 0 \quad \text{for } x = 2.40,\ 5.52,\ 8.65,\ \ldots.
\]

Writing the allowed values of $k$ as $k_n$ for $n = 1, 2, 3, \ldots$ (so, for example, $k_1 = 2.40/a$), the
required solution takes the form

\[
u(\rho, \phi, z) = \sum_{n=1}^{\infty} A_n J_0(k_n\rho) \exp(-k_n z).
\]

Figure 19.5 A uniform metal cylinder whose curved surface is kept at 0°C
and whose base is held at a temperature $T_0$.


By imposing the remaining boundary condition $u(\rho, \phi, 0) = T_0$, the coefficients $A_n$ can be
found in a similar way to Fourier coefficients but this time by exploiting the orthogonality
of the Bessel functions, as discussed in chapter 16. From this boundary condition we
require

\[
u(\rho, \phi, 0) = \sum_{n=1}^{\infty} A_n J_0(k_n\rho) = T_0.
\]

If we multiply this expression by $\rho J_0(k_r\rho)$ and integrate from $\rho = 0$ to $\rho = a$, and use the
orthogonality of the Bessel functions $J_0(k_n\rho)$, then the coefficients are given by (16.81) as

\[
A_n = \frac{2T_0}{a^2 J_1^2(k_n a)} \int_0^a J_0(k_n\rho)\rho\, d\rho. \tag{19.36}
\]

The integral on the RHS can be evaluated using the recurrence relation (16.68) of
chapter 16,

\[
\frac{d}{dz}[zJ_1(z)] = zJ_0(z),
\]

which on setting $z = k_n\rho$ yields

\[
\frac{1}{k_n}\frac{d}{d\rho}[k_n\rho J_1(k_n\rho)] = k_n\rho J_0(k_n\rho).
\]

Therefore the integral in (19.36) is given by

\[
\int_0^a J_0(k_n\rho)\rho\, d\rho = \left[\frac{1}{k_n}\rho J_1(k_n\rho)\right]_0^a = \frac{1}{k_n} a J_1(k_n a),
\]

and the coefficients $A_n$ may be expressed as

\[
A_n = \frac{2T_0}{a^2 J_1^2(k_n a)}\left[\frac{aJ_1(k_n a)}{k_n}\right] = \frac{2T_0}{k_n a J_1(k_n a)}.
\]

The steady-state temperature in the cylinder is then given by

\[
u(\rho, \phi, z) = \sum_{n=1}^{\infty} \frac{2T_0}{k_n a J_1(k_n a)} J_0(k_n\rho) \exp(-k_n z).
\]
◄
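
The series for the cylinder can be evaluated without tables. The following Python sketch is one possible way of doing so and rests on several assumptions that are not in the text: integer-order Bessel functions are computed from Bessel's integral $J_n(x) = \pi^{-1}\int_0^\pi \cos(n\tau - x\sin\tau)\,d\tau$ by the trapezium rule, the zeros of $J_0$ are bracketed around the estimate $(n - \tfrac14)\pi$ and refined by bisection, and the names `bessel_j`, `j0_zero` and `temperature`, the sample values $a = 1$, $T_0 = 100$ and the 20-term truncation are hypothetical.

```python
import math

def bessel_j(n, x, steps=400):
    """J_n(x) for integer n from Bessel's integral, using the trapezium rule."""
    h = math.pi / steps
    total = 0.5 * (1.0 + math.cos(n * math.pi))   # half-weighted end points
    for i in range(1, steps):
        tau = i * h
        total += math.cos(n * tau - x * math.sin(tau))
    return total * h / math.pi

def j0_zero(n, tol=1e-12):
    """n-th positive zero of J_0, by bisection around the estimate (n - 1/4) pi."""
    lo = (n - 0.25) * math.pi - 0.5
    hi = (n - 0.25) * math.pi + 0.5
    f_lo = bessel_j(0, lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = bessel_j(0, mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

def temperature(rho, z, a=1.0, t0=100.0, n_terms=20):
    """Truncated series sum of A_n J_0(k_n rho) exp(-k_n z), A_n = 2 T_0 / (k_n a J_1(k_n a))."""
    total = 0.0
    for n in range(1, n_terms + 1):
        k_n = j0_zero(n) / a
        a_n = 2.0 * t0 / (k_n * a * bessel_j(1, k_n * a))
        total += a_n * bessel_j(0, k_n * rho) * math.exp(-k_n * z)
    return total
```

The factor $\exp(-k_n z)$ makes the sum converge quickly for $z > 0$. On the base itself ($z = 0$) the terms decay only slowly, so a 20-term truncation should not be expected to reproduce $T_0$ accurately there.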


We note that if, in the above example, the base of the cylinder were not kept at
a uniform temperature $T_0$, but instead had some fixed temperature distribution
$T(\rho, \phi)$, then the solution of the problem would become more complicated. In
such a case, the required temperature distribution $u(\rho, \phi, z)$ is in general not axially
symmetric, and so the separation constant $m$ is not restricted to be zero but may
take any integer value. The solution will then take the form

\[
u(\rho, \phi, z) = \sum_{m=0}^{\infty}\sum_{n=1}^{\infty} J_m(k_{nm}\rho)(C_{nm}\cos m\phi + D_{nm}\sin m\phi)\exp(-k_{nm}z),
\]

where the separation constants $k_{nm}$ are such that $J_m(k_{nm}a) = 0$, i.e. $k_{nm}a$ is the $n$th
zero of the $m$th-order Bessel function. At the base of the cylinder we would then
require

\[
u(\rho, \phi, 0) = \sum_{m=0}^{\infty}\sum_{n=1}^{\infty} J_m(k_{nm}\rho)(C_{nm}\cos m\phi + D_{nm}\sin m\phi) = T(\rho, \phi). \tag{19.37}
\]

The coefficients $C_{nm}$ could be found by multiplying (19.37) by $J_q(k_{rq}\rho)\cos q\phi$,
integrating with respect to $\rho$ and $\phi$ over the base of the cylinder and exploiting
the orthogonality of the Bessel functions and of the trigonometric functions. The
$D_{nm}$ could be found in a similar way by multiplying (19.37) by $J_q(k_{rq}\rho)\sin q\phi$.


Laplace’s equation in spherical polars 

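We now come to an equation that is very widely applicable in physical science,
namely $\nabla^2 u = 0$ in spherical polar coordinates:

\[
\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial u}{\partial r}\right)
 + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial u}{\partial\theta}\right)
 + \frac{1}{r^2\sin^2\theta}\frac{\partial^2 u}{\partial\phi^2} = 0. \tag{19.38}
\]

Our method of procedure will be as before; we try a solution of the form

\[
u(r, \theta, \phi) = R(r)\Theta(\theta)\Phi(\phi).
\]

Substituting this in (19.38), dividing through by $u = R\Theta\Phi$ and multiplying by $r^2$,
we obtain

\[
\frac{1}{R}\frac{d}{dr}\left(r^2\frac{dR}{dr}\right)
 + \frac{1}{\Theta\sin\theta}\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta}{d\theta}\right)
 + \frac{1}{\Phi\sin^2\theta}\frac{d^2\Phi}{d\phi^2} = 0. \tag{19.39}
\]

The first term depends only on $r$ and the second and third terms (taken together)
only on $\theta$ and $\phi$. Thus (19.39) is equivalent to the two equations

\[
\frac{1}{R}\frac{d}{dr}\left(r^2\frac{dR}{dr}\right) = \lambda, \tag{19.40}
\]
\[
\frac{1}{\Theta\sin\theta}\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta}{d\theta}\right)
 + \frac{1}{\Phi\sin^2\theta}\frac{d^2\Phi}{d\phi^2} = -\lambda. \tag{19.41}
\]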


Equation (19.40) is a homogeneous equation,

\[
r^2\frac{d^2 R}{dr^2} + 2r\frac{dR}{dr} - \lambda R = 0,
\]

which can be reduced by the substitution $r = \exp t$ (and writing $R(r) = S(t)$) to

\[
\frac{d^2 S}{dt^2} + \frac{dS}{dt} - \lambda S = 0.
\]

This has the straightforward solution

\[
S(t) = A \exp \lambda_1 t + B \exp \lambda_2 t,
\]

and so the solution to the radial equation is

\[
R(r) = A r^{\lambda_1} + B r^{\lambda_2},
\]

where $\lambda_1 + \lambda_2 = -1$ and $\lambda_1\lambda_2 = -\lambda$. We can thus take $\lambda_1$ and $\lambda_2$ as given by $\ell$
and $-(\ell + 1)$; $\lambda$ then has the form $\ell(\ell + 1)$. (It should be noted that at this stage
nothing has been either assumed or proved about whether $\ell$ is an integer.)

Hence we have obtained some information about the first factor in the
separated-variable solution, which will now have the form

\[
u(r, \theta, \phi) = [A r^{\ell} + B r^{-(\ell+1)}]\Theta(\theta)\Phi(\phi), \tag{19.42}
\]

where $\Theta$ and $\Phi$ must satisfy (19.41) with $\lambda = \ell(\ell + 1)$.

The next step is to take (19.41) further. Multiplying through by $\sin^2\theta$ and
substituting for $\lambda$, it too takes a separated form:

\[
\left[\frac{\sin\theta}{\Theta}\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta}{d\theta}\right) + \ell(\ell+1)\sin^2\theta\right]
 + \frac{1}{\Phi}\frac{d^2\Phi}{d\phi^2} = 0. \tag{19.43}
\]

Taking the separation constant as $m^2$, the equation in the azimuthal angle $\phi$
has the same solution as in cylindrical polars, namely

\[
\Phi(\phi) = C \cos m\phi + D \sin m\phi.
\]


As before, single-valuedness of $u$ requires that $m$ is an integer; for $m = 0$ we again
have $\Phi(\phi) = C\phi + D$.

Having settled the form of $\Phi(\phi)$, we are left only with the equation satisfied by
$\Theta(\theta)$, which is

\[
\frac{\sin\theta}{\Theta}\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta}{d\theta}\right) + \ell(\ell+1)\sin^2\theta = m^2. \tag{19.44}
\]

A change of independent variable from $\theta$ to $\mu = \cos\theta$ will reduce this to a
form for which solutions are known, and of which some study has been made in
chapter 16. Putting

\[
\mu = \cos\theta, \qquad \frac{d\mu}{d\theta} = -\sin\theta, \qquad \frac{d}{d\theta} = -(1-\mu^2)^{1/2}\frac{d}{d\mu},
\]

the equation for $M(\mu) = \Theta(\theta)$ reads

\[
\frac{d}{d\mu}\left[(1-\mu^2)\frac{dM}{d\mu}\right] + \left[\ell(\ell+1) - \frac{m^2}{1-\mu^2}\right]M = 0. \tag{19.45}
\]

This equation is the associated Legendre equation, which was mentioned in
subsection 17.5.2 in the context of Sturm-Liouville equations.

We recall that for the case $m = 0$, (19.45) reduces to Legendre’s equation, which
was studied at length in chapter 16, and has the solution

\[
M(\mu) = E P_\ell(\mu) + F Q_\ell(\mu). \tag{19.46}
\]

We have not solved (19.45) explicitly for general $m$, but the solutions were given
in subsection 17.5.2 and are the associated Legendre functions $P_\ell^m(\mu)$ and $Q_\ell^m(\mu)$,
where

\[
P_\ell^m(\mu) = (1-\mu^2)^{|m|/2}\frac{d^{|m|}}{d\mu^{|m|}}P_\ell(\mu), \tag{19.47}
\]

and similarly for $Q_\ell^m(\mu)$. We then have

\[
M(\mu) = E P_\ell^m(\mu) + F Q_\ell^m(\mu); \tag{19.48}
\]

here $m$ must be an integer, $0 \le |m| \le \ell$. We note that if we require solutions to
Laplace’s equation that are finite when $\mu = \cos\theta = \pm 1$ (i.e. on the polar axis
where $\theta = 0, \pi$), then we must have $F = 0$ in (19.46) and (19.48) since $Q_\ell^m(\mu)$
diverges at $\mu = \pm 1$.

It will be remembered that one of the important conditions for obtaining
finite polynomial solutions of Legendre’s equation is that $\ell$ is an integer $\ge 0$.
This condition therefore applies also to the solutions (19.46) and (19.48) and is
reflected back into the radial part of the general solution given in (19.42).

Now that the solutions of each of the three ordinary differential equations
governing $R$, $\Theta$ and $\Phi$ have been obtained, we may assemble a complete
separated-variable solution of Laplace’s equation in spherical polars. It is

\[
u(r, \theta, \phi) = (A r^{\ell} + B r^{-(\ell+1)})(C\cos m\phi + D\sin m\phi)[E P_\ell^m(\cos\theta) + F Q_\ell^m(\cos\theta)], \tag{19.49}
\]

where the three bracketed factors are connected only through the integer
parameters $\ell$ and $m$, $0 \le |m| \le \ell$. As before, a general solution may be obtained
by superposing solutions of this form for the allowed values of the separation
constants $\ell$ and $m$. As mentioned above, if the solution is required to be finite on
the polar axis then $F = 0$ for all $\ell$ and $m$.


►An uncharged conducting sphere of radius $a$ is placed at the origin in an initially uniform
electrostatic field $\mathbf{E}$. Show that it behaves as an electric dipole.

The uniform field, taken in the direction of the polar axis, has an electrostatic potential

\[
u = -Ez = -Er\cos\theta,
\]

where $u$ is arbitrarily taken as zero at $z = 0$. This satisfies Laplace’s equation $\nabla^2 u = 0$, as
must the potential $v$ when the sphere is present; for large $r$ the asymptotic form of $v$ must
still be $-Er\cos\theta$.

Since the problem is clearly axially symmetric we have immediately that $m = 0$, and
since we require $v$ to be finite on the polar axis we must have $F = 0$ in (19.49). Therefore
the solution must be of the form

\[
v(r, \theta, \phi) = \sum_{\ell=0}^{\infty}(A_\ell r^{\ell} + B_\ell r^{-(\ell+1)})P_\ell(\cos\theta).
\]

Now the $\cos\theta$-dependence of $v$ for large $r$ indicates that the $(\theta, \phi)$-dependence of $v(r, \theta, \phi)$
is given by $P_1(\cos\theta) = \cos\theta$. Thus the $r$-dependence of $v$ must also correspond to an
$\ell = 1$ solution, and the most general such solution (outside the sphere, i.e. for $r \ge a$) is

\[
v(r, \theta, \phi) = (A_1 r + B_1 r^{-2})P_1(\cos\theta).
\]

The asymptotic form of $v$ for large $r$ immediately gives $A_1 = -E$ and so yields the solution

\[
v(r, \theta, \phi) = \left(-Er + \frac{B_1}{r^2}\right)\cos\theta.
\]

Since the sphere is conducting, it is an equipotential region and so $v$ must not depend on
$\theta$ for $r = a$. This can only be the case if $B_1/a^2 = Ea$, thus fixing $B_1$. The final solution is
therefore

\[
v(r, \theta, \phi) = -Er\left(1 - \frac{a^3}{r^3}\right)\cos\theta.
\]

Since a dipole of moment $p$ gives rise to a potential $p\cos\theta/(4\pi\epsilon_0 r^2)$, this result shows that the
sphere behaves as a dipole of moment $4\pi\epsilon_0 a^3 E$, because of the charge distribution induced
on its surface; see figure 19.6. ◄
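
A short numerical illustration of the conducting-sphere result follows. It is a sketch under stated assumptions: the function names `potential` and `induced_dipole_moment`, the unit values $E = a = 1$ and the SI value used for $\epsilon_0$ are choices made here, not taken from the text. It evaluates $v = -Er(1 - a^3/r^3)\cos\theta$ outside the sphere and the dipole moment $4\pi\epsilon_0 a^3 E$.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity in F/m (SI value, assumed here)

def potential(r, theta, e_field=1.0, a=1.0):
    """Potential outside the sphere (r >= a): v = -E r (1 - a^3 / r^3) cos(theta)."""
    return -e_field * r * (1.0 - a ** 3 / r ** 3) * math.cos(theta)

def induced_dipole_moment(e_field=1.0, a=1.0):
    """Dipole moment 4 pi eps0 a^3 E that reproduces the a^3 / r^2 term of v."""
    return 4.0 * math.pi * EPS0 * a ** 3 * e_field
```

On $r = a$ the potential should vanish for every $\theta$, and subtracting the uniform-field part $-Er\cos\theta$ should leave exactly the dipole term $p\cos\theta/(4\pi\epsilon_0 r^2)$ with $p$ given by `induced_dipole_moment`.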


Often the boundary conditions are not so easily met, and it is necessary to
use the mutual orthogonality of the associated Legendre functions (and the
trigonometric functions) to obtain the coefficients in the general solution.

Figure 19.6 Induced charge and field lines associated with a conducting
sphere placed in an initially uniform electrostatic field.


►A hollow split conducting sphere of radius $a$ is placed at the origin. If one half of its
surface is charged to a potential $v_0$ and the other half is kept at zero potential, find the
potential $v$ inside and outside the sphere.

Let us choose the top hemisphere to be charged to $v_0$ and the bottom hemisphere to be
at zero potential, with the plane in which the two hemispheres meet perpendicular to the
polar axis; this is shown in figure 19.7. The boundary condition then becomes

\[
v(a, \theta, \phi) =
\begin{cases}
v_0 & \text{for } 0 < \theta < \pi/2 \quad (0 < \cos\theta < 1), \\
0 & \text{for } \pi/2 < \theta < \pi \quad (-1 < \cos\theta < 0).
\end{cases} \tag{19.50}
\]

The problem is clearly axially symmetric and so we may set $m = 0$. Also, we require the
solution to be finite on the polar axis and so it cannot contain $Q_\ell(\cos\theta)$. Therefore the
general form of the solution to (19.38) is

\[
v(r, \theta, \phi) = \sum_{\ell=0}^{\infty}(A_\ell r^{\ell} + B_\ell r^{-(\ell+1)})P_\ell(\cos\theta). \tag{19.51}
\]

Inside the sphere (for $r < a$) we require the solution to be finite at the origin and so
$B_\ell = 0$ for all $\ell$ in (19.51). Imposing the boundary condition at $r = a$ we must then have

\[
v(a, \theta, \phi) = \sum_{\ell=0}^{\infty} A_\ell a^{\ell} P_\ell(\cos\theta),
\]

where $v(a, \theta, \phi)$ is also given by (19.50). Exploiting the mutual orthogonality of the Legendre
polynomials, the coefficients in the Legendre polynomial expansion are given by (16.48)
as (writing $\mu = \cos\theta$)

\[
A_\ell a^{\ell} = \frac{2\ell+1}{2}\int_{-1}^{1} v(a, \theta, \phi)P_\ell(\mu)\,d\mu
 = \frac{2\ell+1}{2}\, v_0 \int_0^1 P_\ell(\mu)\,d\mu,
\]

where in the last line we have used (19.50). The integrals of the Legendre polynomials are
easily evaluated (see exercise 17.7) and we find

\[
A_0 = \frac{v_0}{2}, \qquad A_1 = \frac{3v_0}{4a}, \qquad A_2 = 0, \qquad A_3 = -\frac{7v_0}{16a^3}, \qquad \ldots,
\]

so that the required solution inside the sphere is

\[
v(r, \theta, \phi) = \frac{v_0}{2}\left[1 + \frac{3r}{2a}P_1(\cos\theta) - \frac{7r^3}{8a^3}P_3(\cos\theta) + \cdots\right].
\]

Figure 19.7 A hollow split conducting sphere with its top half charged to a
potential $v_0$ and its bottom half at zero potential.


Outside the sphere (for $r > a$) we require the solution to be bounded as $r$ tends to
infinity and so in (19.51) we must have $A_\ell = 0$ for all $\ell$. In this case, by imposing the
boundary condition at $r = a$ we require

\[
v(a, \theta, \phi) = \sum_{\ell=0}^{\infty} B_\ell a^{-(\ell+1)} P_\ell(\cos\theta),
\]

where $v(a, \theta, \phi)$ is given by (19.50). Following the above argument the coefficients in the
expansion are given by

\[
B_\ell a^{-(\ell+1)} = \frac{2\ell+1}{2}\, v_0 \int_0^1 P_\ell(\mu)\,d\mu,
\]

so that the required solution outside the sphere is

\[
v(r, \theta, \phi) = \frac{v_0 a}{2r}\left[1 + \frac{3a}{2r}P_1(\cos\theta) - \frac{7a^3}{8r^3}P_3(\cos\theta) + \cdots\right].
\]
◄

In the above example, on the equator of the sphere (i.e. at $r = a$ and $\theta = \pi/2$)
the potential is given by

\[
v(a, \pi/2, \phi) = v_0/2,
\]

i.e. mid-way between the potentials of the top and bottom hemispheres. This is
so because a Legendre polynomial expansion of a function behaves in the same
way as a Fourier series expansion, in that it converges to the average of the two
values at any discontinuities present in the original function.
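
The Legendre coefficients for the split sphere, and the equatorial value $v_0/2$, can be reproduced numerically. The sketch below is illustrative only and makes its own assumptions: $P_\ell$ is generated from the standard three-term recurrence, the half-range integral $\int_0^1 P_\ell(\mu)\,d\mu$ is evaluated by Simpson's rule rather than in closed form, and the names `legendre`, `half_range_integral`, `scaled_coefficient` and `inside_potential`, together with the truncation `l_max = 15`, are hypothetical.

```python
import math

def legendre(l, x):
    """P_l(x) from the recurrence (n + 1) P_{n+1} = (2n + 1) x P_n - n P_{n-1}."""
    p_prev, p = 1.0, x
    if l == 0:
        return p_prev
    for n in range(1, l):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p

def half_range_integral(l, steps=2000):
    """Integral of P_l(mu) over 0 <= mu <= 1 by Simpson's rule (steps must be even)."""
    h = 1.0 / steps
    total = legendre(l, 0.0) + legendre(l, 1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * legendre(l, i * h)
    return total * h / 3.0

def scaled_coefficient(l):
    """A_l a^l / v_0 = (2l + 1) / 2 times the half-range integral."""
    return 0.5 * (2 * l + 1) * half_range_integral(l)

def inside_potential(r, theta, a=1.0, v0=1.0, l_max=15):
    """Truncated interior series: sum over l of A_l r^l P_l(cos theta)."""
    return v0 * sum(scaled_coefficient(l) * (r / a) ** l * legendre(l, math.cos(theta))
                    for l in range(l_max + 1))
```

The first four scaled coefficients should come out as $1/2$, $3/4$, $0$ and $-7/16$, and the truncated series evaluated at $r = a$, $\theta = \pi/2$ should return $v_0/2$, since every odd $P_\ell$ vanishes at $\cos\theta = 0$ and every even coefficient with $\ell \ge 2$ is zero.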

If the potential on the surface of the sphere had been given as a function of $\theta$
and $\phi$, then we would have had to consider a double series summed over $\ell$ and
$m$ (for $-\ell \le m \le \ell$), since, in general, the solution would not have been axially
symmetric.


19.3.2 Spherical harmonics 

When obtaining solutions in spherical polar coordinates of $\nabla^2 u = 0$, we found
that, for solutions that are finite on the polar axis, the angular part of the solution
was given by

\[
\Theta(\theta)\Phi(\phi) = P_\ell^m(\cos\theta)(C\cos m\phi + D\sin m\phi).
\]

This general form is sufficiently common that particular functions of $\theta$ and $\phi$
called spherical harmonics are defined and tabulated. The spherical harmonics
$Y_\ell^m(\theta, \phi)$ are defined for $m \ge 0$ by

\[
Y_\ell^m(\theta, \phi) = (-1)^m\left[\frac{2\ell+1}{4\pi}\frac{(\ell-m)!}{(\ell+m)!}\right]^{1/2} P_\ell^m(\cos\theta)\exp(im\phi). \tag{19.52}
\]

For values of $m < 0$ the relation

\[
Y_\ell^{-|m|}(\theta, \phi) = (-1)^{|m|}\left[Y_\ell^{|m|}(\theta, \phi)\right]^*
\]

defines the spherical harmonic, the asterisk denoting complex conjugation. Since
they contain as their $\theta$-dependent part the solution $P_\ell^m$ to the associated Legendre
equation, which is a Sturm-Liouville equation (see chapter 17), the $Y_\ell^m$ are
mutually orthogonal when integrated from $-1$ to $+1$ over $d(\cos\theta)$. Their mutual
orthogonality with respect to $\phi$ ($0 \le \phi \le 2\pi$) is even more obvious. The numerical
factor in (19.52) is chosen to make the $Y_\ell^m$ an orthonormal set, that is

\[
\int_{-1}^{1}\int_0^{2\pi}\left[Y_\ell^m(\theta, \phi)\right]^* Y_{\ell'}^{m'}(\theta, \phi)\,d\phi\,d(\cos\theta) = \delta_{\ell\ell'}\delta_{mm'}.
\]

In addition, the spherical harmonics form a complete set in that any reasonable
function (i.e. one that is likely to be met in a physical situation) of $\theta$ and $\phi$ can
be expanded as a sum of such functions,

\[
f(\theta, \phi) = \sum_{\ell=0}^{\infty}\sum_{m=-\ell}^{\ell} a_{\ell m} Y_\ell^m(\theta, \phi), \tag{19.53}
\]

the constants $a_{\ell m}$ being given by

\[
a_{\ell m} = \int_{-1}^{1}\int_0^{2\pi}\left[Y_\ell^m(\theta, \phi)\right]^* f(\theta, \phi)\,d\phi\,d(\cos\theta). \tag{19.54}
\]

This is in exact analogy with a Fourier series and is a particular example of the
general property of Sturm-Liouville solutions.


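The first few spherical harmonics $Y_\ell^m(\theta, \phi) \equiv Y_\ell^m$ are as follows:

\[
\begin{aligned}
Y_0^0 &= \sqrt{\frac{1}{4\pi}}, & Y_1^0 &= \sqrt{\frac{3}{4\pi}}\cos\theta, \\
Y_1^{\pm 1} &= \mp\sqrt{\frac{3}{8\pi}}\sin\theta\exp(\pm i\phi), & Y_2^0 &= \sqrt{\frac{5}{16\pi}}(3\cos^2\theta - 1), \\
Y_2^{\pm 1} &= \mp\sqrt{\frac{15}{8\pi}}\sin\theta\cos\theta\exp(\pm i\phi), & Y_2^{\pm 2} &= \sqrt{\frac{15}{32\pi}}\sin^2\theta\exp(\pm 2i\phi).
\end{aligned}
\]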
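
The orthonormality of the low-order spherical harmonics can be confirmed by direct quadrature. The sketch below is an illustration under stated assumptions: it hard-codes the standard closed forms of $Y_\ell^m$ for $\ell \le 2$ (in the sign convention of the definition for $m \ge 0$ and the conjugation relation for $m < 0$), and integrates with Simpson's rule in $\cos\theta$ and a uniform sum in $\phi$. The names `sph_harm` and `overlap` and the grid sizes are hypothetical.

```python
import cmath
import math

def sph_harm(l, m, theta, phi):
    """Closed forms of Y_l^m for l <= 2 (standard values, assumed convention)."""
    s, c = math.sin(theta), math.cos(theta)
    sign = -1.0 if m > 0 else 1.0          # upper sign for m > 0, lower for m < 0
    table = {
        (0, 0): math.sqrt(1.0 / (4.0 * math.pi)),
        (1, 0): math.sqrt(3.0 / (4.0 * math.pi)) * c,
        (1, 1): sign * math.sqrt(3.0 / (8.0 * math.pi)) * s,
        (2, 0): math.sqrt(5.0 / (16.0 * math.pi)) * (3.0 * c * c - 1.0),
        (2, 1): sign * math.sqrt(15.0 / (8.0 * math.pi)) * s * c,
        (2, 2): math.sqrt(15.0 / (32.0 * math.pi)) * s * s,
    }
    return table[(l, abs(m))] * cmath.exp(1j * m * phi)

def overlap(l1, m1, l2, m2, n_mu=200, n_phi=16):
    """Integral of conj(Y_{l1}^{m1}) Y_{l2}^{m2} over d(phi) d(cos theta)."""
    h_mu, h_phi = 2.0 / n_mu, 2.0 * math.pi / n_phi
    total = 0.0 + 0.0j
    for i in range(n_mu + 1):
        mu = max(-1.0, min(1.0, -1.0 + i * h_mu))
        weight = 1 if i in (0, n_mu) else (4 if i % 2 else 2)   # Simpson weights
        theta = math.acos(mu)
        for j in range(n_phi):
            phi = j * h_phi
            total += weight * (sph_harm(l1, m1, theta, phi).conjugate()
                               * sph_harm(l2, m2, theta, phi))
    return total * (h_mu / 3.0) * h_phi
```

Calling `overlap` with identical index pairs should give a value close to one, and with differing pairs a value close to zero. The uniform sum in $\phi$ is exact for these low-order exponentials, so the only discretisation error comes from Simpson's rule in $\cos\theta$.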


19.3.3 Other equations in polar coordinates 

The development of the solutions of $\nabla^2 u = 0$ carried out in the previous subsection
can be employed to solve other equations in which the $\nabla^2$ operator appears. Since
we have discussed the general method in some depth already, only an outline of
the solutions will be given here.

Let us first consider the wave equation

\[
\nabla^2 u = \frac{1}{c^2}\frac{\partial^2 u}{\partial t^2}, \tag{19.55}
\]

and look for a separated solution of the form $u = F(\mathbf{r})T(t)$, so that initially we
are separating only the spatial and time dependences. Substituting this form into
(19.55) and taking the separation constant as $k^2$ we obtain

\[
\nabla^2 F + k^2 F = 0, \qquad \frac{d^2 T}{dt^2} + k^2c^2 T = 0. \tag{19.56}
\]

The second equation has the simple solution

\[
T(t) = A\exp(i\omega t) + B\exp(-i\omega t), \tag{19.57}
\]

where $\omega = kc$; this may also be expressed in terms of sines and cosines, of course.
The first equation in (19.56) is referred to as Helmholtz’s equation; we discuss it
below.

We may treat the diffusion equation

\[
\kappa\nabla^2 u = \frac{\partial u}{\partial t}
\]

in a similar way. Separating the spatial and time dependences by assuming a
solution of the form $u = F(\mathbf{r})T(t)$, and taking the separation constant as $k^2$, we
find

\[
\nabla^2 F + k^2 F = 0, \qquad \frac{dT}{dt} + k^2\kappa T = 0.
\]

Just as in the case of the wave equation, the spatial part of the solution satisfies
Helmholtz’s equation. It only remains to consider the time dependence, which
has the simple solution

\[
T(t) = A\exp(-k^2\kappa t).
\]

Helmholtz’s equation is clearly of central importance in the solutions of the 
wave and diffusion equations. It can be solved in polar coordinates in much the 
same way as Laplace’s equation, and indeed reduces to Laplace’s equation when 
k = 0. Therefore, we will merely sketch the method of its solution in each of the 
three polar coordinate systems. 


Helmholtz’s equation in plane polars

In two-dimensional plane polar coordinates Helmholtz’s equation takes the form

\[
\frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial F}{\partial\rho}\right)
 + \frac{1}{\rho^2}\frac{\partial^2 F}{\partial\phi^2} + k^2 F = 0.
\]

If we try a separated solution of the form $F(\mathbf{r}) = P(\rho)\Phi(\phi)$, and take the
separation constant as $m^2$, we find

\[
\frac{d^2\Phi}{d\phi^2} + m^2\Phi = 0,
\]
\[
\frac{d^2 P}{d\rho^2} + \frac{1}{\rho}\frac{dP}{d\rho} + \left(k^2 - \frac{m^2}{\rho^2}\right)P = 0.
\]

As for Laplace’s equation, the angular part has the familiar solution (if $m \neq 0$)

\[
\Phi(\phi) = A\cos m\phi + B\sin m\phi,
\]

or an equivalent form in terms of complex exponentials. The radial equation
differs from that found in the solution of Laplace’s equation, but by making the
substitution $\mu = k\rho$ it is easily transformed into Bessel’s equation of order $m$
(discussed in chapter 16), and has the solution

\[
P(\rho) = C J_m(k\rho) + D Y_m(k\rho),
\]

where $Y_m$ is a Bessel function of the second kind, which is infinite at the origin
and is not to be confused with a spherical harmonic (these are written with a
superscript as well as a subscript).

Putting the two parts of the solution together we have

\[
F(\rho, \phi) = [A\cos m\phi + B\sin m\phi][C J_m(k\rho) + D Y_m(k\rho)]. \tag{19.58}
\]

Clearly, for solutions of Helmholtz’s equation that are required to be finite at the
origin, we must set $D = 0$.

►Find the four lowest frequency modes of oscillation of a circular drumskin of radius $a$
whose circumference is held fixed in a plane.

The transverse displacement $u(\mathbf{r}, t)$ of the drumskin satisfies the two-dimensional wave
equation

\[
\nabla^2 u = \frac{1}{c^2}\frac{\partial^2 u}{\partial t^2},
\]

with $c^2 = T/\sigma$, where $T$ is the tension of the drumskin and $\sigma$ is its mass per unit area.
From (19.57) and (19.58) a separated solution of this equation, in plane polar coordinates,
that is finite at the origin is

\[
u(\rho, \phi, t) = J_m(k\rho)(A\cos m\phi + B\sin m\phi)\exp(\pm i\omega t),
\]

where $\omega = kc$. Since we require the solution to be single-valued we must have $m$ as an
integer. Furthermore, if the drumskin is clamped at its outer edge $\rho = a$ then we also
require $u(a, \phi, t) = 0$. Thus we need

\[
J_m(ka) = 0,
\]

which in turn restricts the allowed values of $k$. The zeroes of Bessel functions can be
obtained from most books of tables, and the first few are

\[
\begin{aligned}
J_0(x) &= 0 \quad \text{for } x \approx 2.40,\ 5.52,\ 8.65,\ \ldots, \\
J_1(x) &= 0 \quad \text{for } x \approx 3.83,\ 7.02,\ 10.17,\ \ldots, \\
J_2(x) &= 0 \quad \text{for } x \approx 5.14,\ 8.42,\ 11.62,\ \ldots.
\end{aligned}
\]

The smallest value of $x$ for which any of the Bessel functions is zero is $x \approx 2.40$, which
occurs for $J_0(x)$. Thus the lowest-frequency mode has $k = 2.40/a$ and angular frequency
$\omega = 2.40c/a$. Since $m = 0$ for this mode, the shape of the drumskin is

\[
u \propto J_0\left(2.40\frac{\rho}{a}\right);
\]

this is illustrated in figure 19.8.

Continuing in the same way the next three modes are given by

\[
\begin{aligned}
\omega &= 3.83\frac{c}{a}, & u &\propto J_1\left(3.83\frac{\rho}{a}\right)\cos\phi, \quad J_1\left(3.83\frac{\rho}{a}\right)\sin\phi; \\
\omega &= 5.14\frac{c}{a}, & u &\propto J_2\left(5.14\frac{\rho}{a}\right)\cos 2\phi, \quad J_2\left(5.14\frac{\rho}{a}\right)\sin 2\phi; \\
\omega &= 5.52\frac{c}{a}, & u &\propto J_0\left(5.52\frac{\rho}{a}\right).
\end{aligned}
\]

These modes are also shown in figure 19.8. We note that the second and third frequencies
have two corresponding modes of oscillation; these frequencies are therefore two-fold
degenerate. ◄
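
The ordering of the drumskin modes can be reproduced by locating the Bessel zeros numerically instead of reading them from tables. The following Python sketch is one possible approach and is not taken from the text: $J_m$ is computed from Bessel's integral by the trapezium rule, zeros are found by scanning for sign changes and bisecting, and the names `bessel_j`, `first_zeros` and `lowest_modes`, the scan step and the limits `m_max = 3`, `x_max = 15` are all assumptions.

```python
import math

def bessel_j(m, x, steps=200):
    """J_m(x) for integer m from Bessel's integral, using the trapezium rule."""
    h = math.pi / steps
    total = 0.5 * (1.0 + math.cos(m * math.pi))   # half-weighted end points
    for i in range(1, steps):
        tau = i * h
        total += math.cos(m * tau - x * math.sin(tau))
    return total * h / math.pi

def first_zeros(m, count, x_max=15.0, dx=0.05):
    """First `count` positive zeros of J_m: scan for sign changes, then bisect."""
    zeros, x = [], 0.5      # start away from x = 0, where J_m vanishes for m > 0
    f_x = bessel_j(m, x)
    while len(zeros) < count and x < x_max:
        x_next = x + dx
        f_next = bessel_j(m, x_next)
        if f_x * f_next < 0.0:
            lo, hi, f_lo = x, x_next, f_x
            for _ in range(50):
                mid = 0.5 * (lo + hi)
                f_mid = bessel_j(m, mid)
                if f_lo * f_mid <= 0.0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            zeros.append(0.5 * (lo + hi))
        x, f_x = x_next, f_next
    return zeros

def lowest_modes(n_modes=4, m_max=3, zeros_per_m=3):
    """Sorted (k a, m) pairs; the angular frequency is omega = (k a) c / a."""
    pairs = [(z, m) for m in range(m_max + 1) for z in first_zeros(m, zeros_per_m)]
    return sorted(pairs)[:n_modes]
```

Sorting the zeros of $J_0$ to $J_3$ together should return the four smallest values of $ka$ with $m = 0, 1, 2, 0$ in that order. Each mode with $m \neq 0$ carries both a $\cos m\phi$ and a $\sin m\phi$ pattern, which is the two-fold degeneracy noted in the example.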


Helmholtz’s equation in cylindrical polars 

Generalising the above method to three-dimensional cylindrical polars is
straightforward, and following a similar procedure to that used for Laplace’s equation
we find the separated solution of Helmholtz’s equation takes the form

\[
F(\rho, \phi, z) = \left[A J_m\left(\sqrt{k^2 - \alpha^2}\,\rho\right) + B Y_m\left(\sqrt{k^2 - \alpha^2}\,\rho\right)\right]
 \times (C\cos m\phi + D\sin m\phi)[E\exp(i\alpha z) + F\exp(-i\alpha z)],
\]

where $\alpha$ and $m$ are separation constants. We note that the angular part of the
solution is the same as for Laplace’s equation in cylindrical polars.

Figure 19.8 For a circular drumskin of radius $a$, the modes of oscillation
with the four lowest frequencies. The dotted lines indicate the nodes, where
the displacement of the drumskin is always zero.

Helmholtz’s equation in spherical polars 

In spherical polars, we find again that the angular parts of the solution $\Theta(\theta)\Phi(\phi)$
are identical to those of Laplace’s equation in this coordinate system, i.e. they are
the spherical harmonics $Y_\ell^m(\theta, \phi)$, and so we shall not discuss them further.

The radial equation in this case is given by

$$r^2 R'' + 2rR' + [k^2 r^2 - \ell(\ell + 1)]R = 0, \qquad (19.59)$$

which has an additional term $k^2 r^2 R$ compared with the radial equation for the
Laplace solution. The equation (19.59) looks very much like Bessel’s equation and
can in fact be reduced to it by writing $R(r) = r^{-1/2} S(r)$. The function $S(r)$ then
satisfies

$$r^2 S'' + rS' + \left[k^2 r^2 - \left(\ell + \tfrac{1}{2}\right)^2\right] S = 0,$$

which, after changing the variable to $\mu = kr$, is Bessel’s equation of order $\ell + \tfrac{1}{2}$
and has as its solutions $S(\mu) = J_{\ell+1/2}(\mu)$ and $Y_{\ell+1/2}(\mu)$. The separated solution to
Helmholtz’s equation in spherical polars is thus

$$F(r, \theta, \phi) = r^{-1/2}[A J_{\ell+1/2}(kr) + B Y_{\ell+1/2}(kr)](C\cos m\phi + D\sin m\phi)$$
$$\qquad\qquad \times\,[E P_\ell^m(\cos\theta) + F Q_\ell^m(\cos\theta)]. \qquad (19.60)$$

For solutions that are finite at the origin we require $B = 0$, and for solutions that
are finite on the polar axis we require $F = 0$.

It is worth mentioning that the solutions proportional to $r^{-1/2} J_{\ell+1/2}(kr)$ when
suitably normalised are called spherical Bessel functions and are denoted by $j_\ell(kr)$:

$$j_\ell(\mu) = \sqrt{\frac{\pi}{2\mu}}\,J_{\ell+1/2}(\mu).$$

They are trigonometric functions of $\mu$ (as discussed in chapter 16), and for $\ell = 0$
and $\ell = 1$ are given by

$$j_0(\mu) = \frac{\sin\mu}{\mu},$$
$$j_1(\mu) = \frac{\sin\mu}{\mu^2} - \frac{\cos\mu}{\mu}.$$

The second, linearly-independent, solution of (19.59), $n_\ell(\mu)$, is derived from
$Y_{\ell+1/2}(\mu)$ in a similar way.

As mentioned at the beginning of this subsection, the separated solution of 
the wave equation in spherical polars is the product of the time-dependent part 
(19.57) and a spatial part (19.60). It will be noticed that, although this solution
corresponds to a solution of definite frequency $\omega = kc$, the zeroes of the radial
function $j_\ell(kr)$ are not equally spaced in $r$, except for the case $\ell = 0$ involving
$j_0(kr)$, and so there is no precise wavelength associated with the solution.
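
The remark about unequal spacing can be illustrated with a short Python sketch. It uses only the closed forms for $j_0$ and $j_1$ given above; the helper names `positive_zeros` and `spacings`, and the search parameters, are illustrative assumptions rather than anything prescribed by the text.

```python
import math

def j0(x):
    """Spherical Bessel function of order 0: sin(x)/x."""
    return math.sin(x) / x

def j1(x):
    """Spherical Bessel function of order 1: sin(x)/x**2 - cos(x)/x."""
    return math.sin(x) / x**2 - math.cos(x) / x

def positive_zeros(f, count, start=0.5, step=0.01):
    """First `count` zeros of f beyond `start`, by sign change plus bisection."""
    zeros = []
    lo, f_lo = start, f(start)
    while len(zeros) < count:
        hi = lo + step
        f_hi = f(hi)
        if f_lo * f_hi < 0:
            a, b, f_a = lo, hi, f_lo
            for _ in range(60):
                mid = 0.5 * (a + b)
                f_mid = f(mid)
                if f_a * f_mid <= 0:
                    b = mid
                else:
                    a, f_a = mid, f_mid
            zeros.append(0.5 * (a + b))
        lo, f_lo = hi, f_hi
    return zeros

def spacings(values):
    """Differences between successive entries."""
    return [b - a for a, b in zip(values, values[1:])]

gaps_j0 = spacings(positive_zeros(j0, 4))
gaps_j1 = spacings(positive_zeros(j1, 4))
print("j0 zero spacings:", [round(g, 4) for g in gaps_j0])
print("j1 zero spacings:", [round(g, 4) for g in gaps_j1])
```

The spacings for $j_0$ all equal $\pi$, whereas those for $j_1$ decrease slowly towards $\pi$ without ever being exactly equal.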

To conclude this subsection, let us mention briefly the Schrödinger equation
for the electron in a hydrogen atom, the nucleus of which is taken at the origin
and is assumed massive compared with the electron. Under these circumstances
the Schrödinger equation is

$$-\frac{\hbar^2}{2m}\nabla^2 u - \frac{e^2}{4\pi\epsilon_0}\frac{u}{r} = i\hbar\frac{\partial u}{\partial t}.$$

For a ‘stationary-state’ solution, for which the energy is a constant $E$ and the time-dependent
factor $T$ in $u$ is given by $T(t) = A\exp(-iEt/\hbar)$, the above equation
is similar to, but not quite the same as, the Helmholtz equation.† However, as
with the wave equation, the angular parts of the solution are identical to those
for Laplace’s equation and are expressed in terms of spherical harmonics.

The important point to note is that for any equation involving $\nabla^2$, provided $\theta$
and $\phi$ do not appear in the equation other than as part of $\nabla^2$, a separated-variable
solution in spherical polars will always lead to spherical harmonic solutions. This
is the case for the Schrödinger equation describing an atomic electron whenever
the potential is central, i.e. whenever $V(\mathbf{r})$ is in fact $V(r)$.

† For the solution by series of the $r$-equation in this case the reader may consult, e.g., Schiff, Quantum
Mechanics (McGraw-Hill, 1955) p. 82.


19.3.4 Solution by expansion 

It is sometimes possible to use the uniqueness theorem discussed in the last 
chapter, together with the results of the last few subsections, in which Laplace’s 
equation (and other equations) were considered in polar coordinates, to obtain 
solutions of such equations appropriate to particular physical situations. 

We will illustrate the method for Laplace’s equation in spherical polars and first
assume that the required solution of $\nabla^2 u = 0$ can be written as a superposition
in the normal way:

$$u(r, \theta, \phi) = \sum_{\ell=0}^{\infty}\sum_{m=-\ell}^{\ell} (A r^{\ell} + B r^{-(\ell+1)}) P_\ell^m(\cos\theta)(C\cos m\phi + D\sin m\phi). \qquad (19.61)$$

Here, all the constants $A$, $B$, $C$, $D$ may depend upon $\ell$ and $m$, and we have
assumed that the required solution is finite on the polar axis. As usual, boundary
conditions of a physical nature will then fix or eliminate some of the constants;
for example, $u$ finite at the origin implies all $B = 0$, or axial symmetry implies
that only $m = 0$ terms are present.

The essence of the method is then to find the remaining constants by determining
$u$ at values of $r$, $\theta$, $\phi$ for which it can be evaluated by other means, e.g. by direct
calculation on an axis of symmetry. Once the remaining constants have been fixed
by these special considerations to have particular values, the uniqueness theorem
can be invoked to establish that they must have these values in general.


► Calculate the gravitational potential at a general point in space due to a uniform ring of 
matter of radius a and total mass M. 

Everywhere except on the ring the potential u(r) satisfies the Laplace equation, and so if 
we use polar coordinates with the normal to the ring as polar axis, as in figure 19.9, a 
solution of the form (19.61) can be assumed. 

We expect the potential $u(r, \theta, \phi)$ to tend to zero as $r \to \infty$, and also to be finite at $r = 0$.
At first sight this might seem to imply that all $A$ and $B$, and hence $u$, must be identically
zero, an unacceptable result. In fact, what it means is that different expressions must apply
to different regions of space. On the ring itself we no longer have $\nabla^2 u = 0$ and so it is not
surprising that the form of the expression for $u$ changes there. Let us therefore take two
separate regions.

In the region $r > a$

(i) we must have $u \to 0$ as $r \to \infty$, implying that all $A = 0$, and

(ii) the system is axially symmetric and so only $m = 0$ terms appear.

With these restrictions we can write as a trial form

$$u(r, \theta, \phi) = \sum_{\ell=0}^{\infty} B_\ell r^{-(\ell+1)} P_\ell^0(\cos\theta). \qquad (19.62)$$

Figure 19.9 The polar axis $Oz$ is taken as normal to the plane of the ring of
matter and passing through its centre.


The constants $B_\ell$ are still to be determined; this we do by calculating directly the potential
where this can be done simply - in this case, on the polar axis.

Considering a point $P$ on the polar axis at a distance $z$ ($> a$) from the plane of the ring
(taken as $\theta = \pi/2$), all parts of the ring are at a distance $(z^2 + a^2)^{1/2}$ from it. The potential
at $P$ is thus straightforwardly

$$u(z, 0, \phi) = -\frac{GM}{(z^2 + a^2)^{1/2}}, \qquad (19.63)$$

where $G$ is the gravitational constant. This must be the same as (19.62) for the particular
values $r = z$, $\theta = 0$, and $\phi$ undefined. Since $P_\ell^0(\cos\theta) = P_\ell(\cos\theta)$ with $P_\ell(1) = 1$, putting
$r = z$ in (19.62) gives

$$u(z, 0, \phi) = \sum_{\ell=0}^{\infty} \frac{B_\ell}{z^{\ell+1}}. \qquad (19.64)$$

However, expanding (19.63) for $z > a$ (as it applies to this region of space) we obtain

$$u(z, 0, \phi) = -\frac{GM}{z}\left[1 - \frac{1}{2}\left(\frac{a}{z}\right)^2 + \frac{3}{8}\left(\frac{a}{z}\right)^4 - \cdots\right],$$

which on comparison with (19.64) gives†

$$B_0 = -GM, \qquad B_{2\ell} = -GM a^{2\ell}\,\frac{(-1)^\ell (2\ell - 1)!!}{2^\ell\,\ell!} \ \text{ for } \ell \geq 1, \qquad B_{2\ell+1} = 0. \qquad (19.65)$$


We now conclude the argument by saying that if a solution for a general point $(r, \theta, \phi)$
exists at all, which of course we very much expect on physical grounds, then it must be
(19.62) with the $B_\ell$ given by (19.65). This is so because thus defined it is a function with
no arbitrary constants and which satisfies all the boundary conditions, and the uniqueness
theorem states that there is only one such function. The expression for the potential in the
region $r > a$ is therefore

$$u(r, \theta, \phi) = -\frac{GM}{r}\left[1 + \sum_{\ell=1}^{\infty} \frac{(-1)^\ell (2\ell - 1)!!}{2^\ell\,\ell!}\left(\frac{a}{r}\right)^{2\ell} P_{2\ell}(\cos\theta)\right].$$

† $(2\ell - 1)!! = 1 \times 3 \times \cdots \times (2\ell - 1)$.


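The expression for $r < a$ can be found in a similar way. The finiteness of $u$ at $r = 0$ and
the axial symmetry give

$$u(r, \theta, \phi) = \sum_{\ell=0}^{\infty} A_\ell r^{\ell} P_\ell^0(\cos\theta).$$

Comparing this expression for $r = z$, $\theta = 0$ with the $z < a$ expansion of (19.63), which is
valid for any $z$, establishes $A_{2\ell+1} = 0$, $A_0 = -GM/a$ and

$$A_{2\ell} = -\frac{GM}{a^{2\ell+1}}\,\frac{(-1)^\ell (2\ell - 1)!!}{2^\ell\,\ell!},$$

so that the final expression valid, and convergent, for $r < a$ is thus

$$u(r, \theta, \phi) = -\frac{GM}{a}\left[1 + \sum_{\ell=1}^{\infty} \frac{(-1)^\ell (2\ell - 1)!!}{2^\ell\,\ell!}\left(\frac{r}{a}\right)^{2\ell} P_{2\ell}(\cos\theta)\right].$$
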


It is easy to check that the solution obtained has the expected physical value for large r 
and for r = 0 and is continuous at r = a. ◄ 
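
The series obtained in this example can be compared with a direct numerical integration of the potential around the ring. The Python sketch below is one way of doing so; the function names, the truncation at 60 terms and the choice of units with $GM = a = 1$ are all assumptions made for illustration.

```python
import math

def legendre(n, x):
    """P_n(x) from the recurrence (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def coeff(l):
    """(-1)^l (2l-1)!! / (2^l l!), the coefficient of x^l in (1 + x)^(-1/2)."""
    c = 1.0
    for j in range(1, l + 1):
        c *= -(2 * j - 1) / (2 * j)
    return c

def ring_potential(r, theta, a=1.0, GM=1.0, terms=60):
    """Truncated series for the ring potential, using the r > a or r < a form."""
    if r == a:
        raise ValueError("this sketch evaluates the series only for r != a")
    ratio = a / r if r > a else r / a
    prefactor = -GM / max(r, a)
    x = math.cos(theta)
    return prefactor * sum(coeff(l) * ratio ** (2 * l) * legendre(2 * l, x)
                           for l in range(terms))

def ring_potential_direct(r, theta, a=1.0, GM=1.0, n=2000):
    """Midpoint-rule integration of -G dM / distance around the ring."""
    total = 0.0
    for i in range(n):
        phi = 2.0 * math.pi * (i + 0.5) / n
        dist = math.sqrt(r * r + a * a - 2.0 * r * a * math.sin(theta) * math.cos(phi))
        total += 1.0 / dist
    return -GM * total / n

for r in (0.5, 2.0):
    print(r, ring_potential(r, math.pi / 3), ring_potential_direct(r, math.pi / 3))
```

The agreement away from the polar axis is what the uniqueness argument predicts: the coefficients were fixed using on-axis values only, yet the series reproduces the potential at general points.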


19.3.5 Separation of variables for inhomogeneous equations 

So far our discussion of the method of separation of variables has been limited 
to the solution of homogeneous equations such as the Laplace equation and the 
wave equation. The solutions of inhomogeneous PDEs are usually obtained using 
the Green’s function methods to be discussed below in section 19.5. However, as a 
final illustration of the usefulness of the separation of variables, we now consider 
its application to the solution of inhomogeneous equations. 

Because of the added complexity in dealing with inhomogeneous equations, we 
shall restrict our discussion to the solution of Poisson’s equation, 

$$\nabla^2 u = \rho(\mathbf{r}), \qquad (19.66)$$

in spherical polar coordinates, although the general method can accommodate
other coordinate systems and equations. In physical problems the RHS of (19.66)
usually contains some multiplicative constant(s). If $u$ is the electrostatic potential
in some region of space in which $\rho$ is the density of electric charge then
$\nabla^2 u = -\rho(\mathbf{r})/\epsilon_0$. Alternatively, $u$ might represent the gravitational potential in some
region where the matter density is given by $\rho$, so that $\nabla^2 u = 4\pi G\rho(\mathbf{r})$.

We will simplify our discussion by assuming that the required solution $u$ is
finite on the polar axis and also that the system possesses axial symmetry about
that axis - in which case $\rho$ does not depend on the azimuthal angle $\phi$. The key
to the method is then to assume a separated form for both the solution $u$ and the
density term $\rho$.

From the discussion of Laplace’s equation, for systems with axial symmetry
only $m = 0$ terms appear, and so the angular part of the solution can be
expressed in terms of Legendre polynomials $P_\ell(\cos\theta)$. Since these functions form
an orthogonal set let us expand both $u$ and $\rho$ in terms of them:

$$u = \sum_{\ell=0}^{\infty} R_\ell(r) P_\ell(\cos\theta), \qquad (19.67)$$

$$\rho = \sum_{\ell=0}^{\infty} F_\ell(r) P_\ell(\cos\theta), \qquad (19.68)$$

where the coefficients $R_\ell(r)$ and $F_\ell(r)$ in the Legendre polynomial expansions
are functions of $r$. Since in any particular problem $\rho$ is given, we can find the
coefficients $F_\ell(r)$ in the expansion in the usual way (see subsection 16.6.2). It then
only remains to find the coefficients $R_\ell(r)$ in the expansion of the solution $u$.

Writing $\nabla^2$ in spherical polars and substituting (19.67) and (19.68) into (19.66)
we obtain

$$\sum_{\ell=0}^{\infty}\left[\frac{P_\ell(\cos\theta)}{r^2}\frac{d}{dr}\left(r^2\frac{dR_\ell}{dr}\right) + \frac{R_\ell}{r^2\sin\theta}\frac{d}{d\theta}\left(\sin\theta\,\frac{dP_\ell(\cos\theta)}{d\theta}\right)\right] = \sum_{\ell=0}^{\infty} F_\ell(r) P_\ell(\cos\theta). \qquad (19.69)$$


However, if, in equation (19.44) of our discussion of the angular part of the
solution to Laplace’s equation, we set $m = 0$ we conclude that

$$\frac{1}{\sin\theta}\frac{d}{d\theta}\left(\sin\theta\,\frac{dP_\ell(\cos\theta)}{d\theta}\right) = -\ell(\ell + 1)P_\ell(\cos\theta).$$

Substituting this into (19.69), we find that the LHS is greatly simplified and we
obtain

$$\sum_{\ell=0}^{\infty}\left[\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{dR_\ell}{dr}\right) - \frac{\ell(\ell + 1)R_\ell}{r^2}\right]P_\ell(\cos\theta) = \sum_{\ell=0}^{\infty} F_\ell(r) P_\ell(\cos\theta).$$

This relation is most easily satisfied by equating terms on both sides for each
value of $\ell$ separately, so that for $\ell = 0, 1, 2, \ldots$ we have

$$\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{dR_\ell}{dr}\right) - \frac{\ell(\ell + 1)R_\ell}{r^2} = F_\ell(r). \qquad (19.70)$$

This is an ODE in which $F_\ell(r)$ is given, and it can therefore be solved for
$R_\ell(r)$. The solution to Poisson’s equation, $u$, is then obtained by making the
superposition (19.67).

►In a certain system, the electric charge density $\rho$ is distributed as follows:

$$\rho = \begin{cases} Ar\cos\theta & \text{for } 0 \leq r < a, \\ 0 & \text{for } r \geq a. \end{cases}$$

Find the electrostatic potential inside and outside the charge distribution, given that both 
the potential and its radial derivative are continuous everywhere. 


The electrostatic potential $u$ satisfies

$$\nabla^2 u = \begin{cases} -(A/\epsilon_0)\,r\cos\theta & \text{for } 0 \leq r < a, \\ 0 & \text{for } r \geq a. \end{cases}$$

For $r < a$ the RHS can be written $-(A/\epsilon_0)\,rP_1(\cos\theta)$, and the coefficients in (19.68) are
simply $F_1(r) = -(Ar/\epsilon_0)$ and $F_\ell(r) = 0$ for $\ell \neq 1$. Therefore we need only calculate $R_1(r)$,
which satisfies (19.70) for $\ell = 1$:

$$\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{dR_1}{dr}\right) - \frac{2R_1}{r^2} = -\frac{Ar}{\epsilon_0}.$$

This can be rearranged to give

$$r^2 R_1'' + 2rR_1' - 2R_1 = -\frac{Ar^3}{\epsilon_0},$$

where the prime denotes differentiation with respect to $r$. The LHS is homogeneous and
the equation can be reduced by the substitution $r = \exp t$, and writing $R_1(r) = S(t)$, to

$$\ddot{S} + \dot{S} - 2S = -\frac{A}{\epsilon_0}\exp 3t, \qquad (19.71)$$

where the dots indicate differentiation with respect to $t$.

This is an inhomogeneous second-order ODE with constant coefficients and can be
straightforwardly solved by the methods of subsection 15.2.1 to give

$$S(t) = c_1\exp t + c_2\exp(-2t) - \frac{A}{10\epsilon_0}\exp 3t.$$

Recalling that $r = \exp t$ we find

$$R_1(r) = c_1 r + c_2 r^{-2} - \frac{A}{10\epsilon_0}r^3.$$

Since we are interested in the region $r < a$ we must have $c_2 = 0$ for the solution to remain
finite. Thus inside the charge distribution the electrostatic potential has the form

$$u_1(r, \theta, \phi) = \left(c_1 r - \frac{A}{10\epsilon_0}r^3\right)P_1(\cos\theta). \qquad (19.72)$$

Outside the charge distribution (for $r \geq a$), however, the electrostatic potential obeys
Laplace’s equation, $\nabla^2 u = 0$, and so given the symmetry of the problem and the requirement
that $u \to 0$ as $r \to \infty$ the solution must take the form

$$u_2(r, \theta, \phi) = \sum_{\ell=0}^{\infty} \frac{B_\ell}{r^{\ell+1}}P_\ell(\cos\theta). \qquad (19.73)$$


We can now use the boundary conditions at $r = a$ to fix the constants in (19.72) and
(19.73). The requirement of continuity of the potential and its radial derivative at $r = a$
implies that

$$u_1(a, \theta, \phi) = u_2(a, \theta, \phi),$$
$$\frac{\partial u_1}{\partial r}(a, \theta, \phi) = \frac{\partial u_2}{\partial r}(a, \theta, \phi).$$

Clearly $B_\ell = 0$ for $\ell \neq 1$; carrying out the necessary differentiations and setting $r = a$ in
(19.72) and (19.73) we obtain the simultaneous equations

$$c_1 a - \frac{A}{10\epsilon_0}a^3 = \frac{B_1}{a^2},$$
$$c_1 - \frac{3A}{10\epsilon_0}a^2 = -\frac{2B_1}{a^3},$$

which may be solved to give $c_1 = Aa^2/(6\epsilon_0)$ and $B_1 = Aa^5/(15\epsilon_0)$. Since $P_1(\cos\theta) = \cos\theta$,
the electrostatic potentials inside and outside the charge distribution are given respectively
by

$$u_1(r, \theta, \phi) = \frac{A}{\epsilon_0}\left(\frac{a^2 r}{6} - \frac{r^3}{10}\right)\cos\theta,$$
$$u_2(r, \theta, \phi) = \frac{Aa^5}{15\epsilon_0}\,\frac{\cos\theta}{r^2}. \ ◄$$
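
The two potentials found in this example can be checked numerically. The Python sketch below uses central finite differences to confirm that $u_1$ satisfies Poisson's equation inside the charge distribution, that $u_2$ satisfies Laplace's equation outside it, and that both $u$ and $\partial u/\partial r$ match at $r = a$. The numerical values of $A$, $a$ and $\epsilon_0$, the step sizes and the function names are arbitrary illustrative choices.

```python
import math

A, a, eps0 = 2.0, 1.5, 1.0   # arbitrary illustrative values, not from the text

def u_inside(r, theta):
    """u_1 = (A/eps0) (a^2 r / 6 - r^3 / 10) cos(theta)."""
    return (A / eps0) * (a**2 * r / 6.0 - r**3 / 10.0) * math.cos(theta)

def u_outside(r, theta):
    """u_2 = A a^5 cos(theta) / (15 eps0 r^2)."""
    return A * a**5 * math.cos(theta) / (15.0 * eps0 * r**2)

def d_dr(u, r, theta, h=1e-5):
    """Central-difference estimate of the radial derivative."""
    return (u(r + h, theta) - u(r - h, theta)) / (2.0 * h)

def laplacian(u, r, theta, h=1e-3):
    """Finite-difference Laplacian in spherical polars for u independent of phi."""
    u_rr = (u(r + h, theta) - 2.0 * u(r, theta) + u(r - h, theta)) / h**2
    u_r = (u(r + h, theta) - u(r - h, theta)) / (2.0 * h)
    u_tt = (u(r, theta + h) - 2.0 * u(r, theta) + u(r, theta - h)) / h**2
    u_t = (u(r, theta + h) - u(r, theta - h)) / (2.0 * h)
    return u_rr + 2.0 * u_r / r + (u_tt + u_t / math.tan(theta)) / r**2

r_in, r_out, theta = 0.7, 3.0, 1.0
print("inside :", laplacian(u_inside, r_in, theta), -(A / eps0) * r_in * math.cos(theta))
print("outside:", laplacian(u_outside, r_out, theta))
print("jump in u     at r = a:", u_inside(a, 0.4) - u_outside(a, 0.4))
print("jump in du/dr at r = a:", d_dr(u_inside, a, 0.4) - d_dr(u_outside, a, 0.4))
```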


19.4 Integral transform methods 

In the method of separation of variables our aim was to keep the independent 
variables in a PDE as separate as possible. We now discuss the use of integral 
transforms in solving PDEs, a method by which one of the independent variables 
can be eliminated from the differential coefficients. It will be assumed that the 
reader is familiar with Laplace and Fourier transforms and their properties, as 
discussed in chapter 13. 

The method consists simply of transforming the PDE into one containing 
derivatives with respect to a smaller number of variables. Thus, if the original 
equation has just two independent variables, it may be possible to reduce the 
PDE into a soluble ODE. The solution obtained can then (where possible) be 
transformed back to give the solution of the original PDE. As we shall see, 
boundary conditions can usually be incorporated in a natural way. 

Which sort of transform to use, and the choice of the variable(s) with respect 
to which the transform is to be taken, is a matter of experience; we illustrate this 
in the example below. In practice, transforms can be taken with respect to each 
variable in turn, and the transformation that affords the greatest simplification 
can be pursued further. 


►A semi-infinite tube of constant cross-section contains initially pure water. At time $t = 0$,
one end of the tube is put into contact with a salt solution and maintained at a concentration
$u_0$. Find the total amount of salt that has diffused into the tube after time $t$, if the diffusion
constant is $\kappa$.


The concentration $u(x, t)$ at time $t$ and distance $x$ from the end of the tube satisfies the
diffusion equation

$$\kappa\frac{\partial^2 u}{\partial x^2} = \frac{\partial u}{\partial t}, \qquad (19.74)$$

which has to be solved subject to the boundary conditions $u(0, t) = u_0$ for all $t$ and
$u(x, 0) = 0$ for all $x > 0$.

Since we are interested only in $t > 0$, the use of the Laplace transform is suggested.
Furthermore, it will be recalled from chapter 13 that one of the major virtues of Laplace
transformations is the possibility they afford of replacing derivatives of functions by simple
multiplication by a scalar. If the derivative with respect to time were so removed, equation
(19.74) would contain only differentiation with respect to a single variable. Let us therefore
take the Laplace transform of (19.74) with respect to $t$:

$$\int_0^\infty \kappa\frac{\partial^2 u}{\partial x^2}\exp(-st)\,dt = \int_0^\infty \frac{\partial u}{\partial t}\exp(-st)\,dt.$$

On the LHS the (double) differentiation is with respect to $x$, whereas the integration is
with respect to the independent variable $t$. Therefore the derivative can be taken outside
the integral. Denoting the Laplace transform of $u(x, t)$ by $\bar{u}(x, s)$ and using result (13.57)
to rewrite the transform of the derivative on the RHS (or by integrating directly by parts),
we obtain

$$\kappa\frac{\partial^2 \bar{u}}{\partial x^2} = s\bar{u}(x, s) - u(x, 0).$$

But from the boundary condition $u(x, 0) = 0$ the last term on the RHS vanishes, and the
solution is immediate:

$$\bar{u}(x, s) = A\exp\left(\sqrt{\frac{s}{\kappa}}\,x\right) + B\exp\left(-\sqrt{\frac{s}{\kappa}}\,x\right),$$

where the constants $A$ and $B$ may depend on $s$.

We require $u(x, t) \to 0$ as $x \to \infty$ and so we must also have $\bar{u}(\infty, s) = 0$; consequently
we require that $A = 0$. The value of $B$ is determined by the need for $u(0, t) = u_0$ and hence
that

$$\bar{u}(0, s) = \int_0^\infty u_0\exp(-st)\,dt = \frac{u_0}{s}.$$

We thus conclude that the appropriate expression for the Laplace transform of $u(x, t)$ is

$$\bar{u}(x, s) = \frac{u_0}{s}\exp\left(-\sqrt{\frac{s}{\kappa}}\,x\right). \qquad (19.75)$$

To obtain $u(x, t)$ from this result requires the inversion of this transform - a task that is
generally difficult and requires a contour integration. This is discussed in chapter 20, but
for completeness we note that the solution is

$$u(x, t) = u_0\left[1 - \mathrm{erf}\left(\frac{x}{\sqrt{4\kappa t}}\right)\right],$$

where $\mathrm{erf}(x)$ is the error function discussed in the Appendix. (The more complete sets of
mathematical tables list this inverse Laplace transform.)

In the present problem, however, an alternative method is available. Let $w(t)$ be the
amount of salt that has diffused into the tube in time $t$; then

$$w(t) = \int_0^\infty u(x, t)\,dx,$$

and its transform is given by

$$\begin{aligned}
\bar{w}(s) &= \int_0^\infty dt\,\exp(-st)\int_0^\infty u(x, t)\,dx \\
&= \int_0^\infty dx\int_0^\infty u(x, t)\exp(-st)\,dt \\
&= \int_0^\infty \bar{u}(x, s)\,dx.
\end{aligned}$$

Substituting for $\bar{u}(x, s)$ from (19.75) into the last integral and integrating, we obtain

$$\bar{w}(s) = u_0\kappa^{1/2}s^{-3/2}.$$

This expression is much simpler to invert, and referring to the table of standard Laplace
transforms (table 13.1) we find

$$w(t) = 2(\kappa/\pi)^{1/2}u_0 t^{1/2},$$

which is thus the required expression for the amount of diffused salt at time $t$. ◄
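
The two routes to the answer can be compared numerically: integrating the error-function solution for $u(x, t)$ over $x$ should reproduce the closed form for $w(t)$. The Python sketch below does this with a composite trapezium rule; the cut-off of the integration range, the number of panels and the function names are illustrative assumptions.

```python
import math

def concentration(x, t, u0=1.0, kappa=1.0):
    """u(x, t) = u0 [1 - erf(x / sqrt(4 kappa t))], written using erfc."""
    return u0 * math.erfc(x / math.sqrt(4.0 * kappa * t))

def total_salt_numeric(t, u0=1.0, kappa=1.0, n=20000):
    """Trapezium-rule estimate of the integral of u(x, t) over x >= 0."""
    x_max = 12.0 * math.sqrt(4.0 * kappa * t)   # erfc(12) is negligibly small
    h = x_max / n
    total = 0.5 * (concentration(0.0, t, u0, kappa) + concentration(x_max, t, u0, kappa))
    total += sum(concentration(i * h, t, u0, kappa) for i in range(1, n))
    return h * total

def total_salt_closed_form(t, u0=1.0, kappa=1.0):
    """w(t) = 2 (kappa / pi)^(1/2) u0 t^(1/2)."""
    return 2.0 * math.sqrt(kappa * t / math.pi) * u0

for t in (0.5, 2.0):
    print(t, total_salt_numeric(t, kappa=0.3), total_salt_closed_form(t, kappa=0.3))
```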

The above example shows that in some circumstances the use of a Laplace 
transformation can greatly simplify the solution of a PDE. However, it will have 
been observed that (as with ODEs) the easy elimination of some derivatives is 
usually paid for by the introduction of a difficult inverse transformation. This 
problem, although still present, is less severe for Fourier transformations. 


►An infinite metal bar has an initial temperature distribution $f(x)$ along its length. Find
the temperature distribution at a later time $t$.


We are interested in values of $x$ from $-\infty$ to $\infty$, which suggests Fourier transformation
with respect to $x$. Assuming that the solution obeys the boundary conditions $u(x, t) \to 0$
and $\partial u/\partial x \to 0$ as $|x| \to \infty$, we may Fourier-transform the one-dimensional diffusion
equation (19.74) to obtain

$$\frac{\kappa}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\frac{\partial^2 u(x, t)}{\partial x^2}\exp(-ikx)\,dx = \frac{1}{\sqrt{2\pi}}\frac{\partial}{\partial t}\int_{-\infty}^{\infty}u(x, t)\exp(-ikx)\,dx,$$

where on the RHS we have taken the partial derivative with respect to $t$ outside the
integral. Denoting the Fourier transform of $u(x, t)$ by $\tilde{u}(k, t)$, and using equation (13.28) to
rewrite the Fourier transform of the second derivative on the LHS, we then have

$$-\kappa k^2\tilde{u}(k, t) = \frac{\partial\tilde{u}(k, t)}{\partial t}.$$

This first-order equation has the simple solution

$$\tilde{u}(k, t) = \tilde{u}(k, 0)\exp(-\kappa k^2 t),$$

where the initial conditions give

$$\begin{aligned}
\tilde{u}(k, 0) &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}u(x, 0)\exp(-ikx)\,dx \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f(x)\exp(-ikx)\,dx = \tilde{f}(k).
\end{aligned}$$


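Thus we may write the Fourier transform of the solution as

$$\tilde{u}(k, t) = \tilde{f}(k)\exp(-\kappa k^2 t) = \sqrt{2\pi}\,\tilde{f}(k)\tilde{G}(k, t), \qquad (19.76)$$

where we have defined the function $\tilde{G}(k, t) = (\sqrt{2\pi})^{-1}\exp(-\kappa k^2 t)$. Since $\tilde{u}(k, t)$ can be
written as the product of two Fourier transforms, we can use the convolution theorem,
subsection 13.1.7, to write the solution as

$$u(x, t) = \int_{-\infty}^{\infty}G(x - x', t)f(x')\,dx',$$

where $G(x, t)$ is the Green’s function for this problem (see subsection 15.2.5). This function
is the inverse Fourier transform of $\tilde{G}(k, t)$ and is thus given by

$$\begin{aligned}
G(x, t) &= \frac{1}{2\pi}\int_{-\infty}^{\infty}\exp(-\kappa k^2 t)\exp(ikx)\,dk \\
&= \frac{1}{2\pi}\int_{-\infty}^{\infty}\exp\left[-\kappa t\left(k^2 - \frac{ix}{\kappa t}k\right)\right]dk.
\end{aligned}$$

Completing the square in the integrand we find

$$\begin{aligned}
G(x, t) &= \frac{1}{2\pi}\exp\left(-\frac{x^2}{4\kappa t}\right)\int_{-\infty}^{\infty}\exp\left[-\kappa t\left(k - \frac{ix}{2\kappa t}\right)^2\right]dk \\
&= \frac{1}{2\pi}\exp\left(-\frac{x^2}{4\kappa t}\right)\int_{-\infty}^{\infty}\exp\left(-\kappa t k'^2\right)dk' \\
&= \frac{1}{\sqrt{4\pi\kappa t}}\exp\left(-\frac{x^2}{4\kappa t}\right),
\end{aligned}$$

where in the second line we have made the substitution $k' = k - ix/(2\kappa t)$, and in the last
line we have used the standard result for the integral of a Gaussian, given in subsection
6.4.2. (Strictly speaking the change of variable from $k$ to $k'$ shifts the path of integration
off the real axis, since $k'$ is complex for real $k$, and so results in a complex integral, as will
be discussed in chapter 20. Nevertheless, in this case the path of integration can be shifted
back to the real axis without affecting the value of the integral.)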

Thus the temperature in the bar at a later time t is given by 


$$u(x, t) = \frac{1}{\sqrt{4\pi\kappa t}}\int_{-\infty}^{\infty}\exp\left[-\frac{(x - x')^2}{4\kappa t}\right]f(x')\,dx', \qquad (19.77)$$


which may be evaluated (numerically if necessary) when the form of f(x) is given. ◄ 
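
As an illustration of such a numerical evaluation, the Python sketch below applies a midpoint rule to (19.77) for a 'top-hat' initial distribution, $f(x) = 1$ for $|x| < 1$ and zero otherwise, and compares the result with the closed form obtained by integrating the Gaussian directly. The choice of $f$, the quadrature and the function names are assumptions made for this example only.

```python
import math

def heat_kernel(x, t, kappa=1.0):
    """G(x, t) = exp(-x^2 / (4 kappa t)) / sqrt(4 pi kappa t)."""
    return math.exp(-x * x / (4.0 * kappa * t)) / math.sqrt(4.0 * math.pi * kappa * t)

def temperature(x, t, f, x_lo, x_hi, kappa=1.0, n=4000):
    """Midpoint-rule estimate of (19.77), assuming f vanishes outside [x_lo, x_hi]."""
    h = (x_hi - x_lo) / n
    total = 0.0
    for i in range(n):
        xp = x_lo + (i + 0.5) * h
        total += heat_kernel(x - xp, t, kappa) * f(xp)
    return h * total

def top_hat(x):
    """Initial temperature: 1 for |x| < 1, zero elsewhere."""
    return 1.0 if abs(x) < 1.0 else 0.0

def top_hat_exact(x, t, kappa=1.0):
    """Closed form of (19.77) for the top-hat, in terms of the error function."""
    s = math.sqrt(4.0 * kappa * t)
    return 0.5 * (math.erf((x + 1.0) / s) - math.erf((x - 1.0) / s))

for x in (0.0, 0.5, 2.0):
    print(x, temperature(x, 0.1, top_hat, -1.0, 1.0), top_hat_exact(x, 0.1))
```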


As we might expect from our discussion of Green’s functions in chapter 15,
we see from (19.77) that, if the initial temperature distribution is $f(x) = \delta(x - a)$,
i.e. a ‘point’ source at $x = a$, then the temperature distribution at later times is
simply given by

$$u(x, t) = G(x - a, t) = \frac{1}{\sqrt{4\pi\kappa t}}\exp\left[-\frac{(x - a)^2}{4\kappa t}\right].$$

The temperature at several later times is illustrated in figure 19.10, which shows
that the heat diffuses out from its initial position; the width of the Gaussian
increases as $\sqrt{t}$, a dependence on time which is characteristic of diffusion processes.

The reader may have noticed that in both examples using integral transforms
the solutions have been obtained in closed form - albeit in one case in the form
of an integral. This differs from the infinite series solutions usually obtained via
the separation of variables. It should be noted that this behaviour is a result of
the infinite range in $x$ rather than of the transform method itself. In fact the
method of separation of variables would yield the same solutions, since in the
infinite-range case the separation constant is not restricted to take on an infinite
set of discrete values but may have any real value, with the result that the sum
over $\lambda$ becomes an integral, as mentioned at the end of section 19.2.

Figure 19.10 Diffusion of heat from a point source in a metal bar: the curves
show the temperature $u$ at position $x$ for various different times $t_1 < t_2 < t_3$.
The area under the curves remains constant, since the total heat energy is
conserved.

►An infinite metal bar has an initial temperature distribution f(x) along its length. Find 
the temperature distribution at a later time t using the method of separation of variables. 


This is the same problem as in the previous example, but we now seek a solution by 
separating variables. From (19.12) a separated solution for the one-dimensional diffusion 
equation is 

$$u(x, t) = [A\exp(i\lambda x) + B\exp(-i\lambda x)]\exp(-\kappa\lambda^2 t),$$

where $-\lambda^2$ is the separation constant. Since the bar is infinite we do not require the
solution to take a given form at any finite value of $x$ (for instance at $x = 0$) and so there
is no restriction on $\lambda$ other than its being real. Therefore instead of the superposition of
such solutions in the form of a sum over allowed values of $\lambda$ we have an integral over
all $\lambda$,

$$u(x, t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}A(\lambda)\exp(-\kappa\lambda^2 t)\exp(i\lambda x)\,d\lambda, \qquad (19.78)$$

where in taking $\lambda$ from $-\infty$ to $\infty$ we need include only one of the complex exponentials;
we have taken a factor $1/\sqrt{2\pi}$ out of $A(\lambda)$ for convenience. We can see from (19.78)
that the expression for $u(x, t)$ has the form of an inverse Fourier transform (where $\lambda$ is
the transform variable). Therefore, Fourier-transforming both sides and using the Fourier
inversion theorem, we find

$$\tilde{u}(\lambda, t) = A(\lambda)\exp(-\kappa\lambda^2 t).$$

Now the initial boundary condition requires

$$u(x, 0) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}A(\lambda)\exp(i\lambda x)\,d\lambda = f(x),$$

from which, using the Fourier inversion theorem once more, we see that $A(\lambda) = \tilde{f}(\lambda)$.
Therefore we have

$$\tilde{u}(\lambda, t) = \tilde{f}(\lambda)\exp(-\kappa\lambda^2 t),$$

which is identical to (19.76) in the previous example (but with $k$ replaced by $\lambda$), and hence
leads to the same result. ◄ 


19.5 Inhomogeneous problems - Green’s functions 

In chapters 15 and 17 we encountered Green’s functions and found them a useful 
tool for solving inhomogeneous linear ODEs. We now discuss their usefulness in 
solving inhomogeneous linear PDEs. 

For the sake of brevity we shall again denote a linear PDE by 

$$\mathcal{L}u(\mathbf{r}) = \rho(\mathbf{r}), \qquad (19.79)$$

where $\mathcal{L}$ is a linear partial differential operator. For example, in Laplace’s equation
we have $\mathcal{L} = \nabla^2$, whereas for Helmholtz’s equation $\mathcal{L} = \nabla^2 + k^2$. Note that we have
not specified the dimensionality of the problem, and (19.79) may, for example,
represent Poisson’s equation in two or three (or more) dimensions. The reader
will also notice that for the sake of simplicity we have not included any time
dependence in (19.79). Nevertheless, the following discussion can be generalised
to include it.

As we discussed in subsection 18.3.2, a problem is inhomogeneous if the fact
that $u(\mathbf{r})$ is a solution does not imply that any constant multiple $\lambda u(\mathbf{r})$ is also a
solution. This inhomogeneity may derive from either the PDE itself or from the 
boundary conditions imposed on the solution. 

In our discussion of Green’s function solutions of inhomogeneous ODEs (see 
subsection 15.2.5) we dealt with inhomogeneous boundary conditions by making a 
suitable change of variable such that in the new variable the boundary conditions 
were homogeneous. In an analogous way, as illustrated in the final example 
of section 19.2, it is usually possible to make a change of variables in PDEs to 
transform between inhomogeneity of the boundary conditions and inhomogeneity 
of the equation. Therefore let us assume for the moment that the boundary 
conditions imposed on the solution $u(\mathbf{r})$ of (19.79) are homogeneous. This most
commonly means that if we seek a solution to (19.79) in some region $V$ then
on the surface $S$ that bounds $V$ the solution obeys the conditions $u(\mathbf{r}) = 0$ or
$\partial u/\partial n = 0$, where $\partial u/\partial n$ is the normal derivative of $u$ at the surface $S$.

We shall discuss the extension of the Green’s function method to the direct solution
of problems with inhomogeneous boundary conditions in subsection 19.5.2,
but we first highlight how the Green’s function approach to solving ODEs can
be simply extended to PDEs for homogeneous boundary conditions.


19.5.1 Similarities with Green’s functions for ODEs

As in the discussion of ODEs in chapter 15, we may consider the Green’s
function for a system described by a PDE as the response of the system to a ‘unit
impulse’ or ‘point source’. Thus if we seek a solution to (19.79) that satisfies some
homogeneous boundary conditions on $u(\mathbf{r})$ then the Green’s function $G(\mathbf{r}, \mathbf{r}_0)$ for
the problem is a solution of

$$\mathcal{L}G(\mathbf{r}, \mathbf{r}_0) = \delta(\mathbf{r} - \mathbf{r}_0), \qquad (19.80)$$

where $\mathbf{r}_0$ lies in $V$. The Green’s function $G(\mathbf{r}, \mathbf{r}_0)$ must also satisfy the imposed
(homogeneous) boundary conditions.

It is understood that in (19.80) the $\mathcal{L}$ operator expresses differentiation with
respect to $\mathbf{r}$ as opposed to $\mathbf{r}_0$. Also, $\delta(\mathbf{r} - \mathbf{r}_0)$ is the Dirac delta function (see
chapter 13) of dimension appropriate for the problem; it may be thought of as
representing a unit-strength point source at $\mathbf{r} = \mathbf{r}_0$.

Following an analogous argument to that given in subsection 15.2.5 for ODEs,
if the boundary conditions on $u(\mathbf{r})$ are homogeneous then a solution to (19.79)
that satisfies the imposed boundary conditions is given by

$$u(\mathbf{r}) = \int G(\mathbf{r}, \mathbf{r}_0)\rho(\mathbf{r}_0)\,dV(\mathbf{r}_0), \qquad (19.81)$$

where the integral on $\mathbf{r}_0$ is over some appropriate ‘volume’. In two or more
dimensions, however, the task of finding directly a solution to (19.80) that satisfies
the imposed boundary conditions on $S$ can be a difficult one, and we return to
this in the next subsection.

An alternative approach is to follow a similar argument to that presented in 
chapter 17 for ODEs and so to construct the Green’s function for (19.79) as a 
superposition of eigenfunctions of the operator $\mathcal{L}$, provided $\mathcal{L}$ is Hermitian. By
analogy with an ordinary differential operator, a partial differential operator is
Hermitian if it satisfies

$$\int_V v^*(\mathbf{r})\mathcal{L}w(\mathbf{r})\,dV = \left[\int_V w^*(\mathbf{r})\mathcal{L}v(\mathbf{r})\,dV\right]^*,$$

where the asterisk denotes complex conjugation and $v$ and $w$ are arbitrary functions
obeying the imposed (homogeneous) boundary condition on the solution of
$\mathcal{L}u(\mathbf{r}) = 0$.

The eigenfunctions $u_n(\mathbf{r})$, $n = 0, 1, 2, \ldots$, of $\mathcal{L}$ satisfy

$$\mathcal{L}u_n(\mathbf{r}) = \lambda_n u_n(\mathbf{r}),$$

where $\lambda_n$ are the corresponding eigenvalues, which are all real for an Hermitian
operator $\mathcal{L}$. Furthermore, each eigenfunction must obey any imposed (homogeneous)
boundary conditions. Using an argument analogous to that given in
chapter 17, the Green’s function for the problem is given by

$$G(\mathbf{r}, \mathbf{r}_0) = \sum_{n=0}^{\infty}\frac{u_n(\mathbf{r})u_n^*(\mathbf{r}_0)}{\lambda_n}. \qquad (19.82)$$

From (19.82) we see immediately that the Green’s function (irrespective of how
it is found) enjoys the property

$$G(\mathbf{r}, \mathbf{r}_0) = G^*(\mathbf{r}_0, \mathbf{r}).$$

Thus, if the Green’s function is real then it is symmetric in its two arguments. 

Once the Green’s function has been obtained, the solution to (19.79) is again 
given by (19.81). For PDEs this approach can become very cumbersome, however, 
and so we shall not pursue it further here. 
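
For a geometry in which the eigenfunctions are known explicitly the expansion (19.82) is nevertheless easy to try out. The Python sketch below is an illustration under assumptions of our own, not an example from the text: it takes $\mathcal{L} = \nabla^2$ on the unit square with $u = 0$ on the boundary, for which the normalised eigenfunctions are $2\sin(m\pi x)\sin(n\pi y)$ with eigenvalues $-\pi^2(m^2 + n^2)$ (labelled here by a pair of indices rather than a single $n$), truncates the sum, and evaluates (19.81) by a midpoint rule. The truncated series is not an accurate pointwise representation of $G$ near $\mathbf{r} = \mathbf{r}_0$, but it is sufficient when integrated against a smooth source.

```python
import math

def eigenfunction(m, n, x, y):
    """Normalised Dirichlet eigenfunction of the Laplacian on the unit square."""
    return 2.0 * math.sin(m * math.pi * x) * math.sin(n * math.pi * y)

def eigenvalue(m, n):
    """Eigenvalue of the Laplacian belonging to eigenfunction(m, n, ...)."""
    return -math.pi ** 2 * (m * m + n * n)

def green(x, y, x0, y0, max_mode=8):
    """Truncated form of (19.82); the eigenfunctions are real, so no conjugate is needed."""
    return sum(
        eigenfunction(m, n, x, y) * eigenfunction(m, n, x0, y0) / eigenvalue(m, n)
        for m in range(1, max_mode + 1)
        for n in range(1, max_mode + 1)
    )

def solve_poisson(x, y, rho, grid=30, max_mode=8):
    """Approximate (19.81) by a midpoint rule over the source point (x0, y0)."""
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        for j in range(grid):
            x0, y0 = (i + 0.5) * h, (j + 0.5) * h
            total += green(x, y, x0, y0, max_mode) * rho(x0, y0)
    return total * h * h

def rho(x, y):
    """Test source, itself an eigenfunction, so the exact solution is -rho / (5 pi^2)."""
    return math.sin(math.pi * x) * math.sin(2.0 * math.pi * y)

x, y = 0.3, 0.2
print("from (19.81):", solve_poisson(x, y, rho))
print("exact       :", -rho(x, y) / (5.0 * math.pi ** 2))
print("symmetry    :", green(0.3, 0.2, 0.7, 0.6) - green(0.7, 0.6, 0.3, 0.2))
```

The symmetry check reflects the property $G(\mathbf{r}, \mathbf{r}_0) = G^*(\mathbf{r}_0, \mathbf{r})$ for a real Green's function.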


19.5.2 General boundary-value problems 

As mentioned above, often inhomogeneous boundary conditions can be dealt 
with by making an appropriate change of variables, such that the boundary 
conditions in the new variables are homogeneous although the equation itself is 
generally inhomogeneous. In this section, however, we extend the use of Green’s 
functions to problems with inhomogeneous boundary conditions (and equations). 
This provides a more consistent and intuitive approach to the solution of such 
boundary-value problems. 

For definiteness we shall consider Poisson’s equation 

V 2 u(r) = p( r), (19.83) 

but the material of this section may be extended to other linear PDEs of the form 
(19.79). Clearly, Poisson’s equation reduces to Laplace’s equation for p(r) = 0 and 
so our discussion is equally applicable to this case. 

We wish to solve (19.83) in some region V bounded by a surface S, which may 
consist of several disconnected parts. As stated above, we shall allow the possibility 
that the boundary conditions on the solution u( r) may be inhomogeneous on S, 
although as we shall see this method reduces to those discussed above in the 
special case that the boundary conditions are in fact homogeneous. 

The two common types of inhomogeneous boundary condition for Poisson’s 
equation are (as discussed in subsection 18.6.2): 

(i) Dirichlet conditions, in which $u(\mathbf{r})$ is specified on S, and

(ii) Neumann conditions, in which $\partial u/\partial n$ is specified on S.

In general, specifying both Dirichlet and Neumann conditions on S overdetermines 
the problem and leads to there being no solution. 

The specification of the surface S requires some further comment, since S
may have several disconnected parts. If we wish to solve Poisson’s equation
inside some closed surface S then the situation is straightforward and is shown
in figure 19.11(a). If, however, we wish to solve Poisson’s equation in the gap
between two closed surfaces (for example in the gap between two concentric
conducting cylinders) then the volume V is bounded by a surface S that has two
disconnected parts $S_1$ and $S_2$, as shown in figure 19.11(b); the direction of the
normal to the surface is always taken as pointing out of the volume V. A similar
situation arises when we wish to solve Poisson’s equation outside some closed
surface $S_1$. In this case the volume V is infinite but is treated formally by taking
the surface $S_2$ as a large sphere of radius R and letting R tend to infinity.

Figure 19.11 Surfaces used for regions V.

In order to solve (19.83) subject to either Dirichlet or Neumann boundary
conditions on S, we first remind ourselves of Green’s second theorem, equation
(11.20), which states that for two scalar functions $\phi(\mathbf{r})$ and $\psi(\mathbf{r})$ defined in some
volume V bounded by a surface S

$$\int_V (\phi\nabla^2\psi - \psi\nabla^2\phi)\,dV = \int_S (\phi\nabla\psi - \psi\nabla\phi)\cdot\hat{\mathbf{n}}\,dS, \tag{19.84}$$

where on the RHS it is common to write, for example, $\nabla\psi\cdot\hat{\mathbf{n}}\,dS$ as $(\partial\psi/\partial n)\,dS$.
The expression $\partial\psi/\partial n$ stands for $\nabla\psi\cdot\hat{\mathbf{n}}$, the rate of change of $\psi$ in the direction
of the unit outward normal $\hat{\mathbf{n}}$ to the surface S.

The Green’s function for Poisson’s equation (19.83) must satisfy 


$$\nabla^2 G(\mathbf{r}, \mathbf{r}_0) = \delta(\mathbf{r} - \mathbf{r}_0), \tag{19.85}$$

where $\mathbf{r}_0$ lies in V. (As mentioned above, we may think of $G(\mathbf{r}, \mathbf{r}_0)$ as the solution
to Poisson’s equation for a unit-strength point source located at $\mathbf{r} = \mathbf{r}_0$.) Let us
for the moment impose no boundary conditions on $G(\mathbf{r}, \mathbf{r}_0)$.

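If we now let $\phi = u(\mathbf{r})$ and $\psi = G(\mathbf{r}, \mathbf{r}_0)$ in Green’s theorem (19.84) then we
obtain

$$\int_V \left[u(\mathbf{r})\nabla^2 G(\mathbf{r}, \mathbf{r}_0) - G(\mathbf{r}, \mathbf{r}_0)\nabla^2 u(\mathbf{r})\right] dV(\mathbf{r}) = \int_S \left[u(\mathbf{r})\frac{\partial G(\mathbf{r}, \mathbf{r}_0)}{\partial n} - G(\mathbf{r}, \mathbf{r}_0)\frac{\partial u(\mathbf{r})}{\partial n}\right] dS(\mathbf{r}),$$

where we have made explicit that the volume and surface integrals are with
respect to $\mathbf{r}$. Using (19.83) and (19.85) the LHS can be simplified to give

$$\int_V \left[u(\mathbf{r})\delta(\mathbf{r} - \mathbf{r}_0) - G(\mathbf{r}, \mathbf{r}_0)\rho(\mathbf{r})\right] dV(\mathbf{r}) = \int_S \left[u(\mathbf{r})\frac{\partial G(\mathbf{r}, \mathbf{r}_0)}{\partial n} - G(\mathbf{r}, \mathbf{r}_0)\frac{\partial u(\mathbf{r})}{\partial n}\right] dS(\mathbf{r}). \tag{19.86}$$

Since $\mathbf{r}_0$ lies within the volume V,

$$\int_V u(\mathbf{r})\delta(\mathbf{r} - \mathbf{r}_0)\,dV(\mathbf{r}) = u(\mathbf{r}_0),$$

and thus rearranging (19.86) the solution to Poisson’s equation (19.83) can be
written

$$u(\mathbf{r}_0) = \int_V G(\mathbf{r}, \mathbf{r}_0)\rho(\mathbf{r})\,dV(\mathbf{r}) + \int_S \left[u(\mathbf{r})\frac{\partial G(\mathbf{r}, \mathbf{r}_0)}{\partial n} - G(\mathbf{r}, \mathbf{r}_0)\frac{\partial u(\mathbf{r})}{\partial n}\right] dS(\mathbf{r}). \tag{19.87}$$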


Clearly, we can interchange the roles of $\mathbf{r}$ and $\mathbf{r}_0$ in (19.87) if we wish. (Remember
also that, for a real Green’s function, $G(\mathbf{r}, \mathbf{r}_0) = G(\mathbf{r}_0, \mathbf{r})$.)

Equation (19.87) is central to the extension of the Green’s function method
to problems with inhomogeneous boundary conditions, and we next discuss its
application to both Dirichlet and Neumann boundary-value problems. But, before
doing so, we also note that if the boundary condition on S is in fact homogeneous,
so that $u(\mathbf{r}) = 0$ or $\partial u(\mathbf{r})/\partial n = 0$ on S, then demanding that the Green’s function
$G(\mathbf{r}, \mathbf{r}_0)$ also obeys the same boundary condition causes the surface integral in
(19.87) to vanish, and we are left with the familiar form of solution given in
(19.81). The extension of (19.87) to a PDE other than Poisson’s equation is
discussed in exercise 19.30.


19.5.3 Dirichlet problems 

In a Dirichlet problem we require the solution $u(\mathbf{r})$ of Poisson’s equation (19.83)
to take specific values on some surface S that bounds V, i.e. we require that
$u(\mathbf{r}) = f(\mathbf{r})$ on S where f is a given function.

If we seek a Green’s function $G(\mathbf{r}, \mathbf{r}_0)$ for this problem it must clearly satisfy
(19.85), but we are free to choose the boundary conditions satisfied by $G(\mathbf{r}, \mathbf{r}_0)$ in
such a way as to make the solution (19.87) as simple as possible. From (19.87),
we see that by choosing

$$G(\mathbf{r}, \mathbf{r}_0) = 0 \quad \text{for } \mathbf{r} \text{ on } S \tag{19.88}$$

the second term in the surface integral vanishes. Since $u(\mathbf{r}) = f(\mathbf{r})$ on S, (19.87)
then becomes

$$u(\mathbf{r}_0) = \int_V G(\mathbf{r}, \mathbf{r}_0)\rho(\mathbf{r})\,dV(\mathbf{r}) + \int_S f(\mathbf{r})\frac{\partial G(\mathbf{r}, \mathbf{r}_0)}{\partial n}\,dS(\mathbf{r}). \tag{19.89}$$

Thus we wish to find the Dirichlet Green’s function that 

(i) satisfies (19.85) and hence is singular at $\mathbf{r} = \mathbf{r}_0$, and

(ii) obeys the boundary condition $G(\mathbf{r}, \mathbf{r}_0) = 0$ for $\mathbf{r}$ on S.

In general, it is difficult to obtain this function directly, and so it is useful to
separate these two requirements. We therefore look for a solution of the form

$$G(\mathbf{r}, \mathbf{r}_0) = F(\mathbf{r}, \mathbf{r}_0) + H(\mathbf{r}, \mathbf{r}_0),$$

where $F(\mathbf{r}, \mathbf{r}_0)$ satisfies (19.85) and has the required singular character at $\mathbf{r} = \mathbf{r}_0$ but
does not necessarily obey the boundary condition on S, whilst $H(\mathbf{r}, \mathbf{r}_0)$ satisfies
the corresponding homogeneous equation (i.e. Laplace’s equation) inside V but
is adjusted in such a way that the sum $G(\mathbf{r}, \mathbf{r}_0)$ equals zero on S. The Green’s
function $G(\mathbf{r}, \mathbf{r}_0)$ is still a solution of (19.85) since

$$\nabla^2 G(\mathbf{r}, \mathbf{r}_0) = \nabla^2 F(\mathbf{r}, \mathbf{r}_0) + \nabla^2 H(\mathbf{r}, \mathbf{r}_0) = \nabla^2 F(\mathbf{r}, \mathbf{r}_0) + 0 = \delta(\mathbf{r} - \mathbf{r}_0).$$

The function F(r,r 0 ) is called the fundamental solution and will clearly take 
different forms depending on the dimensionality of the problem. Let us first 
consider the fundamental solution to (19.85) in three dimensions. 


► Find the fundamental solution to Poisson’s equation in three dimensions that tends to zero
as $|\mathbf{r}| \to \infty$.

We wish to solve

$$\nabla^2 F(\mathbf{r}, \mathbf{r}_0) = \delta(\mathbf{r} - \mathbf{r}_0) \tag{19.90}$$

in three dimensions, subject to the boundary condition $F(\mathbf{r}, \mathbf{r}_0) \to 0$ as $|\mathbf{r}| \to \infty$. Since the
problem is spherically symmetric about $\mathbf{r}_0$, let us consider a large sphere S of radius R
centred on $\mathbf{r}_0$, and integrate (19.90) over the enclosed volume V. We then obtain

$$\int_V \nabla^2 F(\mathbf{r}, \mathbf{r}_0)\,dV = \int_V \delta(\mathbf{r} - \mathbf{r}_0)\,dV = 1, \tag{19.91}$$

since V encloses the point $\mathbf{r}_0$. However, using the divergence theorem,

$$\int_V \nabla^2 F(\mathbf{r}, \mathbf{r}_0)\,dV = \int_S \nabla F(\mathbf{r}, \mathbf{r}_0)\cdot\hat{\mathbf{n}}\,dS, \tag{19.92}$$

where $\hat{\mathbf{n}}$ is the unit normal to the large sphere S at any point.

Since the problem is spherically symmetric about $\mathbf{r}_0$, we expect that

$$F(\mathbf{r}, \mathbf{r}_0) = F(|\mathbf{r} - \mathbf{r}_0|) \equiv F(r),$$

i.e. F has the same value everywhere on S. Thus, evaluating the surface integral in (19.92)
and equating it to unity from (19.91), we have†

$$4\pi r^2 \left.\frac{dF}{dr}\right|_{r=R} = 1.$$

Integrating this expression we obtain

$$F(r) = -\frac{1}{4\pi r} + \text{constant},$$

but, since we require $F(\mathbf{r}, \mathbf{r}_0) \to 0$ as $|\mathbf{r}| \to \infty$, the constant must be zero. The fundamental
solution in three dimensions is consequently given by

$$F(\mathbf{r}, \mathbf{r}_0) = -\frac{1}{4\pi|\mathbf{r} - \mathbf{r}_0|}. \tag{19.93}$$

This is clearly also the full Green’s function for Poisson’s equation subject to the boundary
condition $u(\mathbf{r}) \to 0$ as $|\mathbf{r}| \to \infty$. ◄
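The unit-flux property that fixes the fundamental solution can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the derivation: it integrates $\nabla F\cdot\hat{\mathbf{n}}$ over a sphere centred on the origin (not on the source) using a simple midpoint rule, with hypothetical function names and grid sizes. The flux should be close to 1 when the source lies inside the sphere and close to 0 when it lies outside.

```python
import math

def grad_F(r, r0):
    """Gradient of F = -1/(4 pi |r - r0|), i.e. (r - r0) / (4 pi |r - r0|^3)."""
    d = [r[i] - r0[i] for i in range(3)]
    dist = math.sqrt(sum(c * c for c in d))
    return [c / (4.0 * math.pi * dist ** 3) for c in d]

def flux_through_sphere(r0, radius=1.0, n_theta=200, n_phi=400):
    """Midpoint-rule estimate of the flux of grad F through the sphere |r| = radius."""
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            n_hat = (math.sin(theta) * math.cos(phi),
                     math.sin(theta) * math.sin(phi),
                     math.cos(theta))
            r = [radius * c for c in n_hat]
            g = grad_F(r, r0)
            g_dot_n = sum(g[k] * n_hat[k] for k in range(3))
            total += g_dot_n * radius ** 2 * math.sin(theta) * d_theta * d_phi
    return total

if __name__ == "__main__":
    print(flux_through_sphere((0.3, 0.1, -0.2)))   # source inside the sphere
    print(flux_through_sphere((2.0, 0.0, 0.0)))    # source outside the sphere
```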

Using (19.93) we can write down the solution of Poisson’s equation to find,
for example, the electrostatic potential $u(\mathbf{r})$ due to some distribution of electric
charge $\rho(\mathbf{r})$. The electrostatic potential satisfies

$$\nabla^2 u(\mathbf{r}) = -\frac{\rho(\mathbf{r})}{\epsilon_0},$$

where $u(\mathbf{r}) \to 0$ as $|\mathbf{r}| \to \infty$. Since the boundary condition on the surface at
infinity is homogeneous the surface integral in (19.89) vanishes, and using (19.93)
we recover the familiar solution

$$u(\mathbf{r}_0) = \int \frac{\rho(\mathbf{r})}{4\pi\epsilon_0|\mathbf{r} - \mathbf{r}_0|}\,dV(\mathbf{r}), \tag{19.94}$$

where the volume integral is over all space.

We can develop an analogous theory in two dimensions. As before the fundamental
solution satisfies

$$\nabla^2 F(\mathbf{r}, \mathbf{r}_0) = \delta(\mathbf{r} - \mathbf{r}_0), \tag{19.95}$$

where $\delta(\mathbf{r} - \mathbf{r}_0)$ is now the two-dimensional delta function. Following an analogous
method to that used in the previous example, we find the fundamental solution
in two dimensions to be given by

$$F(\mathbf{r}, \mathbf{r}_0) = \frac{1}{2\pi}\ln|\mathbf{r} - \mathbf{r}_0| + \text{constant}. \tag{19.96}$$


† A vertical bar to the right of an expression is a common alternative notation to enclosing the
expression in square brackets; as usual, the subscript shows the value of the variable at which the
expression is to be evaluated.

From the form of the solution we see that in two dimensions we cannot apply
the condition $F(\mathbf{r}, \mathbf{r}_0) \to 0$ as $|\mathbf{r}| \to \infty$, and in this case the constant does not
necessarily vanish.

We now return to the task of constructing the full Dirichlet Green’s function. To 
do so we wish to add to the fundamental solution a solution of the homogeneous 
equation (in this case Laplace’s equation) such that G(r, r 0 ) = 0 on S, as required 
by (19.89) and its attendant conditions. The appropriate Green’s function is 
constructed by adding to the fundamental solution ‘copies’ of itself that represent 
‘image’ sources at different locations outside V. Hence this approach is called the 
method of images. 

In summary, if we wish to solve Poisson’s equation in some region V subject to 
Dirichlet boundary conditions on its surface S then the procedure and argument 
are as follows. 

(i) To the single source $\delta(\mathbf{r} - \mathbf{r}_0)$ inside V add image sources outside V

$$\sum_{n=1}^{N} q_n\,\delta(\mathbf{r} - \mathbf{r}_n) \quad \text{with } \mathbf{r}_n \text{ outside } V,$$

where the positions $\mathbf{r}_n$ and the strengths $q_n$ of the image sources are to be
determined as described in step (iii) below.

(ii) Since all the image sources lie outside V, the fundamental solution
corresponding to each source satisfies Laplace’s equation inside V. Thus we
may add the fundamental solutions $F(\mathbf{r}, \mathbf{r}_n)$ corresponding to each image
source to that corresponding to the single source inside V, obtaining the
Green’s function

$$G(\mathbf{r}, \mathbf{r}_0) = F(\mathbf{r}, \mathbf{r}_0) + \sum_{n=1}^{N} q_n F(\mathbf{r}, \mathbf{r}_n).$$

(iii) Now adjust the positions $\mathbf{r}_n$ and strengths $q_n$ of the image sources so
that the required boundary conditions are satisfied on S. For a Dirichlet
Green’s function we require $G(\mathbf{r}, \mathbf{r}_0) = 0$ for $\mathbf{r}$ on S.

(iv) The solution to Poisson’s equation subject to the Dirichlet boundary
condition $u(\mathbf{r}) = f(\mathbf{r})$ on S is then given by (19.89).

In general it is very difficult to find the correct positions and strengths for the 
images, i.e. to make them such that the boundary conditions on S are satisfied. 
Nevertheless, it is possible to do so for certain problems that have simple geometry. 
In particular, for problems in which the boundary S consists of straight lines (in 
two dimensions) or planes (in three dimensions), positions of the image points 
can be deduced simply by imagining the boundary lines or planes to be mirrors 
in which the single source in V (at ro) is reflected. 
Figure 19.12 The arrangement of images for solving Laplace's equation in
the half-space z > 0.


► Solve Laplace’s equation $\nabla^2 u = 0$ in three dimensions in the half-space z > 0, given that
$u(\mathbf{r}) = f(\mathbf{r})$ on the plane z = 0.

The surface S bounding V consists of the xy-plane and the surface at infinity. Therefore,
the Dirichlet Green’s function for this problem must satisfy $G(\mathbf{r}, \mathbf{r}_0) = 0$ on z = 0 and
$G(\mathbf{r}, \mathbf{r}_0) \to 0$ as $|\mathbf{r}| \to \infty$. Thus it is clear in this case that we require one image source at a
position $\mathbf{r}_1$ that is the reflection of $\mathbf{r}_0$ in the plane z = 0, as shown in figure 19.12 (so that
$\mathbf{r}_1$ lies in z < 0, outside the region in which we wish to obtain a solution). It is also clear
that the strength of this image should be −1.

Therefore by adding the fundamental solutions corresponding to the original source
and its image we obtain the Green’s function

$$G(\mathbf{r}, \mathbf{r}_0) = -\frac{1}{4\pi|\mathbf{r} - \mathbf{r}_0|} + \frac{1}{4\pi|\mathbf{r} - \mathbf{r}_1|}, \tag{19.97}$$

where $\mathbf{r}_1$ is the reflection of $\mathbf{r}_0$ in the plane z = 0, i.e. if $\mathbf{r}_0 = (x_0, y_0, z_0)$ then $\mathbf{r}_1 = (x_0, y_0, -z_0)$.
Clearly $G(\mathbf{r}, \mathbf{r}_0) \to 0$ as $|\mathbf{r}| \to \infty$ as required. Also $G(\mathbf{r}, \mathbf{r}_0) = 0$ on z = 0, and so (19.97) is
the desired Dirichlet Green’s function.

The solution to Laplace’s equation is then given by (19.89) with $\rho(\mathbf{r}) = 0$,

$$u(\mathbf{r}_0) = \int_S f(\mathbf{r})\frac{\partial G(\mathbf{r}, \mathbf{r}_0)}{\partial n}\,dS(\mathbf{r}). \tag{19.98}$$

Clearly the surface at infinity makes no contribution to this integral. The outward-pointing
unit vector normal to the xy-plane is simply $\hat{\mathbf{n}} = -\mathbf{k}$ (where $\mathbf{k}$ is the unit vector in the
z-direction), and so

$$\frac{\partial G(\mathbf{r}, \mathbf{r}_0)}{\partial n} = -\frac{\partial G(\mathbf{r}, \mathbf{r}_0)}{\partial z} = -\mathbf{k}\cdot\nabla G(\mathbf{r}, \mathbf{r}_0).$$

We may evaluate this normal derivative by writing the Green’s function (19.97) explicitly
in terms of x, y and z (and $x_0$, $y_0$ and $z_0$) and calculating the partial derivative with respect
to z directly. It is usually quicker, however, to use the fact that†

$$\nabla|\mathbf{r} - \mathbf{r}_0| = \frac{\mathbf{r} - \mathbf{r}_0}{|\mathbf{r} - \mathbf{r}_0|}; \tag{19.99}$$

thus

$$\nabla G(\mathbf{r}, \mathbf{r}_0) = \frac{\mathbf{r} - \mathbf{r}_0}{4\pi|\mathbf{r} - \mathbf{r}_0|^3} - \frac{\mathbf{r} - \mathbf{r}_1}{4\pi|\mathbf{r} - \mathbf{r}_1|^3}.$$

Since $\mathbf{r}_0 = (x_0, y_0, z_0)$ and $\mathbf{r}_1 = (x_0, y_0, -z_0)$ the normal derivative is given by

$$-\frac{\partial G(\mathbf{r}, \mathbf{r}_0)}{\partial z} = -\mathbf{k}\cdot\nabla G(\mathbf{r}, \mathbf{r}_0) = -\frac{z - z_0}{4\pi|\mathbf{r} - \mathbf{r}_0|^3} + \frac{z + z_0}{4\pi|\mathbf{r} - \mathbf{r}_1|^3}.$$

Therefore on the surface z = 0, writing out the dependence on x, y and z explicitly, we
have

$$-\left.\frac{\partial G(\mathbf{r}, \mathbf{r}_0)}{\partial z}\right|_{z=0} = \frac{2z_0}{4\pi\left[(x - x_0)^2 + (y - y_0)^2 + z_0^2\right]^{3/2}}.$$

Inserting this expression into (19.98) we obtain the solution

$$u(x_0, y_0, z_0) = \frac{z_0}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\frac{f(x, y)}{\left[(x - x_0)^2 + (y - y_0)^2 + z_0^2\right]^{3/2}}\,dx\,dy. \quad ◄$$
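The half-space result lends itself to a simple numerical check. The sketch below, with hypothetical function names, first confirms that the image construction gives G = 0 on z = 0 and then evaluates the final integral in polar coordinates centred beneath the field point. It assumes, purely for illustration, boundary data that vanish outside a disc of radius `rho_max`; for f = 1 on that disc the integral can be done by hand and gives $1 - z_0/\sqrt{\rho_{\max}^2 + z_0^2}$ on the axis.

```python
import math

def _dist(p, q):
    return math.sqrt(sum((p[i] - q[i]) ** 2 for i in range(3)))

def green_half_space(r, r0):
    """Dirichlet Green's function for z > 0 built from one image of strength -1."""
    x0, y0, z0 = r0
    r1 = (x0, y0, -z0)                       # reflection of r0 in the plane z = 0
    return -1.0 / (4.0 * math.pi * _dist(r, r0)) + 1.0 / (4.0 * math.pi * _dist(r, r1))

def solve_half_space(f, r0, rho_max, n_rho=400, n_phi=200):
    """Midpoint-rule estimate of (z0 / 2 pi) * integral of f / d^3 over a disc.

    Assumption: f(x, y) is taken to be zero for points further than rho_max
    from (x0, y0), so the infinite plane is truncated to that disc.
    """
    x0, y0, z0 = r0
    d_rho = rho_max / n_rho
    d_phi = 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_rho):
        rho = (i + 0.5) * d_rho
        kernel = z0 / (2.0 * math.pi * (rho * rho + z0 * z0) ** 1.5)
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            x = x0 + rho * math.cos(phi)
            y = y0 + rho * math.sin(phi)
            total += f(x, y) * kernel * rho * d_rho * d_phi
    return total

if __name__ == "__main__":
    print(green_half_space((0.3, -1.2, 0.0), (0.5, 0.2, 1.0)))      # vanishes on z = 0
    print(solve_half_space(lambda x, y: 1.0, (0.0, 0.0, 1.0), 2.0))
    print(1.0 - 1.0 / math.sqrt(5.0))                               # hand-computed value
```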


An analogous procedure may be applied in two-dimensional problems. For
example, in solving Poisson’s equation in two dimensions in the half-space x > 0
we again require just one image charge, of strength $q_1 = -1$, at a position $\mathbf{r}_1$ that
is the reflection of $\mathbf{r}_0$ in the line x = 0. Since we require $G(\mathbf{r}, \mathbf{r}_0) = 0$ when $\mathbf{r}$ lies
on x = 0, the constant in (19.96) must equal zero, and so the Dirichlet Green’s
function is

$$G(\mathbf{r}, \mathbf{r}_0) = \frac{1}{2\pi}\left(\ln|\mathbf{r} - \mathbf{r}_0| - \ln|\mathbf{r} - \mathbf{r}_1|\right).$$

Clearly $G(\mathbf{r}, \mathbf{r}_0)$ tends to zero as $|\mathbf{r}| \to \infty$. If, however, we wish to solve the
two-dimensional Poisson equation in the quarter space x > 0, y > 0, then more image
points are required.


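† Since $|\mathbf{r} - \mathbf{r}_0|^2 = (\mathbf{r} - \mathbf{r}_0)\cdot(\mathbf{r} - \mathbf{r}_0)$ we have $\nabla|\mathbf{r} - \mathbf{r}_0|^2 = 2(\mathbf{r} - \mathbf{r}_0)$, from which we obtain

$$\nabla\left(|\mathbf{r} - \mathbf{r}_0|^2\right)^{1/2} = \frac{1}{2}\,\frac{2(\mathbf{r} - \mathbf{r}_0)}{\left(|\mathbf{r} - \mathbf{r}_0|^2\right)^{1/2}} = \frac{\mathbf{r} - \mathbf{r}_0}{|\mathbf{r} - \mathbf{r}_0|}.$$

Note that this result holds in two and three dimensions.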
Figure 19.13 The arrangement of images for finding the force on a line
charge situated in the (two-dimensional) quarter-space x > 0, y > 0, when the
planes x = 0 and y = 0 are earthed.


► A line charge in the z-direction of charge density $\lambda$ is placed at some position $\mathbf{r}_0$ in the
quarter-space x > 0, y > 0. Calculate the force per unit length on the line charge due to
the presence of thin earthed plates along x = 0 and y = 0.

Here we wish to solve Poisson’s equation

$$\nabla^2 u = -\frac{\lambda}{\epsilon_0}\,\delta(\mathbf{r} - \mathbf{r}_0)$$

in the quarter space x > 0, y > 0. It is clear that we require three image line charges
with positions and strengths as shown in figure 19.13 (all of which lie outside the region
in which we seek a solution). The boundary condition that the electrostatic potential u is
zero on x = 0 and y = 0 (shown as the ‘curve’ C in figure 19.13) is then automatically
satisfied, and so this system of image charges is directly equivalent to the original situation
of a single line charge in the presence of the earthed plates along x = 0 and y = 0. Thus
the electrostatic potential is simply the Dirichlet Green’s function scaled by the source
strength $-\lambda/\epsilon_0$,

$$u(\mathbf{r}) = -\frac{\lambda}{\epsilon_0}\,G(\mathbf{r}, \mathbf{r}_0) = -\frac{\lambda}{2\pi\epsilon_0}\left(\ln|\mathbf{r} - \mathbf{r}_0| - \ln|\mathbf{r} - \mathbf{r}_1| + \ln|\mathbf{r} - \mathbf{r}_2| - \ln|\mathbf{r} - \mathbf{r}_3|\right),$$

which equals zero on C and on the ‘surface’ at infinity.

The force on the line charge at $\mathbf{r}_0$, therefore, is simply that due to the three line charges
at $\mathbf{r}_1$, $\mathbf{r}_2$ and $\mathbf{r}_3$. The electrostatic potential due to a line charge at $\mathbf{r}_i$, i = 1, 2 or 3, is given
by the fundamental solution

$$u_i(\mathbf{r}) = \mp\frac{\lambda}{2\pi\epsilon_0}\ln|\mathbf{r} - \mathbf{r}_i| + c,$$

the upper or lower sign being taken according to whether the line charge is positive or
negative respectively. Therefore the force per unit length on the line charge at $\mathbf{r}_0$, due to
the one at $\mathbf{r}_i$, is given by

$$-\lambda\nabla u_i(\mathbf{r})\Big|_{\mathbf{r}=\mathbf{r}_0} = \pm\frac{\lambda^2}{2\pi\epsilon_0}\,\frac{\mathbf{r}_0 - \mathbf{r}_i}{|\mathbf{r}_0 - \mathbf{r}_i|^2}.$$

Adding the contributions from the three image charges shown in figure 19.13, the total
force experienced by the line charge at $\mathbf{r}_0$ is

$$\mathbf{F} = \frac{\lambda^2}{2\pi\epsilon_0}\left(-\frac{\mathbf{r}_0 - \mathbf{r}_1}{|\mathbf{r}_0 - \mathbf{r}_1|^2} + \frac{\mathbf{r}_0 - \mathbf{r}_2}{|\mathbf{r}_0 - \mathbf{r}_2|^2} - \frac{\mathbf{r}_0 - \mathbf{r}_3}{|\mathbf{r}_0 - \mathbf{r}_3|^2}\right),$$

where, from the figure, $\mathbf{r}_0 - \mathbf{r}_1 = 2y_0\mathbf{j}$, $\mathbf{r}_0 - \mathbf{r}_2 = 2x_0\mathbf{i} + 2y_0\mathbf{j}$ and $\mathbf{r}_0 - \mathbf{r}_3 = 2x_0\mathbf{i}$. Thus, in
terms of $x_0$ and $y_0$, the total force on the line charge due to the charge induced on the
plates is given by

$$\mathbf{F} = \frac{\lambda^2}{2\pi\epsilon_0}\left(-\frac{1}{2y_0}\mathbf{j} + \frac{2x_0\mathbf{i} + 2y_0\mathbf{j}}{4x_0^2 + 4y_0^2} - \frac{1}{2x_0}\mathbf{i}\right) = -\frac{\lambda^2}{4\pi\epsilon_0(x_0^2 + y_0^2)}\left(\frac{y_0^2}{x_0}\mathbf{i} + \frac{x_0^2}{y_0}\mathbf{j}\right). \quad ◄$$
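The image bookkeeping in this example is easy to get wrong by a sign, so a short numerical check may be helpful. The sketch below uses hypothetical function names and works in units where $\lambda^2/(2\pi\epsilon_0) = 1$ for the force and $-\lambda/(2\pi\epsilon_0) = 1$ for the potential. It sums the three image contributions, compares the result with the closed form above, and confirms that the potential vanishes on both plates.

```python
import math

def image_charges(x0, y0):
    """Positions and signs of the three images for a source at (x0, y0), x0, y0 > 0."""
    return [((x0, -y0), -1.0),    # r1: reflection in y = 0
            ((-x0, -y0), +1.0),   # r2: reflection in both planes
            ((-x0, y0), -1.0)]    # r3: reflection in x = 0

def force_from_images(x0, y0):
    """Force per unit length on the source, in units of lambda^2 / (2 pi eps0)."""
    fx = fy = 0.0
    for (xi, yi), sign in image_charges(x0, y0):
        dx, dy = x0 - xi, y0 - yi
        d2 = dx * dx + dy * dy
        fx += sign * dx / d2      # like-signed images repel, opposite ones attract
        fy += sign * dy / d2
    return fx, fy

def force_closed_form(x0, y0):
    """Closed-form result from the text, in the same units."""
    pref = -1.0 / (2.0 * (x0 * x0 + y0 * y0))
    return pref * y0 * y0 / x0, pref * x0 * x0 / y0

def potential(x, y, x0, y0):
    """Potential at (x, y) in units of -lambda / (2 pi eps0)."""
    total = math.log(math.hypot(x - x0, y - y0))
    for (xi, yi), sign in image_charges(x0, y0):
        total += sign * math.log(math.hypot(x - xi, y - yi))
    return total

if __name__ == "__main__":
    print(force_from_images(1.0, 2.0), force_closed_form(1.0, 2.0))
    print(potential(0.0, 1.3, 1.0, 2.0), potential(0.7, 0.0, 1.0, 2.0))
```

Both force components are negative, i.e. the line charge is attracted towards each plate, as expected for earthed conductors.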



Further generalisations are possible. For instance, solving Poisson’s equation in
the two-dimensional strip $-\infty < x < \infty$, 0 < y < b requires an infinite series of
image points.

So far we have considered problems in which the boundary S consists of
straight lines (in two dimensions) or planes (in three dimensions), in which simple
reflection of the source at $\mathbf{r}_0$ in these boundaries fixes the positions of the image
points. For more complicated (curved) boundaries this is no longer possible, and
finding the appropriate position(s) and strength(s) of the image source(s) requires
further work.


► Use the method of images to find the Dirichlet Green’s function for solving Poisson’s
equation outside a sphere of radius a centred at the origin.

We need to find a solution of Poisson’s equation valid outside the sphere of radius a.
Since an image point $\mathbf{r}_1$ cannot lie in this region, it must be located within the sphere. The
Green’s function for this problem is therefore

$$G(\mathbf{r}, \mathbf{r}_0) = -\frac{1}{4\pi|\mathbf{r} - \mathbf{r}_0|} - \frac{q}{4\pi|\mathbf{r} - \mathbf{r}_1|},$$

where $|\mathbf{r}_0| > a$, $|\mathbf{r}_1| < a$ and q is the strength of the image, which we have yet to determine.
Clearly, $G(\mathbf{r}, \mathbf{r}_0) \to 0$ on the surface at infinity.

By symmetry we expect the image point $\mathbf{r}_1$ to lie on the same radial line as the original
source, $\mathbf{r}_0$, as shown in figure 19.14, and so $\mathbf{r}_1 = k\mathbf{r}_0$ where k < 1. However, for a Dirichlet
Green’s function we require $G(\mathbf{r}, \mathbf{r}_0) = 0$ on $|\mathbf{r}| = a$, and the form of the Green’s function
suggests that we need

$$|\mathbf{r} - \mathbf{r}_0| \propto |\mathbf{r} - \mathbf{r}_1| \quad \text{for all } |\mathbf{r}| = a. \tag{19.100}$$

Referring to figure 19.14, if this relationship is to hold over the whole surface of the
sphere, then it must certainly hold for the points A and B. We thus require

$$\frac{|\mathbf{r}_0| - a}{a - |\mathbf{r}_1|} = \frac{|\mathbf{r}_0| + a}{a + |\mathbf{r}_1|},$$

which reduces to $|\mathbf{r}_1| = a^2/|\mathbf{r}_0|$. Therefore the image point must be located at the position
$\mathbf{r}_1 = (a^2/|\mathbf{r}_0|^2)\,\mathbf{r}_0$.

Figure 19.14 The arrangement of images for solving Poisson’s equation
outside a sphere of radius a centred at the origin. For a charge +1 at $\mathbf{r}_0$, the
image point $\mathbf{r}_1$ is given by $(a/|\mathbf{r}_0|)^2\,\mathbf{r}_0$ and the strength of the image charge is
$-a/|\mathbf{r}_0|$.

It may now be checked that, for this location of the image point, (19.100) is satisfied
over the whole sphere. Using the geometrical result

$$|\mathbf{r} - \mathbf{r}_1|^2 = |\mathbf{r}|^2 - \frac{2a^2}{|\mathbf{r}_0|^2}\,\mathbf{r}\cdot\mathbf{r}_0 + \frac{a^4}{|\mathbf{r}_0|^2} = \frac{a^2}{|\mathbf{r}_0|^2}\left(|\mathbf{r}_0|^2 - 2\mathbf{r}\cdot\mathbf{r}_0 + a^2\right) \quad \text{for } |\mathbf{r}| = a, \tag{19.101}$$

we see that, on the surface of the sphere,

$$|\mathbf{r} - \mathbf{r}_1| = \frac{a}{|\mathbf{r}_0|}\,|\mathbf{r} - \mathbf{r}_0| \quad \text{for } |\mathbf{r}| = a. \tag{19.102}$$

Therefore, in order that G = 0 at $|\mathbf{r}| = a$, the strength of the image charge must be
$-a/|\mathbf{r}_0|$. Consequently, the Dirichlet Green’s function for the exterior of the sphere is

$$G(\mathbf{r}, \mathbf{r}_0) = -\frac{1}{4\pi|\mathbf{r} - \mathbf{r}_0|} + \frac{a/|\mathbf{r}_0|}{4\pi\left|\mathbf{r} - (a^2/|\mathbf{r}_0|^2)\,\mathbf{r}_0\right|}.$$

For a less formal treatment of the same problem see exercise 19.24. ◄
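A brief numerical check of this construction, as a sketch with a hypothetical function name: place the image at $(a^2/|\mathbf{r}_0|^2)\,\mathbf{r}_0$ with strength $-a/|\mathbf{r}_0|$ and confirm that G vanishes (to rounding error) at arbitrary points on the sphere.

```python
import math

def green_exterior_sphere(r, r0, a):
    """Dirichlet Green's function outside a sphere of radius a (source r0, |r0| > a)."""
    r0_norm = math.sqrt(sum(c * c for c in r0))
    r1 = [(a * a / r0_norm ** 2) * c for c in r0]   # image point inside the sphere
    q = -a / r0_norm                                # image strength
    d0 = math.sqrt(sum((r[i] - r0[i]) ** 2 for i in range(3)))
    d1 = math.sqrt(sum((r[i] - r1[i]) ** 2 for i in range(3)))
    return -1.0 / (4.0 * math.pi * d0) - q / (4.0 * math.pi * d1)

if __name__ == "__main__":
    a = 1.5
    r0 = (1.0, 2.0, 2.0)                            # |r0| = 3 > a
    for theta, phi in [(0.3, 1.0), (1.2, 4.0), (2.8, 5.5)]:
        r = (a * math.sin(theta) * math.cos(phi),
             a * math.sin(theta) * math.sin(phi),
             a * math.cos(theta))
        print(green_exterior_sphere(r, r0, a))      # each value should be ~0
```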


If we seek solutions to Poisson’s equation in the interior of a sphere then the
above analysis still holds, but $\mathbf{r}$ and $\mathbf{r}_0$ are now inside the sphere and the image
$\mathbf{r}_1$ lies outside it.

For two-dimensional Dirichlet problems outside the circle $|\mathbf{r}| = a$, we are led
by arguments similar to those employed previously to use the same image point
as in the three-dimensional case, namely

$$\mathbf{r}_1 = \frac{a^2}{|\mathbf{r}_0|^2}\,\mathbf{r}_0. \tag{19.103}$$

As illustrated below, however, it is usually necessary to take the image strength
as −1 in two-dimensional problems.


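► Solve Laplace’s equation in the two-dimensional region $|\mathbf{r}| < a$, subject to the boundary
condition $u = f(\phi)$ on $|\mathbf{r}| = a$.

In this case we wish to find the Dirichlet Green’s function in the interior of a disc of
radius a, so the image charge must lie outside the disc. Taking the strength of the image
to be −1, we have

$$G(\mathbf{r}, \mathbf{r}_0) = \frac{1}{2\pi}\ln|\mathbf{r} - \mathbf{r}_0| - \frac{1}{2\pi}\ln|\mathbf{r} - \mathbf{r}_1| + c,$$

where $\mathbf{r}_1 = (a^2/|\mathbf{r}_0|^2)\,\mathbf{r}_0$ lies outside the disc, and c is a constant that includes the strength
of the image charge and does not necessarily equal zero.

Since we require $G(\mathbf{r}, \mathbf{r}_0) = 0$ when $|\mathbf{r}| = a$, the value of the constant c is determined,
and the Dirichlet Green’s function for this problem is given by

$$G(\mathbf{r}, \mathbf{r}_0) = \frac{1}{2\pi}\left(\ln|\mathbf{r} - \mathbf{r}_0| - \ln\left|\mathbf{r} - \frac{a^2}{|\mathbf{r}_0|^2}\,\mathbf{r}_0\right| - \ln\frac{|\mathbf{r}_0|}{a}\right). \tag{19.104}$$

Using plane polar coordinates, the solution to the boundary-value problem can be written
as a line integral around the circle $\rho = a$:

$$u(\mathbf{r}_0) = \int_C f(\mathbf{r})\frac{\partial G(\mathbf{r}, \mathbf{r}_0)}{\partial n}\,dl = \int_0^{2\pi} f(\mathbf{r})\left.\frac{\partial G(\mathbf{r}, \mathbf{r}_0)}{\partial\rho}\right|_{\rho=a} a\,d\phi. \tag{19.105}$$

The normal derivative of the Green’s function (19.104) is given by

$$\frac{\partial G(\mathbf{r}, \mathbf{r}_0)}{\partial\rho} = \frac{\mathbf{r}}{|\mathbf{r}|}\cdot\nabla G(\mathbf{r}, \mathbf{r}_0) = \frac{\mathbf{r}}{2\pi|\mathbf{r}|}\cdot\left(\frac{\mathbf{r} - \mathbf{r}_0}{|\mathbf{r} - \mathbf{r}_0|^2} - \frac{\mathbf{r} - \mathbf{r}_1}{|\mathbf{r} - \mathbf{r}_1|^2}\right). \tag{19.106}$$

Using the fact that $\mathbf{r}_1 = (a^2/|\mathbf{r}_0|^2)\,\mathbf{r}_0$ and the geometrical result (19.102), we find that

$$\left.\frac{\partial G(\mathbf{r}, \mathbf{r}_0)}{\partial\rho}\right|_{\rho=a} = \frac{a^2 - |\mathbf{r}_0|^2}{2\pi a\,|\mathbf{r} - \mathbf{r}_0|^2}.$$

In plane polar coordinates, $\mathbf{r} = \rho\cos\phi\,\mathbf{i} + \rho\sin\phi\,\mathbf{j}$ and $\mathbf{r}_0 = \rho_0\cos\phi_0\,\mathbf{i} + \rho_0\sin\phi_0\,\mathbf{j}$, and so

$$\left.\frac{\partial G(\mathbf{r}, \mathbf{r}_0)}{\partial\rho}\right|_{\rho=a} = \left(\frac{1}{2\pi a}\right)\frac{a^2 - \rho_0^2}{a^2 + \rho_0^2 - 2a\rho_0\cos(\phi - \phi_0)}.$$

On substituting into (19.105), we obtain

$$u(\rho_0, \phi_0) = \frac{1}{2\pi}\int_0^{2\pi}\frac{(a^2 - \rho_0^2)\,f(\phi)\,d\phi}{a^2 + \rho_0^2 - 2a\rho_0\cos(\phi - \phi_0)}, \tag{19.107}$$

which is the solution to the problem. ◄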
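Equation (19.107) can be tested against boundary data for which the interior solution is known. In the sketch below (hypothetical function name, uniform quadrature in $\phi$) the choices $f(\phi) = 1$ and $f(\phi) = \cos\phi$ are illustrative assumptions; the corresponding harmonic functions in the disc are $u = 1$ and $u = (\rho_0/a)\cos\phi_0$.

```python
import math

def poisson_disc(f, rho0, phi0, a, n=400):
    """Evaluate (19.107) with a uniform (trapezoidal) rule over one period in phi."""
    d_phi = 2.0 * math.pi / n
    total = 0.0
    for j in range(n):
        phi = j * d_phi
        kernel = (a * a - rho0 * rho0) / (
            a * a + rho0 * rho0 - 2.0 * a * rho0 * math.cos(phi - phi0))
        total += f(phi) * kernel * d_phi
    return total / (2.0 * math.pi)

if __name__ == "__main__":
    a, rho0, phi0 = 2.0, 0.7, 0.4
    print(poisson_disc(lambda phi: 1.0, rho0, phi0, a))         # constant data
    print(poisson_disc(math.cos, rho0, phi0, a))                # f = cos(phi)
    print((rho0 / a) * math.cos(phi0))                          # known solution
```

Because the integrand is smooth and periodic, the uniform rule converges very rapidly for points well inside the disc; accuracy degrades as $\rho_0 \to a$, where the kernel becomes sharply peaked.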
19.5.4 Neumann problems

In a Neumann problem we require the normal derivative of the solution of
Poisson’s equation to take on specific values on some surface S that bounds V,
i.e. we require $\partial u(\mathbf{r})/\partial n = f(\mathbf{r})$ on S, where f is a given function. As we shall see,
much of our discussion of Dirichlet problems can be immediately taken over into
the solution of Neumann problems.

As we proved in section 18.7 of the previous chapter, specifying Neumann
boundary conditions determines the relevant solution of Poisson’s equation to
within an (unimportant) additive constant. Unlike Dirichlet conditions, Neumann
conditions impose a self-consistency requirement. In order for a solution u to exist,
it is necessary that the following consistency condition holds:

$$\int_S f\,dS = \int_S \nabla u\cdot\hat{\mathbf{n}}\,dS = \int_V \nabla^2 u\,dV = \int_V \rho\,dV, \tag{19.108}$$

where we have used the divergence theorem to convert the surface integral into
a volume integral. As a physical example, the integral of the normal component
of an electric field over a surface bounding a given volume cannot be chosen
arbitrarily when the charge inside the volume has already been specified (Gauss’s
theorem).

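Let us again consider (19.87), which is central to our discussion of Green’s
functions in inhomogeneous problems. It reads

$$u(\mathbf{r}_0) = \int_V G(\mathbf{r}, \mathbf{r}_0)\rho(\mathbf{r})\,dV(\mathbf{r}) + \int_S \left[u(\mathbf{r})\frac{\partial G(\mathbf{r}, \mathbf{r}_0)}{\partial n} - G(\mathbf{r}, \mathbf{r}_0)\frac{\partial u(\mathbf{r})}{\partial n}\right] dS(\mathbf{r}).$$

As always, the Green’s function must obey

$$\nabla^2 G(\mathbf{r}, \mathbf{r}_0) = \delta(\mathbf{r} - \mathbf{r}_0),$$

where $\mathbf{r}_0$ lies in V. In the solution of Dirichlet problems in the previous subsection,
we chose the Green’s function to obey the boundary condition $G(\mathbf{r}, \mathbf{r}_0) = 0$ on S
and, in a similar way, we might wish to choose $\partial G(\mathbf{r}, \mathbf{r}_0)/\partial n = 0$ in the solution of
Neumann problems. However, in general this is not permitted since the Green’s
function must obey the consistency condition

$$\int_S \frac{\partial G(\mathbf{r}, \mathbf{r}_0)}{\partial n}\,dS = \int_S \nabla G(\mathbf{r}, \mathbf{r}_0)\cdot\hat{\mathbf{n}}\,dS = \int_V \nabla^2 G(\mathbf{r}, \mathbf{r}_0)\,dV = 1.$$

The simplest permitted boundary condition is therefore

$$\frac{\partial G(\mathbf{r}, \mathbf{r}_0)}{\partial n} = \frac{1}{A} \quad \text{for } \mathbf{r} \text{ on } S,$$

where A is the area of the surface S; this defines a Neumann Green’s function.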

If we require $\partial u(\mathbf{r})/\partial n = f(\mathbf{r})$ on S, the solution to Poisson’s equation is given
by

$$u(\mathbf{r}_0) = \int_V G(\mathbf{r}, \mathbf{r}_0)\rho(\mathbf{r})\,dV(\mathbf{r}) + \frac{1}{A}\int_S u(\mathbf{r})\,dS(\mathbf{r}) - \int_S G(\mathbf{r}, \mathbf{r}_0)f(\mathbf{r})\,dS(\mathbf{r})$$

$$\phantom{u(\mathbf{r}_0)} = \int_V G(\mathbf{r}, \mathbf{r}_0)\rho(\mathbf{r})\,dV(\mathbf{r}) + \langle u(\mathbf{r})\rangle_S - \int_S G(\mathbf{r}, \mathbf{r}_0)f(\mathbf{r})\,dS(\mathbf{r}), \tag{19.109}$$

where $\langle u(\mathbf{r})\rangle_S$ is the average of u over the surface S and is a freely specifiable
constant. For Neumann problems in which the volume V is bounded by a surface
S at infinity, we do not need the $\langle u(\mathbf{r})\rangle_S$ term. For example, if we wish to solve
a Neumann problem outside the unit sphere centred at the origin then r > 1
is the region V throughout which we require the solution; this region may be
considered as being bounded by two disconnected surfaces, the surface of the
sphere and a surface at infinity. By requiring that $u(\mathbf{r}) \to 0$ as $|\mathbf{r}| \to \infty$, the term
$\langle u(\mathbf{r})\rangle_S$ becomes zero.

As mentioned above, much of our discussion of Dirichlet problems can be 
taken over into the solution of Neumann problems. In particular, we may use the 
method of images to find the appropriate Neumann Green’s function. 


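► Solve Laplace’s equation in the two-dimensional region $|\mathbf{r}| < a$ subject to the boundary
condition $\partial u/\partial n = f(\phi)$ on $|\mathbf{r}| = a$, with $\int_0^{2\pi} f(\phi)\,d\phi = 0$ as required by the consistency
condition (19.108).

Let us assume, as in Dirichlet problems with this geometry, that a single image charge is
placed outside the circle at

$$\mathbf{r}_1 = \frac{a^2}{|\mathbf{r}_0|^2}\,\mathbf{r}_0,$$

where $\mathbf{r}_0$ is the position of the source inside the circle (see equation (19.103)). Then, from
(19.102), we have the useful geometrical result

$$|\mathbf{r} - \mathbf{r}_1| = \frac{a}{|\mathbf{r}_0|}\,|\mathbf{r} - \mathbf{r}_0| \quad \text{for } |\mathbf{r}| = a. \tag{19.110}$$

Leaving the strength q of the image as a parameter, the Green’s function has the form

$$G(\mathbf{r}, \mathbf{r}_0) = \frac{1}{2\pi}\left(\ln|\mathbf{r} - \mathbf{r}_0| + q\ln|\mathbf{r} - \mathbf{r}_1| + c\right). \tag{19.111}$$

Using plane polar coordinates, the radial (i.e. normal) derivative of this function is given
by

$$\frac{\partial G(\mathbf{r}, \mathbf{r}_0)}{\partial\rho} = \frac{\mathbf{r}}{|\mathbf{r}|}\cdot\nabla G(\mathbf{r}, \mathbf{r}_0) = \frac{\mathbf{r}}{2\pi|\mathbf{r}|}\cdot\left[\frac{\mathbf{r} - \mathbf{r}_0}{|\mathbf{r} - \mathbf{r}_0|^2} + \frac{q(\mathbf{r} - \mathbf{r}_1)}{|\mathbf{r} - \mathbf{r}_1|^2}\right].$$

Using (19.110), on the circumference of the circle $\rho = a$ the radial derivative is

$$\left.\frac{\partial G(\mathbf{r}, \mathbf{r}_0)}{\partial\rho}\right|_{\rho=a} = \frac{1}{2\pi|\mathbf{r}|}\left[\frac{|\mathbf{r}|^2 - \mathbf{r}\cdot\mathbf{r}_0}{|\mathbf{r} - \mathbf{r}_0|^2} + \frac{q|\mathbf{r}|^2 - q(a^2/|\mathbf{r}_0|^2)\,\mathbf{r}\cdot\mathbf{r}_0}{(a^2/|\mathbf{r}_0|^2)\,|\mathbf{r} - \mathbf{r}_0|^2}\right]$$

$$\phantom{\left.\frac{\partial G(\mathbf{r}, \mathbf{r}_0)}{\partial\rho}\right|_{\rho=a}} = \frac{1}{2\pi a}\,\frac{1}{|\mathbf{r} - \mathbf{r}_0|^2}\left[|\mathbf{r}|^2 + q|\mathbf{r}_0|^2 - (1 + q)\,\mathbf{r}\cdot\mathbf{r}_0\right],$$

where we have set $|\mathbf{r}|^2 = a^2$ in the second term on the RHS, but not in the first. If we take
q = 1, the radial derivative simplifies to

$$\left.\frac{\partial G(\mathbf{r}, \mathbf{r}_0)}{\partial\rho}\right|_{\rho=a} = \frac{1}{2\pi a},$$

or 1/L where L is the length of the circumference, and so (19.111) with q = 1 is the
required Neumann Green’s function.

Since $\rho(\mathbf{r}) = 0$, the solution to our boundary-value problem is now given by (19.109) as

$$u(\mathbf{r}_0) = \langle u(\mathbf{r})\rangle_C - \int_C G(\mathbf{r}, \mathbf{r}_0)f(\mathbf{r})\,dl(\mathbf{r}),$$

where the integral is around the circumference of the circle C. In plane polar coordinates
$\mathbf{r} = \rho\cos\phi\,\mathbf{i} + \rho\sin\phi\,\mathbf{j}$ and $\mathbf{r}_0 = \rho_0\cos\phi_0\,\mathbf{i} + \rho_0\sin\phi_0\,\mathbf{j}$, and again using (19.110) we find
that on C the Green’s function is given by

$$G(\mathbf{r}, \mathbf{r}_0)\Big|_{\rho=a} = \frac{1}{2\pi}\left[\ln|\mathbf{r} - \mathbf{r}_0| + \ln\left(\frac{a}{|\mathbf{r}_0|}\,|\mathbf{r} - \mathbf{r}_0|\right) + c\right]$$

$$\phantom{G(\mathbf{r}, \mathbf{r}_0)\Big|_{\rho=a}} = \frac{1}{2\pi}\left(\ln|\mathbf{r} - \mathbf{r}_0|^2 + \ln\frac{a}{|\mathbf{r}_0|} + c\right)$$

$$\phantom{G(\mathbf{r}, \mathbf{r}_0)\Big|_{\rho=a}} = \frac{1}{2\pi}\left\{\ln\left[a^2 + \rho_0^2 - 2a\rho_0\cos(\phi - \phi_0)\right] + \ln\frac{a}{\rho_0} + c\right\}. \tag{19.112}$$

Since $dl = a\,d\phi$ on C, the solution to the problem is given by

$$u(\rho_0, \phi_0) = \langle u\rangle_C - \frac{a}{2\pi}\int_0^{2\pi} f(\phi)\ln\left[a^2 + \rho_0^2 - 2a\rho_0\cos(\phi - \phi_0)\right] d\phi.$$

The contributions of the final two terms in the Green’s function (19.112) vanish
because $\int_0^{2\pi} f(\phi)\,d\phi = 0$. The average value of u around the circumference, $\langle u\rangle_C$, is a freely
specifiable constant as we would expect for a Neumann problem. This result should be
compared with the result (19.107) for the corresponding Dirichlet problem, but it should
be remembered that in the one case f is a potential, and in the other the gradient of a
potential. ◄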
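As with the Dirichlet case, the Neumann result can be checked against simple harmonic functions. In the sketch below (hypothetical function name; the test data are illustrative assumptions) the boundary data $f = \cos\phi$ correspond to the interior solution $u = \rho_0\cos\phi_0$, and $f = \cos 2\phi$ to $u = (\rho_0^2/2a)\cos 2\phi_0$, both with $\langle u\rangle_C = 0$. Both choices satisfy the consistency condition $\int_0^{2\pi} f\,d\phi = 0$.

```python
import math

def neumann_disc(f, rho0, phi0, a, mean_u=0.0, n=400):
    """Evaluate the Neumann solution for the disc with a uniform rule in phi.

    mean_u is the freely specifiable average of u on the boundary circle.
    """
    d_phi = 2.0 * math.pi / n
    total = 0.0
    for j in range(n):
        phi = j * d_phi
        log_term = math.log(
            a * a + rho0 * rho0 - 2.0 * a * rho0 * math.cos(phi - phi0))
        total += f(phi) * log_term * d_phi
    return mean_u - a / (2.0 * math.pi) * total

if __name__ == "__main__":
    a, rho0, phi0 = 2.0, 0.7, 0.4
    print(neumann_disc(math.cos, rho0, phi0, a), rho0 * math.cos(phi0))
    print(neumann_disc(lambda p: math.cos(2.0 * p), rho0, phi0, a),
          rho0 ** 2 / (2.0 * a) * math.cos(2.0 * phi0))
```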


19.6 Exercises

19.1 Solve the following first-order partial differential equations by separating the
variables:

(a) $\dfrac{\partial u}{\partial x} - x\dfrac{\partial u}{\partial y} = 0$;   (b) $x\dfrac{\partial u}{\partial x} - 2y\dfrac{\partial u}{\partial y} = 0$.

19.2 A conducting cube has as its six faces the planes x = ±a, y = ±a and z = ±a,
and contains no internal heat sources. Verify that the temperature distribution

$$u(x, y, z, t) = A\cos\frac{\pi x}{a}\,\sin\frac{\pi z}{a}\,\exp\left(-\frac{2\kappa\pi^2 t}{a^2}\right)$$

obeys the appropriate diffusion equation. Across which faces is there heat flow?
What is the direction and rate of heat flow at the point (3a/4, a/4, a) at time
$t = a^2/(\kappa\pi^2)$?

19.3 The wave equation describing the transverse vibrations of a stretched membrane
under tension T and having a uniform surface density $\rho$ is

$$T\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right) = \rho\,\frac{\partial^2 u}{\partial t^2}.$$
Find a separable solution appropriate to a membrane stretched on a frame of
length a and width b, showing that the natural angular frequencies of such a
membrane are

$$\omega^2 = \frac{\pi^2 T}{\rho}\left(\frac{n^2}{a^2} + \frac{m^2}{b^2}\right),$$

where n and m are any positive integers.

19.4 Schrödinger’s equation for a non-relativistic particle in a constant potential region
can be taken as

$$-\frac{\hbar^2}{2m}\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}\right) = i\hbar\,\frac{\partial u}{\partial t}.$$

(a) Find a solution, separable in the four independent variables, that can be
written in the form of a plane wave,

$$\psi(x, y, z, t) = A\exp[i(\mathbf{k}\cdot\mathbf{r} - \omega t)].$$

Using the relationships associated with de Broglie ($\mathbf{p} = \hbar\mathbf{k}$) and Einstein
($E = \hbar\omega$), show that the separation constants must be such that

$$p_x^2 + p_y^2 + p_z^2 = 2mE.$$

(b) Obtain a different separable solution describing a particle confined to a box
of side a ($\psi$ must vanish at the walls of the box). Show that the energy of
the particle can only take the quantised values

$$E = \frac{\hbar^2\pi^2}{2ma^2}\left(n_x^2 + n_y^2 + n_z^2\right),$$

where $n_x$, $n_y$, $n_z$ are integers.

19.5 Denoting the three terms of $\nabla^2$ in spherical polars by $\nabla_r^2$, $\nabla_\theta^2$, $\nabla_\phi^2$ in an obvious
way, evaluate $\nabla_r^2 u$, etc. for the two functions given below and verify that, in each
case, although the individual terms are not necessarily zero their sum $\nabla^2 u$ is zero.
Identify the corresponding values of $\ell$ and m.

(a) $u(r, \theta, \phi) = \left(Ar^2 + \dfrac{B}{r^3}\right)\dfrac{3\cos^2\theta - 1}{2}$.

(b) $u(r, \theta, \phi) = \left(Ar + \dfrac{B}{r^2}\right)\sin\theta\,\exp i\phi$.

19.6 Prove that the expression given in equation (19.47) for the associated Legendre
function $P_\ell^m(\mu)$ satisfies the appropriate equation, (19.45), as follows.

(a) Evaluate $dP_\ell^m(\mu)/d\mu$ and $d^2P_\ell^m(\mu)/d\mu^2$ using the forms given in (19.47) and
substitute them into (19.45).

(b) Differentiate Legendre’s equation m times using Leibniz’ theorem.

(c) Show that the equations obtained in (a) and (b) are multiples of each other,
and hence that the validity of (b) implies that of (a).

19.7 Use the expressions at the end of subsection 19.3.2 to verify for $\ell$ = 0, 1, 2 that

$$\sum_{m=-\ell}^{\ell}\left|Y_\ell^m(\theta, \phi)\right|^2 = \frac{2\ell + 1}{4\pi},$$

and so is independent of the values of $\theta$ and $\phi$. This is true for any $\ell$, but
a general proof is more involved. This result helps to reconcile intuition with
the apparently arbitrary choice of polar axis in a general quantum mechanical
system.
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19.8 


19.9 


19.10 


19.11 


19.12 


19.13 


Express the function 

f(8,4 > ) = sin 0 [sin 2 (0/2) cos cp + icos 2 (0/2) sin cp] + sin 2 (0/2) 
as a sum of spherical harmonics. 

Continue the analysis of exercise 10.20, concerned with the flow of a very viscous 
fluid past a sphere, to find the full expression for the stream function xp(r, d). At 
the surface of the sphere r = a the velocity field u = 0, whilst far from the sphere 
xp ~ (Ur 2 sin 2 8)/2. 

Show that f(r) can be expressed as a superposition of powers of r, and 
determine which powers give acceptable solutions. Hence show that 

i p(r,6) = ^2r 2 — 3 ar + — ^ sin 2 8. 

19.10 The motion of a very viscous fluid in the two-dimensional (wedge) region $-\alpha < \phi < \alpha$
can be described in $(\rho,\phi)$ coordinates by the (biharmonic) equation
$$\nabla^2\nabla^2\psi \equiv \nabla^4\psi = 0,$$
together with the boundary conditions $\partial\psi/\partial\phi = 0$ at $\phi = \pm\alpha$, which represents
the fact that there is no radial fluid velocity close to either of the bounding
walls because of the viscosity, and $\partial\psi/\partial\rho = \pm\rho$ at $\phi = \pm\alpha$, which imposes the
condition that azimuthal flow increases linearly with r along any radial line.
Assuming a solution in separated-variable form, show that the full expression for
$\psi$ is
$$\psi(\rho,\phi) = \frac{\rho^2}{2}\,\frac{\sin 2\phi - 2\phi\cos 2\alpha}{\sin 2\alpha - 2\alpha\cos 2\alpha}.$$


19.11 A circular disc of radius $a$ is heated in such a way that its perimeter $\rho = a$ is maintained
with a temperature distribution $A + B\cos^2\phi$, where $\rho$ and $\phi$ are plane polar
coordinates and $A$ and $B$ are constants. Find the temperature $T(\rho,\phi)$ everywhere
in the region $\rho < a$.

19.12 (a) Find the form of the solution of Laplace's equation in plane polar coordinates
$\rho, \phi$ that takes the value $+1$ for $0 < \phi < \pi$ and the value $-1$ for $-\pi < \phi < 0$,
when $\rho = a$.

(b) For a point $(x,y)$ on or inside the circle $x^2 + y^2 = a^2$, identify the angles $\alpha$
and $\beta$ defined by
$$\alpha = \tan^{-1}\frac{y}{a+x} \quad\text{and}\quad \beta = \tan^{-1}\frac{y}{a-x}.$$
Show that $u(x,y) = (2/\pi)(\alpha+\beta)$ is a solution of Laplace’s equation that
satisfies the boundary conditions given in (a).

(c) Deduce a Fourier series expansion for the function
$$\tan^{-1}\left(\frac{\sin\phi}{1+\cos\phi}\right) + \tan^{-1}\left(\frac{\sin\phi}{1-\cos\phi}\right).$$


19.13 The free transverse vibrations of a thick rod satisfy the equation
$$a^4\frac{\partial^4 u}{\partial x^4} + \frac{\partial^2 u}{\partial t^2} = 0.$$
Obtain a solution in separated-variable form and, for a rod clamped at one end,
$x = 0$, and free at the other, $x = L$, show that the angular frequency of vibration
$\omega$ satisfies
$$\cosh\left(\frac{\omega^{1/2}L}{a}\right) = -\sec\left(\frac{\omega^{1/2}L}{a}\right).$$
(At a clamped end both $u$ and $\partial u/\partial x$ vanish, whilst at a free end, where there is
no bending moment, $\partial^2u/\partial x^2$ and $\partial^3u/\partial x^3$ are both zero.)

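19.14 A membrane is stretched between two concentric rings of radii $a$ and $b$ ($b > a$).
If the smaller ring is transversely distorted from the planar configuration by an
amount $c|\phi|$, $-\pi < \phi < \pi$, show that the membrane then has a shape given by
$$u(\rho,\phi) = \frac{c\pi}{2}\,\frac{\ln(b/\rho)}{\ln(b/a)} - \frac{4c}{\pi}\sum_{m\ \mathrm{odd}} \frac{a^m}{m^2(b^{2m} - a^{2m})}\left(\frac{b^{2m}}{\rho^m} - \rho^m\right)\cos m\phi.$$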


19.15 A string of length $L$, fixed at its two ends, is plucked at its mid-point by an
amount $A$ and then released. Prove that the subsequent displacement is given by
$$u(x,t) = \sum_{n=0}^{\infty} \frac{8A}{\pi^2(2n+1)^2}\sin\left[\frac{(2n+1)\pi x}{L}\right]\cos\left[\frac{(2n+1)\pi ct}{L}\right],$$
where, in the usual notation, $c^2 = T/\rho$.

Find the total kinetic energy of the string when it passes through its unplucked
position, by calculating it in each mode (each $n$) and summing, using the result
$$\sum_{0}^{\infty}\frac{1}{(2n+1)^2} = \frac{\pi^2}{8}.$$
Confirm that the total energy is equal to the work done in plucking the string
initially.

19.16 Prove that the potential for $\rho < a$ associated with a vertical split cylinder of
radius $a$, the two halves of which ($\cos\phi > 0$ and $\cos\phi < 0$) are maintained at
equal and opposite potentials $\pm V$, is given by
$$u(\rho,\phi) = \frac{4V}{\pi}\sum_{n=0}^{\infty}\frac{(-1)^n}{2n+1}\left(\frac{\rho}{a}\right)^{2n+1}\cos(2n+1)\phi.$$


19.17 A conducting spherical shell of radius $a$ is cut round its equator and the two
halves connected to voltages of $+V$ and $-V$. Show that an expression for the
potential at the point $(r,\theta,\phi)$ anywhere inside the two hemispheres is
$$u(r,\theta,\phi) = V\sum_{n=0}^{\infty}\frac{(-1)^n(2n)!\,(4n+3)}{2^{2n+1}\,n!\,(n+1)!}\left(\frac{r}{a}\right)^{2n+1}P_{2n+1}(\cos\theta).$$
(This is the spherical polar analogue of the previous question.)

19.18 A slice of biological material of thickness $L$ is placed into a solution of a
radioactive isotope of constant concentration $C_0$ at time $t = 0$. For a later time $t$
find the concentration of radioactive ions at a depth $x$ inside one of its surfaces
if the diffusion constant is $\kappa$.

19.19 Two identical copper bars are each of length $a$. Initially, one is at 0 °C and the
other at 100 °C; they are then joined together end to end and thermally isolated.
Obtain in the form of a Fourier series an expression $u(x,t)$ for the temperature
at any point a distance $x$ from the join at a later time $t$. (Bear in mind the heat
flow conditions at the free ends of the bars.)

Taking $a = 0.5$ m, estimate the time it takes for one of the free ends to
attain a temperature of 55 °C. The thermal conductivity of copper is
$3.8 \times 10^2$ J m$^{-1}$ K$^{-1}$ s$^{-1}$, and its specific heat capacity is $3.4 \times 10^6$ J m$^{-3}$ K$^{-1}$.

19.20 A sphere of radius $a$ and thermal conductivity $k_1$ is surrounded by an infinite
medium of conductivity $k_2$ in which, far away, the temperature tends to $T_\infty$.
A distribution of heat sources $q(\theta)$ embedded in the sphere’s surface establishes
steady temperature fields $T_1(r,\theta)$ inside the sphere and $T_2(r,\theta)$ outside it. It can
be shown, by considering the heat flow through a small volume that includes
part of the sphere’s surface, that
$$k_1\frac{\partial T_1}{\partial r} - k_2\frac{\partial T_2}{\partial r} = q(\theta) \quad\text{on } r = a.$$
Given that
$$q(\theta) = \frac{1}{a}\sum_{n=0}^{\infty} q_n P_n(\cos\theta),$$
find complete expressions for $T_1(r,\theta)$ and $T_2(r,\theta)$. What is the temperature at
the centre of the sphere?

19.21 Using result (19.77) from the worked example in the text, find the general
expression for the temperature $u(x,t)$ in the bar, given that the temperature
distribution at time $t = 0$ is $u(x,0) = \exp(-x^2/a^2)$.

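19.22 (a) Show that the gravitational potential due to a uniform disc of radius $a$ and
mass $M$, centred at the origin, is given for $r < a$ by
$$\frac{2GM}{a}\left[1 - \frac{r}{a}P_1(\cos\theta) + \frac{1}{2}\left(\frac{r}{a}\right)^2 P_2(\cos\theta) - \frac{1}{8}\left(\frac{r}{a}\right)^4 P_4(\cos\theta) + \cdots\right],$$
and for $r > a$ by
$$\frac{GM}{r}\left[1 - \frac{1}{4}\left(\frac{a}{r}\right)^2 P_2(\cos\theta) + \frac{1}{8}\left(\frac{a}{r}\right)^4 P_4(\cos\theta) - \cdots\right],$$
where the polar axis is normal to the plane of the disc.

(b) Reconcile the presence of a term $P_1(\cos\theta)$, which is odd under $\theta \to \pi - \theta$,
with the symmetry with respect to the plane of the disc of the physical
system.

(c) Deduce that the gravitational field near an infinite sheet of matter of constant
density $\rho$ per unit area is $2\pi G\rho$.

19.23 In the region $-\infty < x, y < \infty$ and $-t < z < t$, a charge-density wave
$\rho(\mathbf{r}) = A\cos qx$, in the x-direction, is represented by
$$\rho(\mathbf{r}) = \frac{e^{iqx}}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\tilde{\rho}(\alpha)e^{i\alpha z}\,d\alpha.$$
The resulting potential is represented by
$$V(\mathbf{r}) = \frac{e^{iqx}}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\tilde{V}(\alpha)e^{i\alpha z}\,d\alpha.$$
Determine the relationship between $\tilde{V}(\alpha)$ and $\tilde{\rho}(\alpha)$, and hence show that the
potential at the point $(x,0,0)$ is
$$\frac{A}{\pi\epsilon_0}\cos qx\int_{-\infty}^{\infty}\frac{\sin kt}{k(k^2+q^2)}\,dk.$$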


19.24 Point charges $q$ and $-qa/b$ (with $a < b$) are placed respectively at a point $P$, a
distance $b$ from the origin $O$, and a point $Q$ between $O$ and $P$, a distance $a^2/b$
from $O$. Show, by considering similar triangles $QOS$ and $SOP$, where $S$ is any
point on the surface of the sphere centred at $O$ and of radius $a$, that the net
potential anywhere on the sphere due to the two charges is zero.

Use this result (backed up by the uniqueness theorem) to find the force with
which a point charge $q$ placed a distance $b$ from the centre of a spherical
conductor of radius $a$ ($< b$) is attracted to the sphere (i) if the sphere is earthed,
and (ii) if the sphere is uncharged and insulated.

19.25 Find the Green's function $G(\mathbf{r},\mathbf{r}_0)$ in the half-space $z > 0$ for the solution of
$\nabla^2\Phi = 0$ with $\Phi$ specified in cylindrical polar coordinates $(\rho,\phi,z)$ on the plane
$z = 0$ by
$$\Phi(\rho,\phi,z) = \begin{cases} 1 & \text{for } \rho \le 1,\\ 1/\rho & \text{for } \rho > 1.\end{cases}$$
Determine the variation of $\Phi(0,0,z)$ along the z-axis.

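19.26 Electrostatic charge is distributed in a sphere of radius $R$ centred on the origin.
Determine the form of the resultant potential $\phi(\mathbf{r})$ at distances much greater than
$R$, as follows.

(a) express in the form of an integral over all space the solution of
$$\nabla^2\phi = -\frac{\rho(\mathbf{r})}{\epsilon_0};$$

(b) show that, for $r \gg r'$,
$$|\mathbf{r} - \mathbf{r}'| = r - \frac{\mathbf{r}\cdot\mathbf{r}'}{r} + O\left(\frac{1}{r}\right);$$

(c) use results (a) and (b) to show that $\phi(\mathbf{r})$ has the form
$$\phi(\mathbf{r}) = \frac{M}{r} + \frac{\mathbf{d}\cdot\mathbf{r}}{r^3} + O\left(\frac{1}{r^3}\right).$$

Find expressions for $M$ and $\mathbf{d}$, and identify them physically.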

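19.27 Find, in the form of an infinite series, the Green’s function of the $\nabla^2$ operator for
the Dirichlet problem in the region $-\infty < x < \infty$, $-\infty < y < \infty$, $-c < z < c$.

19.28 Find the Green's function for the three-dimensional Neumann problem
$$\nabla^2\phi = 0 \ \text{ for } z > 0 \quad\text{and}\quad \frac{\partial\phi}{\partial z} = f(x,y) \ \text{ on } z = 0.$$
Determine $\phi(x,y,z)$ if
$$f(x,y) = \begin{cases}\delta(y) & \text{for } |x| < a,\\ 0 & \text{for } |x| \ge a.\end{cases}$$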


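19.29 (a) By applying the divergence theorem to the volume integral
$$\int_V \left[\phi(\nabla^2 - m^2)\psi - \psi(\nabla^2 - m^2)\phi\right] dV$$
obtain a Green’s function expression, as the sum of a volume integral and a
surface integral, for $\phi(\mathbf{r}')$ that satisfies
$$\nabla^2\phi - m^2\phi = \rho$$
in $V$ and takes the specified form $\phi = f$ on $S$, the boundary of $V$. The
Green's function $G(\mathbf{r},\mathbf{r}')$ to be used satisfies
$$\nabla^2 G - m^2 G = \delta(\mathbf{r} - \mathbf{r}')$$
and vanishes when $\mathbf{r}$ is on $S$.

(b) When $V$ is all space, $G(\mathbf{r},\mathbf{r}')$ can be written as $G(t) = g(t)/t$ where $t = |\mathbf{r} - \mathbf{r}'|$
and $g(t)$ is bounded as $t \to \infty$. Find the form of $G(t)$.

(c) Find $\phi(\mathbf{r})$ in the half space $x > 0$ if $\rho(\mathbf{r}) = \delta(\mathbf{r} - \mathbf{r}_1)$ and $\phi = 0$ both on $x = 0$
and as $r \to \infty$.

19.30 Consider the PDE $\mathcal{L}u(\mathbf{r}) = \rho(\mathbf{r})$, for which the differential operator $\mathcal{L}$ is given by
$$\mathcal{L} = \nabla\cdot[p(\mathbf{r})\nabla] + q(\mathbf{r}),$$
where $p(\mathbf{r})$ and $q(\mathbf{r})$ are functions of position. By proving the generalised form of
Green’s theorem,
$$\int_V (\phi\mathcal{L}\psi - \psi\mathcal{L}\phi)\,dV = \oint_S p\,(\phi\nabla\psi - \psi\nabla\phi)\cdot\hat{\mathbf{n}}\,dS,$$
show that the solution of the PDE is given by
$$u(\mathbf{r}_0) = \int_V G(\mathbf{r},\mathbf{r}_0)\rho(\mathbf{r})\,dV(\mathbf{r}) + \oint_S p(\mathbf{r})\left[u(\mathbf{r})\frac{\partial G(\mathbf{r},\mathbf{r}_0)}{\partial n} - G(\mathbf{r},\mathbf{r}_0)\frac{\partial u(\mathbf{r})}{\partial n}\right] dS(\mathbf{r}),$$
where $G(\mathbf{r},\mathbf{r}_0)$ is the Green's function satisfying $\mathcal{L}G(\mathbf{r},\mathbf{r}_0) = \delta(\mathbf{r} - \mathbf{r}_0)$.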


19.7 Hints and answers


19.1 (a) $C\exp[\lambda(x^2 + 2y)]$; (b) $C(x^2y)^\lambda$.

19.2 There is heat flow only across $z = \pm a$. It is into the cube at a rate of $kA\pi e^{-2}/(\sqrt{2}\,a)$.
19.3 $u(x,y,t) = \sin(n\pi x/a)\sin(m\pi y/b)(A\sin\omega t + B\cos\omega t)$.

19.4 (a) $-\dfrac{\hbar^2}{2m}\dfrac{X''}{X} = \dfrac{p_x^2}{2m}$, etc., $\dfrac{i\hbar T'}{T} = E$;

(b) As in (a), but with solutions $X = A\sin(p_x x/\hbar)$, etc. with $p_x a/\hbar = n_x\pi$.


19.5 (a) $6u/r^2$, $-6u/r^2$, $0$, $\ell = 2$, $m = 0$;

(b) $2u/r^2$, $(\cot^2\theta - 1)u/r^2$, $-u/(r^2\sin^2\theta)$, $\ell = 1$, $m = 1$.

19.8 The first term can contain only $\ell = 1, 2$ and $m = \pm 1$, the second only $\ell = 0, 1, 2$
and $m = 0$; $f(\theta,\phi) = \pi^{1/2}\left[Y_0^0 - 3^{-1/2}Y_1^0 - (2/3)^{1/2}Y_1^1 - (2/15)^{1/2}Y_2^{-1}\right]$.

19.9 Solutions of the form $r^\ell$ give $\ell$ as $-1, 1, 2, 4$. Because of the asymptotic form of
$\psi$, an $r^4$ term cannot be present. The coefficients of the three remaining terms are
determined by the two boundary conditions $\mathbf{u} = \mathbf{0}$ on the sphere and the form of
$\psi$ for large $r$.

19.10 If $\psi(\rho,\phi) = R(\rho)\Phi(\phi)$, show that $\Phi^{(4)} + 4\Phi'' = 0$ and hence that
$\Phi = A + B\phi + C\cos 2\phi + D\sin 2\phi$.

19.11 Express $\cos^2\phi$ in terms of $\cos 2\phi$; $T(\rho,\phi) = A + B/2 + (B\rho^2/2a^2)\cos 2\phi$.

19.12 (a) $u(\rho,\phi) = (4/\pi)\sum_{n\ \mathrm{odd}} n^{-1}(\rho/a)^n\sin n\phi$.

(b) $\nabla^2\alpha = 0$, and $\nabla^2\beta = 0$ separately. On $\rho = a$, $\alpha + \beta + \pi/2 = \pi$.

(c) Equate the two forms (uniqueness theorem) and then set $\rho = a$.

The Fourier series is $2\sum_{n\ \mathrm{odd}} n^{-1}\sin n\phi$.

19.13 $(A\cos mx + B\sin mx + C\cosh mx + D\sinh mx)\cos(\omega t + \epsilon)$, with $m^4a^4 = \omega^2$.

19.15 $E_n = 16\rho A^2c^2/[(2n+1)^2\pi^2L]$; $E = 2\rho c^2A^2/L = \int_0^A [2Tv/(\tfrac{1}{2}L)]\,dv$.

19.17 You will need the result from exercise 17.7.

19.18 Write $C(x,t) = C_0 + \sum_n A_n\sin(n\pi x/L)f_n(t)$ where $f_n(t) \to 0$ as $t \to \infty$;
$A_n = -4C_0/(n\pi)$ and $f_n(t) = \exp[-(\kappa\pi^2n^2/L^2)t]$ for $n$ odd, and $A_n = 0$ for $n$ even.
19.19 Since there is no heat flow at $x = \pm a$, use a series of period $4a$, $u(x,0) = 100$ for
$0 < x < 2a$, $u(x,0) = 0$ for $-2a < x < 0$.
$$u(x,t) = 50 + \frac{200}{\pi}\sum_{n=0}^{\infty}\frac{1}{2n+1}\sin\left[\frac{(2n+1)\pi x}{2a}\right]\exp\left[-\frac{k(2n+1)^2\pi^2t}{4a^2s}\right].$$
Taking only the $n = 0$ term gives $t$ as 2300 s.

19.20 $$T_1(r,\theta) = \sum_{m=1}^{\infty} b_m(r/a)^m P_m(\cos\theta) + q_0/k_2 + T_\infty,$$
$$T_2(r,\theta) = \sum_{m=1}^{\infty} b_m(a/r)^{m+1}P_m(\cos\theta) + aq_0/(k_2r) + T_\infty,$$
where in both cases $b_m = q_m/[mk_1 + (m+1)k_2]$; $T(0,\theta) = q_0/k_2 + T_\infty$.

19.21 $u(x,t) = [a/(a^2 + 4\kappa t)^{1/2}]\exp[-x^2/(a^2 + 4\kappa t)]$.

19.22 (a) $u(r = z, 0) = 2MGa^{-2}[(a^2 + z^2)^{1/2} - z]$. (b) For $\theta > \pi/2$, the factor in the
square brackets is $(a^2 + z^2)^{1/2} + z$. (c) Find $\partial u/\partial r$ at $\theta = 0$ for $r < a$, and let
$a \to \infty$.

19.23 Fourier-transform Poisson's equation to show that $\tilde{\rho}(\alpha) = \epsilon_0(\alpha^2 + q^2)\tilde{V}(\alpha)$.

19.24 (i) $q^2ab/[4\pi\epsilon_0(b^2 - a^2)^2]$; (ii) $[q^2ab/(4\pi\epsilon_0)][(b^2 - a^2)^{-2} - b^{-4}]$. Obtain (ii) from (i)
by adding a further image charge $+qa/b$ at $O$, to give a net zero electrostatic
flux from the sphere while maintaining its equipotential property.

19.25 Follow the worked example that includes result (19.98). For part of the explicit
integration, substitute $\rho = z\tan\alpha$.
$$\Phi(0,0,z) = \frac{z(1+z^2)^{1/2} - z^2 + (1+z^2)^{1/2} - 1}{z(1+z^2)^{1/2}}.$$


19.26 (a) See equation (19.94); (c) $M = (4\pi\epsilon_0)^{-1}\int\rho(\mathbf{r}')\,dV'$ = total charge on the
sphere, $\mathbf{d} = (4\pi\epsilon_0)^{-1}\int\rho(\mathbf{r}')\,\mathbf{r}'\,dV'$ = dipole moment of the sphere.

19.27


1 00 


i 


+ 


s] (x - x 0 ) 2 + (y - y 0 ) 2 + (z + (-l)’To - nc) 2 

1 

V (x - x 0 ) 2 + (y - y 0 ) 2 + (z + (-1 )"^o + nc) 2 


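19.28 $$G(\mathbf{r},\mathbf{r}_0) = -\frac{1}{4\pi}\left[\frac{1}{\sqrt{(x-x_0)^2 + (y-y_0)^2 + (z-z_0)^2}} + \frac{1}{\sqrt{(x-x_0)^2 + (y-y_0)^2 + (z+z_0)^2}}\right],$$
$$\phi(x,y,z) = -\frac{1}{2\pi}\left[\sinh^{-1}\frac{a-x}{\sqrt{y^2+z^2}} + \sinh^{-1}\frac{a+x}{\sqrt{y^2+z^2}}\right].$$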

19.29 (a) As given in equation (19.89), but with $\mathbf{r}_0$ replaced by $\mathbf{r}'$.

(b) Move the origin to $\mathbf{r}'$ and integrate the defining Green's equation to obtain
$$4\pi t^2\frac{dG}{dt} - m^2\int_0^t G(t')\,4\pi t'^2\,dt' = 1,$$
leading to $G(t) = [-1/(4\pi t)]e^{-mt}$.

(c) $\phi(\mathbf{r}) = [-1/(4\pi)](p^{-1}e^{-mp} - q^{-1}e^{-mq})$, where $p = |\mathbf{r} - \mathbf{r}_1|$ and $q = |\mathbf{r} - \mathbf{r}_2|$ with
$\mathbf{r}_1 = (x_1,y_1,z_1)$ and $\mathbf{r}_2 = (-x_1,y_1,z_1)$.


20


Complex variables


Throughout this book references have been made to results derived from the 
theory of complex variables. This theory thus becomes an integral part of the 
mathematics appropriate to physical applications. The difficulty with it, from the 
point of view of a book such as the present one, is that although it has many 
practical applications its underlying basis has a distinctly pure mathematics 
flavour. 

Thus, to adopt a comprehensive rigorous approach would involve a large 
amount of groundwork in analysis, for example formulating precise definitions 
of continuity and differentiability, developing the theory of sets and making a 
detailed study of boundedness. Instead, we will be selective and pursue only those 
parts of the formal theory that are needed to establish the results used elsewhere 
in this book and some others of general utility. 

In this spirit, the proofs that have been adopted for some of the standard 
results of complex variable theory have been chosen with an eye to simplicity 
rather than sophistication. This means that in some cases the imposed conditions 
are more stringent than would be strictly necessary if more sophisticated proofs 
were used; where this happens the less restrictive results are usually stated as 
well. The reader who is interested in a fuller treatment should consult one of the 
many excellent textbooks on this fascinating subject.†

One further concession to ‘hand-waving’ has been made in the interests of
keeping the treatment to a moderate length. In several places phrases such as ‘can
be made as small as we like’ are used, rather than a careful treatment in terms
of ‘given $\epsilon > 0$, there exists a $\delta > 0$ such that’. In the authors’ experience, some
students are more at ease with the former type of statement despite its lack of
precision whilst others, those who would contemplate only the latter, are usually
well able to supply it for themselves.


† For example, Knopp, Theory of Functions, Part I (Dover, 1945); Phillips, Functions of a Complex
Variable (Oliver and Boyd, 1954); Titchmarsh, The Theory of Functions (Oxford, 1952).


20.1 Functions of a complex variable 

The quantity f(z) is said to be a function of the complex variable z if to every value
of z in a certain domain R (a region of the Argand diagram) there corresponds
one or more values of f(z). Stated like this f(z) could be any function consisting
of a real and an imaginary part, each of which is, in general, itself a function of x
and y. If we denote the real and imaginary parts of f(z) by u and v respectively,
then
$$f(z) = u(x,y) + iv(x,y).$$

In this chapter, however, we will be primarily concerned with functions that are
single-valued, so that for each value of z there corresponds just one value of f(z),
and differentiable in a particular sense, which we now discuss.

A function f(z) that is single-valued in some domain R is differentiable at the
point z in R if the derivative
$$f'(z) = \lim_{\Delta z\to 0}\left[\frac{f(z+\Delta z) - f(z)}{\Delta z}\right] \tag{20.1}$$
exists and is unique, in that its value does not depend upon the direction in the
Argand diagram from which $\Delta z$ tends to zero.


►Show that the function $f(z) = x^2 - y^2 + i2xy$ is differentiable for all values of z.


Considering the definition (20.1), and taking $\Delta z = \Delta x + i\Delta y$, we have
$$\begin{aligned}
\frac{f(z+\Delta z) - f(z)}{\Delta z}
&= \frac{(x+\Delta x)^2 - (y+\Delta y)^2 + 2i(x+\Delta x)(y+\Delta y) - x^2 + y^2 - 2ixy}{\Delta x + i\Delta y}\\
&= \frac{2x\Delta x + (\Delta x)^2 - 2y\Delta y - (\Delta y)^2 + 2i(x\Delta y + y\Delta x + \Delta x\Delta y)}{\Delta x + i\Delta y}\\
&= 2x + i2y + \frac{(\Delta x)^2 - (\Delta y)^2 + 2i\Delta x\Delta y}{\Delta x + i\Delta y}.
\end{aligned}$$
Now, in whatever way $\Delta x$ and $\Delta y$ are allowed to tend to zero (e.g. taking $\Delta y = 0$ and
letting $\Delta x \to 0$ or vice versa), the last term on the right will tend to zero and the unique
limit $2x + i2y$ will be obtained. Since z was arbitrary, f(z) with $u = x^2 - y^2$ and $v = 2xy$
is differentiable at all points in the (finite) complex plane. ◄

We note that the above working can be considerably reduced by recognising
that, since $z = x + iy$, we can write f(z) as
$$f(z) = x^2 - y^2 + 2ixy = (x + iy)^2 = z^2.$$
We then find that
$$f'(z) = \lim_{\Delta z\to 0}\left[\frac{(z+\Delta z)^2 - z^2}{\Delta z}\right] = \lim_{\Delta z\to 0}\left[\frac{(\Delta z)^2 + 2z\Delta z}{\Delta z}\right] = \left(\lim_{\Delta z\to 0}\Delta z\right) + 2z = 2z,$$
from which we see immediately that the limit both exists and is independent of
the way in which $\Delta z \to 0$. Thus we have verified that $f(z) = z^2$ is differentiable
for all (finite) z. We also note that the derivative is analogous to that found for
real variables.

Although the definition of a differentiable function clearly includes a wide 
class of functions, the concept of differentiability is restrictive and, indeed, some 
functions are not differentiable at any point in the complex plane. 


►Show that the function $f(z) = 2y + ix$ is not differentiable anywhere in the complex plane.


In this case f(z) cannot be written simply in terms of z, and so we must consider the
limit (20.1) in terms of x and y explicitly. Following the same procedure as in the previous
example we find
$$\frac{f(z+\Delta z) - f(z)}{\Delta z} = \frac{2y + 2\Delta y + ix + i\Delta x - 2y - ix}{\Delta x + i\Delta y} = \frac{2\Delta y + i\Delta x}{\Delta x + i\Delta y}.$$
In this case the limit will clearly depend on the direction from which $\Delta z \to 0$. Suppose
$\Delta z \to 0$ along a line through z of slope m, so that $\Delta y = m\Delta x$, then
$$\lim_{\Delta z\to 0}\left[\frac{f(z+\Delta z) - f(z)}{\Delta z}\right] = \lim_{\Delta x,\,\Delta y\to 0}\left[\frac{2\Delta y + i\Delta x}{\Delta x + i\Delta y}\right] = \frac{2m + i}{1 + im}.$$
This limit is dependent on m and hence on the direction from which $\Delta z \to 0$. Since this
conclusion is independent of the value of z, and hence true for all z, $f(z) = 2y + ix$ is
nowhere differentiable. ◄
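
The direction dependence found in the two examples above can also be seen numerically. The following is a minimal Python sketch, not part of the formal development: the helper name `difference_quotient`, the sample point `z0` and the step length `h` are illustrative assumptions. It evaluates the quotient in (20.1) for a small step taken at several angles in the Argand diagram.

```python
import cmath

def difference_quotient(f, z, angle, h=1e-6):
    """Return [f(z + dz) - f(z)] / dz for a step dz of length h
    taken in the direction `angle` (radians) in the Argand diagram."""
    dz = h * cmath.exp(1j * angle)
    return (f(z + dz) - f(z)) / dz

def f_square(z):
    # x^2 - y^2 + 2ixy, written explicitly in terms of x and y
    x, y = z.real, z.imag
    return complex(x * x - y * y, 2 * x * y)

def f_bad(z):
    # 2y + ix
    x, y = z.real, z.imag
    return complex(2 * y, x)

z0 = 1.0 + 2.0j
angles = [0.0, cmath.pi / 4, cmath.pi / 2]   # slopes m = 0, 1 and "infinite"
quotients_square = [difference_quotient(f_square, z0, a) for a in angles]
quotients_bad = [difference_quotient(f_bad, z0, a) for a in angles]

for a, q1, q2 in zip(angles, quotients_square, quotients_bad):
    print(f"angle={a:.3f}  x^2-y^2+2ixy: {q1:.5f}  2y+ix: {q2:.5f}")
```

For the first function every direction gives a value close to $2z_0$, whereas for the second the three directions give values close to $i$, $(3 - i)/2$ and $-2i$, in agreement with $(2m + i)/(1 + im)$.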

A function that is single-valued and differentiable at all points of a domain R
is said to be analytic (or regular) in R. A function may be analytic in a domain
except at a finite number of points (or an infinite number if the domain is
infinite); in this case it is said to be analytic except at these points, which are
called the singularities of f(z). (In our treatment we will not consider cases in
which an infinite number of singularities occur in a finite domain.)


►Show that the function $f(z) = 1/(1 - z)$ is analytic everywhere except at z = 1.


Since f(z) is given explicitly as a function of z, evaluation of the limit (20.1) is somewhat
easier. We find
$$\begin{aligned}
f'(z) &= \lim_{\Delta z\to 0}\left[\frac{f(z+\Delta z) - f(z)}{\Delta z}\right]
= \lim_{\Delta z\to 0}\left[\frac{1}{\Delta z}\left(\frac{1}{1 - z - \Delta z} - \frac{1}{1 - z}\right)\right]\\
&= \lim_{\Delta z\to 0}\left[\frac{1}{(1 - z - \Delta z)(1 - z)}\right] = \frac{1}{(1 - z)^2},
\end{aligned}$$
independently of the way in which $\Delta z \to 0$, provided $z \ne 1$. Hence f(z) is analytic
everywhere except at the singularity z = 1. ◄


20.2 The Cauchy-Riemann relations 


From examining the previous examples, it is apparent that for a function f(z)
to be differentiable and hence analytic there must be some particular connection
between its real and imaginary parts u and v. We next establish what this
connection must be, by considering a general function.

If the limit
$$L = \lim_{\Delta z\to 0}\left[\frac{f(z+\Delta z) - f(z)}{\Delta z}\right] \tag{20.2}$$
is to exist and be unique, in the way required for differentiability, then any two
specific ways of letting $\Delta z \to 0$ must produce the same limit. In particular, moving
parallel to the real axis and moving parallel to the imaginary axis must do so.
This is certainly a necessary condition, although it may not be sufficient.

If we let $f(z) = u(x,y) + iv(x,y)$ and $\Delta z = \Delta x + i\Delta y$ then we have
$$f(z+\Delta z) = u(x+\Delta x, y+\Delta y) + iv(x+\Delta x, y+\Delta y),$$
and the limit (20.2) is given by
$$L = \lim_{\Delta x,\,\Delta y\to 0}\left[\frac{u(x+\Delta x, y+\Delta y) + iv(x+\Delta x, y+\Delta y) - u(x,y) - iv(x,y)}{\Delta x + i\Delta y}\right].$$


If we first suppose that $\Delta z$ is purely real, so that $\Delta y = 0$, we obtain
$$L = \lim_{\Delta x\to 0}\left[\frac{u(x+\Delta x, y) - u(x,y)}{\Delta x} + i\,\frac{v(x+\Delta x, y) - v(x,y)}{\Delta x}\right] = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}, \tag{20.3}$$
provided each limit exists at the point z. Similarly, if $\Delta z$ is taken as purely
imaginary, so that $\Delta x = 0$, we find
$$L = \lim_{\Delta y\to 0}\left[\frac{u(x, y+\Delta y) - u(x,y)}{i\Delta y} + i\,\frac{v(x, y+\Delta y) - v(x,y)}{i\Delta y}\right] = \frac{1}{i}\frac{\partial u}{\partial y} + \frac{\partial v}{\partial y}. \tag{20.4}$$
For f to be differentiable at the point z, expressions (20.3) and (20.4) must
be identical. It follows from equating real and imaginary parts that necessary
conditions for this are
$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \quad\text{and}\quad \frac{\partial v}{\partial x} = -\frac{\partial u}{\partial y}. \tag{20.5}$$
These two equations are known as the Cauchy-Riemann relations.

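We can now see why for the earlier examples (i) $f(z) = x^2 - y^2 + i2xy$ might
be differentiable and (ii) $f(z) = 2y + ix$ could not be.

(i) $u = x^2 - y^2$, $v = 2xy$:
$$\frac{\partial u}{\partial x} = 2x = \frac{\partial v}{\partial y} \quad\text{and}\quad \frac{\partial v}{\partial x} = 2y = -\frac{\partial u}{\partial y},$$

(ii) $u = 2y$, $v = x$:
$$\frac{\partial u}{\partial x} = 0 = \frac{\partial v}{\partial y} \quad\text{but}\quad \frac{\partial v}{\partial x} = 1 \ne -2 = -\frac{\partial u}{\partial y}.$$

It is apparent that for f(z) to be analytic something more than the existence
of the partial derivatives of u and v with respect to x and y is required; this
something is that they satisfy the Cauchy-Riemann relations.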

We may enquire also as to the sufficient conditions for f(z) to be analytic in
R. It can be shown† that a sufficient condition is that the four partial derivatives
exist, are continuous and satisfy the Cauchy-Riemann relations. It is the additional
requirement of continuity that makes the difference between the necessary
conditions and the sufficient conditions.


►In which domain(s) of the complex plane is $f(z) = |x| - i|y|$ an analytic function?


Writing $f = u + iv$ it is clear that both $\partial u/\partial y$ and $\partial v/\partial x$ are zero in all four quadrants
and hence that the second Cauchy-Riemann relation in (20.5) is satisfied everywhere.

Turning to the first Cauchy-Riemann relation, in the first quadrant ($x > 0$, $y > 0$) we
have $f(z) = x - iy$ so that
$$\frac{\partial u}{\partial x} = 1, \qquad \frac{\partial v}{\partial y} = -1,$$
which clearly violates the first relation in (20.5). Thus f(z) is not analytic in the first
quadrant.

Following a similar argument for the other quadrants, we find
$$\frac{\partial u}{\partial x} = -1 \text{ or } +1 \quad\text{for } x < 0 \text{ and } x > 0 \text{ respectively},$$
$$\frac{\partial v}{\partial y} = -1 \text{ or } +1 \quad\text{for } y > 0 \text{ and } y < 0 \text{ respectively}.$$
Therefore $\partial u/\partial x$ and $\partial v/\partial y$ are equal, and hence f(z) is analytic, only in the second and
fourth quadrants. ◄
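
The quadrant-by-quadrant argument of this example can be checked with finite differences. The sketch below is only illustrative: the function names, the sample points (one per quadrant, chosen away from the axes) and the tolerance are assumptions of ours. Passing the test at a point shows only that the necessary conditions (20.5) hold there.

```python
def partials(f, x, y, h=1e-6):
    """Central-difference estimates of (du/dx, du/dy, dv/dx, dv/dy) at (x, y)."""
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return fx.real, fy.real, fx.imag, fy.imag

def satisfies_cauchy_riemann(f, x, y, tol=1e-6):
    """True if du/dx = dv/dy and dv/dx = -du/dy hold numerically at (x, y)."""
    ux, uy, vx, vy = partials(f, x, y)
    return abs(ux - vy) < tol and abs(vx + uy) < tol

def f_abs(x, y):
    # f(z) = |x| - i|y|
    return complex(abs(x), -abs(y))

# one sample point in each quadrant
samples = {1: (1.0, 2.0), 2: (-1.0, 2.0), 3: (-1.0, -2.0), 4: (1.0, -2.0)}
analytic_quadrants = [q for q, (x, y) in samples.items()
                      if satisfies_cauchy_riemann(f_abs, x, y)]
print(analytic_quadrants)  # prints [2, 4]
```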


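† See, for example, any of the references given earlier.


Since x and y are related to z and its complex conjugate $z^*$ by
$$x = \frac{1}{2}(z + z^*) \quad\text{and}\quad y = \frac{1}{2i}(z - z^*), \tag{20.6}$$
we may formally regard any function $f = u + iv$ as a function of z and $z^*$, rather
than x and y. If we do this and examine $\partial f/\partial z^*$ we obtain
$$\begin{aligned}
\frac{\partial f}{\partial z^*} &= \frac{\partial f}{\partial x}\frac{\partial x}{\partial z^*} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial z^*}\\
&= \left(\frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}\right)\left(\frac{1}{2}\right) + \left(\frac{\partial u}{\partial y} + i\frac{\partial v}{\partial y}\right)\left(-\frac{1}{2i}\right)\\
&= \frac{1}{2}\left(\frac{\partial u}{\partial x} - \frac{\partial v}{\partial y}\right) + \frac{i}{2}\left(\frac{\partial v}{\partial x} + \frac{\partial u}{\partial y}\right).
\end{aligned} \tag{20.7}$$
Now, if f is analytic then the Cauchy-Riemann relations (20.5) must be satisfied,
and these immediately give that $\partial f/\partial z^*$ is identically zero. Thus we conclude that
if f is analytic then f cannot be a function of $z^*$ and any expression representing
an analytic function of z can contain x and y only in the combination $x + iy$, not
in the combination $x - iy$.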
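
As a numerical illustration of (20.7), we may estimate $\partial f/\partial z^* = \tfrac{1}{2}(\partial f/\partial x + i\,\partial f/\partial y)$ by central differences. In this hedged Python sketch the helper name `d_dzbar`, the sample point and the three test functions are our own choices, made only to show the contrast between a function of z alone and functions that involve $z^*$.

```python
def d_dzbar(f, z, h=1e-6):
    """Central-difference estimate of df/dz* = (1/2)(df/dx + i df/dy)."""
    dfdx = (f(z + h) - f(z - h)) / (2 * h)
    dfdy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return 0.5 * (dfdx + 1j * dfdy)

z0 = 0.7 - 0.4j
analytic_value = d_dzbar(lambda z: z**3 + 2 * z, z0)       # depends on z only
conjugate_value = d_dzbar(lambda z: z.conjugate(), z0)     # f = z*
modulus_value = d_dzbar(lambda z: z * z.conjugate(), z0)   # f = z z* = |z|^2

print(abs(analytic_value))   # very close to zero
print(conjugate_value)       # close to 1
print(modulus_value)         # close to z0
```

Only the first function, which contains x and y solely in the combination $x + iy$, gives a vanishing result.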

We conclude this section by discussing some properties of analytic functions 
that are of great practical importance in theoretical physics. These can be obtained 
simply from the requirement that the Cauchy-Riemann relations must be satisfied 
by the real and imaginary parts of an analytic function. 

The most important of these results can be obtained by differentiating the 
first Cauchy-Riemann relation with respect to one independent variable, and the 
second with respect to the other independent variable, to obtain the two chains 
of equalities: 


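$$\frac{\partial}{\partial x}\left(\frac{\partial u}{\partial x}\right) = \frac{\partial}{\partial x}\left(\frac{\partial v}{\partial y}\right) = \frac{\partial}{\partial y}\left(\frac{\partial v}{\partial x}\right) = -\frac{\partial}{\partial y}\left(\frac{\partial u}{\partial y}\right),$$
$$\frac{\partial}{\partial x}\left(\frac{\partial v}{\partial x}\right) = -\frac{\partial}{\partial x}\left(\frac{\partial u}{\partial y}\right) = -\frac{\partial}{\partial y}\left(\frac{\partial u}{\partial x}\right) = -\frac{\partial}{\partial y}\left(\frac{\partial v}{\partial y}\right).$$

Thus both u and v are separately solutions of Laplace’s equation in two dimensions, i.e.
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0 \quad\text{and}\quad \frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} = 0. \tag{20.8}$$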


We shall make use of this result in section 20.9. 

A further useful result concerns the two families of curves $u(x,y)$ = constant
and $v(x,y)$ = constant, where u and v are the real and imaginary parts of any
analytic function $f = u + iv$. As discussed in chapter 10, the vector normal to the
curve $u(x,y)$ = constant is given by
$$\nabla u = \frac{\partial u}{\partial x}\,\mathbf{i} + \frac{\partial u}{\partial y}\,\mathbf{j}, \tag{20.9}$$
where $\mathbf{i}$ and $\mathbf{j}$ are the unit vectors along the x- and y-axes respectively. A similar
expression exists for $\nabla v$, the normal to the curve $v(x,y)$ = constant. Taking the
scalar product of these two normal vectors we obtain
$$\begin{aligned}
\nabla u\cdot\nabla v &= \frac{\partial u}{\partial x}\frac{\partial v}{\partial x} + \frac{\partial u}{\partial y}\frac{\partial v}{\partial y}\\
&= -\frac{\partial u}{\partial x}\frac{\partial u}{\partial y} + \frac{\partial u}{\partial y}\frac{\partial u}{\partial x} = 0,
\end{aligned}$$
where in the last line we have used the Cauchy-Riemann relations to rewrite the
partial derivatives of v as partial derivatives of u. Since the scalar product of the
normal vectors is zero, they must be orthogonal and the curves $u(x,y)$ = constant
and $v(x,y)$ = constant must therefore intersect at right angles.


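►Use the Cauchy-Riemann relations to show that, for any analytic function $f = u + iv$, the
relation $|\nabla u| = |\nabla v|$ must hold.


From (20.9) we have
$$|\nabla u|^2 = \nabla u\cdot\nabla u = \left(\frac{\partial u}{\partial x}\right)^2 + \left(\frac{\partial u}{\partial y}\right)^2.$$
Using the Cauchy-Riemann relations to write the partial derivatives of u in terms of those
of v, we obtain
$$|\nabla u|^2 = \left(\frac{\partial v}{\partial y}\right)^2 + \left(-\frac{\partial v}{\partial x}\right)^2 = |\nabla v|^2,$$
from which the result $|\nabla u| = |\nabla v|$ follows immediately. ◄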
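
Both the orthogonality of the normals and the equality $|\nabla u| = |\nabla v|$ are easy to test numerically for a particular analytic function. The Python sketch below uses $\exp z$ at an arbitrarily chosen point; the helper names and the step size are illustrative assumptions, and the gradients are central-difference estimates rather than exact derivatives.

```python
import cmath

def gradients(f, x, y, h=1e-5):
    """Central-difference gradients of u = Re f and v = Im f at (x, y)."""
    fx = (f(complex(x + h, y)) - f(complex(x - h, y))) / (2 * h)
    fy = (f(complex(x, y + h)) - f(complex(x, y - h))) / (2 * h)
    grad_u = (fx.real, fy.real)
    grad_v = (fx.imag, fy.imag)
    return grad_u, grad_v

def dot(a, b):
    """Scalar product of two plane vectors given as (x, y) pairs."""
    return a[0] * b[0] + a[1] * b[1]

grad_u, grad_v = gradients(cmath.exp, 0.3, 1.1)
scalar_product = dot(grad_u, grad_v)
norm_difference = dot(grad_u, grad_u) ** 0.5 - dot(grad_v, grad_v) ** 0.5

print(scalar_product)    # very close to zero: the normals are orthogonal
print(norm_difference)   # very close to zero: |grad u| = |grad v|
```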


20.3 Power series in a complex variable 

The theory of power series in a real variable was considered in chapter 4, which
also contained a brief discussion of the natural extension of this theory to a series
such as
$$f(z) = \sum_{n=0}^{\infty} a_n z^n, \tag{20.10}$$
where z is a complex variable and the $a_n$ are in general complex. We now consider
complex power series in more detail.

Expression (20.10) is a power series about the origin and may be used for
general discussion since a power series about any other point $z_0$ can be obtained
by a change of variable from z to $z - z_0$. If z were written in its modulus and
argument form $z = r\exp i\theta$, expression (20.10) would become
$$f(z) = \sum_{n=0}^{\infty} a_n r^n \exp(in\theta). \tag{20.11}$$
This series is absolutely convergent if
$$\sum_{n=0}^{\infty} |a_n| r^n, \tag{20.12}$$
which is a series of positive real terms, is convergent. Thus tests for the absolute
convergence of real series can be used in the present context, and of these the
most appropriate form is based on the Cauchy root test. The radius of convergence
R is defined by
$$\frac{1}{R} = \lim_{n\to\infty} |a_n|^{1/n}; \tag{20.13}$$
the series (20.10) is absolutely convergent if $|z| < R$ and divergent if $|z| > R$. If
$|z| = R$ no particular conclusion may be drawn, and this case must be considered
separately, as discussed in subsection 4.5.1.

A circle of radius R centred on the origin is called the circle of convergence
of the series $\sum a_n z^n$. The cases R = 0 and R = $\infty$ correspond respectively to
convergence at the origin only and convergence everywhere. For R finite the
convergence occurs in a restricted part of the z-plane (the Argand diagram). For
a power series about a general point $z_0$, the circle of convergence is of course
centred on that point.


►Find the parts of the z-plane for which the following series are convergent:
$$\text{(i)}\ \sum_{n=0}^{\infty}\frac{z^n}{n!}, \qquad \text{(ii)}\ \sum_{n=0}^{\infty} n!\,z^n, \qquad \text{(iii)}\ \sum_{n=1}^{\infty}\frac{z^n}{n}.$$

(i) Since $(n!)^{1/n}$ behaves like n as $n \to \infty$ we find $\lim(1/n!)^{1/n} = 0$. Hence R = $\infty$ and the
series is convergent for all z. (ii) Correspondingly, $\lim(n!)^{1/n} = \infty$. Thus R = 0 and the
series converges only at z = 0. (iii) As $n \to \infty$, $(n)^{1/n}$ has a lower limit of 1 and hence
$\lim(1/n)^{1/n} = 1/1 = 1$. Thus the series is absolutely convergent if $|z| < 1$. ◄


Case (iii) in the above example provides a good illustration of the fact that on its
circle of convergence a power series may or may not converge. For this particular
series the circle of convergence is $|z| = 1$, so let us consider the convergence of
the series at two different points on this circle. Taking z = 1, the series becomes
$$\sum_{n=1}^{\infty}\frac{1}{n} = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots,$$
which is easily shown to diverge (by, for example, grouping terms, as discussed in
subsection 4.3.2). Taking z = −1, however, the series is given by
$$\sum_{n=1}^{\infty}\frac{(-1)^n}{n} = -1 + \frac{1}{2} - \frac{1}{3} + \frac{1}{4} - \cdots,$$
which is an alternating series whose terms decrease in magnitude and which
therefore converges.
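
The limits used in the example, and the contrasting behaviour of series (iii) at z = 1 and z = −1, can be explored numerically. The following Python sketch is illustrative only: the helper names are our own, finite n can merely suggest the limit in (20.13), and logarithms of the coefficients are used (via `math.lgamma`) so that n! does not overflow.

```python
import math

def root_test_estimate(log_abs_coeff, n):
    """Return |a_n|**(1/n), computed from log|a_n| to avoid overflow."""
    return math.exp(log_abs_coeff(n) / n)

# log|a_n| for the three series of the example
log_coeffs = {
    "(i) z^n/n!": lambda n: -math.lgamma(n + 1),
    "(ii) n! z^n": lambda n: math.lgamma(n + 1),
    "(iii) z^n/n": lambda n: -math.log(n),
}
estimates = {name: [root_test_estimate(c, n) for n in (10, 100, 1000)]
             for name, c in log_coeffs.items()}

def partial_sum(z, terms):
    """Partial sum of series (iii): sum of z**n / n for n = 1, ..., terms."""
    return sum(z ** n / n for n in range(1, terms + 1))

at_plus_one = [partial_sum(1.0, N) for N in (10, 1000, 100000)]
at_minus_one = [partial_sum(-1.0, N) for N in (10, 1000, 100000)]

for name, values in estimates.items():
    print(name, values)
print("z = +1:", at_plus_one)    # keeps growing, roughly like ln N
print("z = -1:", at_minus_one)   # settles towards -ln 2
```

The estimates of $|a_n|^{1/n}$ fall towards zero for (i), grow without apparent bound for (ii) and approach unity for (iii), whilst the partial sums at z = −1 approach $-\ln 2$.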

The ratio test discussed in subsection 4.3.2 may also be employed to investigate
the absolute convergence of a complex power series. A series is absolutely
convergent if
$$\lim_{n\to\infty}\frac{|a_{n+1}||z|^{n+1}}{|a_n||z|^n} = \lim_{n\to\infty}\frac{|a_{n+1}||z|}{|a_n|} < 1 \tag{20.14}$$
and hence the radius of convergence R of the series is given by
$$\frac{1}{R} = \lim_{n\to\infty}\frac{|a_{n+1}|}{|a_n|}.$$
For instance, in case (i) of the previous example, we have
$$\frac{1}{R} = \lim_{n\to\infty}\frac{n!}{(n+1)!} = \lim_{n\to\infty}\frac{1}{n+1} = 0.$$
Thus the series is absolutely convergent for all (finite) z, confirming the previous
result.

Before turning to particular power series, we conclude this section by stating
the important result† that the power series $\sum_{0}^{\infty} a_n z^n$ has a sum that is an analytic
function of z inside its circle of convergence.

As a corollary to the above theorem, it may further be shown that if
$f(z) = \sum a_n z^n$ then, inside the circle of convergence of the series,
$$f'(z) = \sum_{n=0}^{\infty} n a_n z^{n-1}.$$
Repeated application of this result demonstrates that any power series can be
differentiated any number of times inside its circle of convergence.


20.4 Some elementary functions 

In the example at the end of the previous section it was shown that the function
exp z defined by
$$\exp z = \sum_{n=0}^{\infty}\frac{z^n}{n!} \tag{20.15}$$
is convergent for all z of finite modulus and is thus, by the discussion of
the previous section, an analytic function over the whole z-plane.‡ Like its
real-variable counterpart it is called the exponential function; also like its real
counterpart it is equal to its own derivative.


† For a proof see, for example, Riley, Mathematical Methods for the Physical Sciences (CUP, 1974),
p. 446.

‡ Functions that are analytic in the whole z-plane are usually called integral or entire functions.

The multiplication of two exponential functions results in a further exponential 
function, in accordance with the corresponding result for real variables. 


►Show that $\exp z_1 \exp z_2 = \exp(z_1 + z_2)$.

From the series expansion (20.15) of $\exp z_1$ and a similar expansion for $\exp z_2$, it is clear
that the coefficient of $z_1^r z_2^s$ in the corresponding series expansion of $\exp z_1 \exp z_2$ is simply
$1/(r!\,s!)$.

But, from (20.15) we also have

$$\exp(z_1 + z_2) = \sum_{n=0}^{\infty} \frac{(z_1 + z_2)^n}{n!}.$$

In order to find the coefficient of $z_1^r z_2^s$ in this expansion, we clearly have to consider the
term in which $n = r + s$, namely

$$\frac{(z_1 + z_2)^{r+s}}{(r+s)!} = \frac{1}{(r+s)!} \left( {}^{r+s}C_0\, z_1^{r+s} + \cdots + {}^{r+s}C_s\, z_1^r z_2^s + \cdots + {}^{r+s}C_{r+s}\, z_2^{r+s} \right).$$

The coefficient of $z_1^r z_2^s$ in this is given by

$${}^{r+s}C_s\, \frac{1}{(r+s)!} = \frac{(r+s)!}{s!\,r!}\, \frac{1}{(r+s)!} = \frac{1}{r!\,s!}.$$

Thus, since the corresponding coefficients on the two sides are equal and all the series
involved are absolutely convergent for all $z$, we can conclude that $\exp z_1 \exp z_2 = \exp(z_1 + z_2)$. ◄
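The identity just proved can be checked numerically by summing the series (20.15) directly. The sketch below is only an illustration under stated assumptions: the function name `exp_series`, the truncation at 60 terms and the two sample points are arbitrary choices, not part of the text.

```python
import cmath


def exp_series(z, terms=60):
    """Partial sum of the series (20.15): sum of z**n / n! for n < terms."""
    total = 0j
    term = 1 + 0j            # the n = 0 term
    for n in range(terms):
        total += term
        term *= z / (n + 1)  # turn z**n/n! into z**(n+1)/(n+1)!
    return total


z1, z2 = 1 + 2j, -0.5 + 0.3j
product = exp_series(z1) * exp_series(z2)
direct = exp_series(z1 + z2)
print(abs(product - direct))              # difference at rounding-error level
print(abs(direct - cmath.exp(z1 + z2)))   # agreement with the library function
```

For points of modest modulus, sixty terms are far more than is needed; the truncation would have to grow with $|z|$ for larger arguments.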


As an extension of (20.15) we may also define the complex exponent of a real
number $a > 0$ by the equation

$$a^z = \exp(z \ln a), \tag{20.16}$$

where $\ln a$ is the natural logarithm of $a$. The particular case $a = e$ and the fact
that $\ln e = 1$ enable us to write $\exp z$ interchangeably with $e^z$. If $z$ is real then the
definition agrees with the familiar one.

The result for $z = iy$,

$$\exp iy = \cos y + i \sin y, \tag{20.17}$$

has been met already in equation (3.23). Its immediate extension is

$$\exp z = (\exp x)(\cos y + i \sin y). \tag{20.18}$$

As $z$ varies over the complex plane the modulus of $\exp z$ takes all real positive
values, except that of 0. However, two values of $z$ that differ by $2\pi n i$, for any
integer $n$, produce the same value of $\exp z$, as given by (20.18), and so $\exp z$ is
periodic with period $2\pi i$. If we denote $\exp z$ by $t$ then the strip $-\pi < y \le \pi$ in
the $z$-plane corresponds to the whole of the $t$-plane, except for the point $t = 0$.
The sine, cosine, sinh and cosh functions of a complex variable are defined from
the exponential function exactly as are those for real variables. The functions 
derived from them (e.g. tan and tanh), the identities they satisfy, and their 
derivative properties, are also just as for real variables. In view of this we will 
not give them further attention here. 

The inverse function of $\exp z$ is given by $w$, the solution of

$$\exp w = z. \tag{20.19}$$

This inverse function was discussed in chapter 3, but we mention it again here
for completeness. By virtue of the discussion following (20.18), $w$ is not uniquely
defined and is indeterminate to the extent of any integer multiple of $2\pi i$. If we
express $z$ as

$$z = r \exp i\theta,$$

where $r$ is the (real) modulus of $z$, and $\theta$ is its argument ($-\pi < \theta \le \pi$), then
multiplying $z$ by $\exp(2in\pi)$, where $n$ is an integer, will result in the same complex
number $z$. Thus we may write

$$z = r \exp[i(\theta + 2n\pi)],$$

where $n$ is an integer. If we denote $w$ in (20.19) by

$$w = \operatorname{Ln} z = \ln r + i(\theta + 2n\pi), \tag{20.20}$$

where $\ln r$ is the natural logarithm (to base $e$) of the real positive quantity $r$, then
$\operatorname{Ln} z$ is an infinitely multivalued function of $z$. Its principal value, denoted by $\ln z$,
is obtained by taking $n = 0$ so that its argument lies in the range $-\pi$ to $\pi$. Thus

$$\ln z = \ln r + i\theta, \quad \text{with } -\pi < \theta \le \pi. \tag{20.21}$$
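The multivalued logarithm and its principal value can be explored with Python's standard `cmath` module, whose `polar` function returns the modulus and an argument in the range $-\pi$ to $\pi$. The helper name `Ln` and the sample point are assumptions made for this sketch.

```python
import cmath
import math


def Ln(z, n=0):
    """Branch n of the logarithm, ln r + i(theta + 2 n pi); n = 0 is the principal value."""
    r, theta = cmath.polar(z)   # theta lies between -pi and pi
    return complex(math.log(r), theta + 2 * math.pi * n)


z = -1 + 1j
for n in (-1, 0, 1):
    w = Ln(z, n)
    print(n, w, cmath.exp(w))   # every branch exponentiates back to z
```

The branch with $n = 0$ coincides with the library's `cmath.log`, which returns the principal value.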

Now that the logarithm of a complex variable has been defined, definition
(20.16) of a general power can be extended to cases other than those in which $a$
is real and positive. If $t$ ($\neq 0$) and $z$ are both complex then the $z$th power of $t$ is
defined by

$$t^z = \exp(z \operatorname{Ln} t). \tag{20.22}$$

Since $\operatorname{Ln} t$ is multivalued, so is this definition. Its principal value is obtained by
giving $\operatorname{Ln} t$ its principal value, $\ln t$.

If $t$ ($\neq 0$) is complex but $z$ is real and equal to $1/n$, then (20.22) provides a
definition of the $n$th root of $t$. Because of the multivaluedness of $\operatorname{Ln} t$, there will
be more than one $n$th root of any given $t$.
►Show that there are exactly $n$ distinct $n$th roots of $t$.

From (20.22) the $n$th roots of $t$ are given by

$$t^{1/n} = \exp\left[\frac{1}{n} \operatorname{Ln} t\right].$$

On the RHS let us write $t$ as follows:

$$t = r \exp[i(\theta + 2k\pi)],$$

where $k$ is an integer. We then obtain

$$t^{1/n} = \exp\left[\frac{1}{n} \ln r + i\,\frac{(\theta + 2k\pi)}{n}\right] = r^{1/n} \exp\left[i\,\frac{(\theta + 2k\pi)}{n}\right],$$

where $k = 0, 1, \ldots, n - 1$; for other values of $k$ we simply recover the roots already found.
Thus $t$ has $n$ distinct $n$th roots. ◄
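The formula for the roots translates directly into code. The following is a minimal sketch using only the standard library; the function name `nth_roots` and the example $t = 8i$, $n = 3$ are illustrative assumptions.

```python
import cmath
import math


def nth_roots(t, n):
    """Return the n distinct nth roots of a non-zero complex number t."""
    r, theta = cmath.polar(t)
    return [cmath.rect(r ** (1.0 / n), (theta + 2 * math.pi * k) / n)
            for k in range(n)]


for root in nth_roots(8j, 3):
    print(root, root ** 3)   # each cube should reproduce 8j up to rounding
```

Letting $k$ run beyond $n - 1$ would only repeat roots already in the list, in line with the argument above.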


20.5 Multivalued functions and branch cuts 

In the definition of an analytic function, one of the conditions imposed was 
that the function is single-valued. However, as shown in the previous section, the 
logarithmic function, a complex power and a complex root are all multivalued. 
Nevertheless, it happens that the properties of analytic functions can still be 
applied to these and other multivalued functions of a complex variable provided 
that suitable care is taken. This care amounts to identifying the branch points of 
the multivalued function f(z) in question. If z is varied in such a way that its 
path in the Argand diagram forms a closed curve that encloses a branch point, 
then, in general, $f(z)$ will not return to its original value.

For definiteness let us consider the multivalued function $f(z) = z^{1/2}$ and express
$z$ as $z = r \exp i\theta$. From figure 20.1(a), it is clear that, as the point $z$ traverses any
closed contour $C$ that does not enclose the origin, $\theta$ will return to its original
value after one complete circuit. However, for any closed contour $C'$ that does
enclose the origin, after one circuit $\theta \to \theta + 2\pi$ (see figure 20.1(b)). Thus, for the
function $f(z) = z^{1/2}$, after one circuit

$$r^{1/2} \exp(i\theta/2) \to r^{1/2} \exp[i(\theta + 2\pi)/2] = -r^{1/2} \exp(i\theta/2).$$

In other words, the value of $f(z)$ changes around any closed loop enclosing the
origin; in this case $f(z) \to -f(z)$. Thus $z = 0$ is a branch point of the function
$f(z) = z^{1/2}$.

We note in this case that if any closed contour enclosing the origin is traversed
twice then $f(z) = z^{1/2}$ returns to its original value. The number of loops around
a branch point required for any given function $f(z)$ to return to its original value
depends on the function in question, and for some functions (e.g. $\operatorname{Ln} z$, which also
has a branch point at the origin) the original value is never recovered.

Figure 20.1 (a) A closed contour not enclosing the origin; (b) a closed
contour enclosing the origin; (c) a possible branch cut for $f(z) = z^{1/2}$.

In order that $f(z)$ may be treated as single-valued we may define a branch cut
in the Argand diagram. A branch cut is a line (or curve) in the complex plane
and may be regarded as an artificial barrier that we must not cross. Branch cuts
are positioned in such a way that we are prevented from making a complete
circuit around any one branch point, and so the function in question remains
single-valued.

For the function $f(z) = z^{1/2}$, we may take as a branch cut any curve starting
at the origin $z = 0$ and extending out to $|z| = \infty$ in any direction, since all
such curves would equally well prevent us from making a closed loop around the
branch point at the origin. It is usual, however, to take the cut along the real or
imaginary axis. For example, in figure 20.1(c), we take the cut as the positive real
axis. By agreeing not to cross this cut, we restrict $\theta$ to lie in the range $0 \le \theta < 2\pi$,
and so keep $f(z)$ single-valued.

These ideas are easily extended to functions with more than one branch point. 


►Find the branch points of $f(z) = \sqrt{z^2 + 1}$, and hence sketch suitable arrangements of
branch cuts.

We begin by writing $f(z)$ as

$$f(z) = \sqrt{z^2 + 1} = \sqrt{(z - i)(z + i)}.$$

As shown above the function $g(z) = z^{1/2}$ has a branch point at $z = 0$. Thus we might
expect $f(z)$ to have branch points at values of $z$ that make the expression under the square
root equal to zero, i.e. at $z = i$ and $z = -i$.

As shown in figure 20.2(a), we use the notation

$$z - i = r_1 \exp i\theta_1 \quad \text{and} \quad z + i = r_2 \exp i\theta_2.$$
Figure 20.2 (a) Coordinates used in the analysis of the branch points of
$f(z) = (z^2 + 1)^{1/2}$; (b) one possible arrangement of branch cuts; (c) another
possible branch cut, which is finite.


We can therefore write $f(z)$ as

$$f(z) = \sqrt{r_1 r_2}\, \exp(i\theta_1/2) \exp(i\theta_2/2) = \sqrt{r_1 r_2}\, \exp[i(\theta_1 + \theta_2)/2].$$

Let us now consider how $f(z)$ changes as we make one complete circuit around various
closed loops $C$ in the Argand diagram. If $C$ encloses

(i) neither branch point, then $\theta_1 \to \theta_1$, $\theta_2 \to \theta_2$ and so $f(z) \to f(z)$;

(ii) $z = i$ but not $z = -i$, then $\theta_1 \to \theta_1 + 2\pi$, $\theta_2 \to \theta_2$ and so $f(z) \to -f(z)$;

(iii) $z = -i$ but not $z = i$, then $\theta_1 \to \theta_1$, $\theta_2 \to \theta_2 + 2\pi$ and so $f(z) \to -f(z)$;

(iv) both branch points, then $\theta_1 \to \theta_1 + 2\pi$, $\theta_2 \to \theta_2 + 2\pi$ and so $f(z) \to f(z)$.

Thus, as expected, $f(z)$ changes value around loops containing either $z = i$ or $z = -i$
(but not both). We must therefore choose branch cuts that prevent us from making a
complete loop around either branch point; one suitable choice is shown in figure 20.2(b).

For this $f(z)$, however, we have noted that after traversing a loop containing both branch
points the function returns to its original value. Thus we may choose an alternative, finite,
branch cut that allows this possibility but still prevents us from making a complete loop
around just one of the points. A suitable cut is shown in figure 20.2(c). ◄
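The four cases in this example can be reproduced numerically by following the square root continuously around a circular loop: at each small step, of the two possible square roots, the one closer to the previous value is kept. The sketch below is an illustration only; the function name `follow_sqrt`, the circular shape of the loops and the step count are assumptions, not part of the text.

```python
import cmath
import math


def follow_sqrt(g, centre, radius, steps=2000):
    """Follow sqrt(g(z)) continuously once anticlockwise round a circle.

    Returns the values of the chosen branch at the start and end of the circuit.
    """
    value = cmath.sqrt(g(centre + radius))
    start = value
    for k in range(1, steps + 1):
        z = centre + radius * cmath.exp(2j * math.pi * k / steps)
        candidate = cmath.sqrt(g(z))
        # Of the two square roots, keep the one closer to the previous value.
        if abs(candidate + value) < abs(candidate - value):
            candidate = -candidate
        value = candidate
    return start, value


def g(z):
    return z * z + 1


around_i = follow_sqrt(g, 1j, 0.5)            # encloses z = i only
around_both = follow_sqrt(g, 0j, 2.0)         # encloses z = i and z = -i
around_neither = follow_sqrt(g, 3 + 0j, 0.5)  # encloses neither point
for start, end in (around_i, around_both, around_neither):
    print(start, end)
```

Passing `lambda z: z` with a loop about the origin gives the corresponding check for $z^{1/2}$ itself: the sign flips after one circuit.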


20.6 Singularities and zeroes of complex functions 

A singular point of a complex function $f(z)$ is any point in the Argand diagram
at which $f(z)$ fails to be analytic. We have already met one sort of singularity,
the branch point, and in this section we shall consider other types of singularity
as well as discuss the zeroes of complex functions.

If $f(z)$ has a singular point at $z = z_0$ but is analytic at all points in some
neighbourhood containing $z_0$ but no other singularities then $z = z_0$ is called an
isolated singularity. (Clearly branch points are not isolated singularities.)
The most important type of isolated singularity is the pole. If $f(z)$ has the form

$$f(z) = \frac{g(z)}{(z - z_0)^n}, \tag{20.23}$$

where $n$ is a positive integer, $g(z)$ is analytic at all points in some neighbourhood
containing $z = z_0$ and $g(z_0) \neq 0$, then $f(z)$ has a pole of order $n$ at $z = z_0$. An
alternative (though equivalent) definition is that

$$\lim_{z \to z_0} [(z - z_0)^n f(z)] = a, \tag{20.24}$$

where $a$ is a finite, non-zero complex number. (If the limit equals zero then $z = z_0$
is a pole of order less than $n$, or $f(z)$ is analytic there; if the limit is infinite then
the pole is of order greater than $n$.) It may also be shown that if $f(z)$ has a pole
at $z = z_0$, then $|f(z)| \to \infty$ as $z \to z_0$ from any direction in the Argand diagram.†
If no finite value of $n$ can be found such that (20.24) is satisfied then $z = z_0$ is
called an essential singularity.


►Find the singularities of the functions

(i) $f(z) = \dfrac{1}{1 - z} - \dfrac{1}{1 + z}$,

(ii) $f(z) = \tanh z$.

(i) If we write $f(z)$ as

$$f(z) = \frac{1}{1 - z} - \frac{1}{1 + z} = \frac{2z}{(1 - z)(1 + z)},$$

we see immediately from either (20.23) or (20.24) that $f(z)$ has poles of order 1 (or simple
poles) at $z = 1$ and $z = -1$.

(ii) In this case we write

$$f(z) = \tanh z = \frac{\sinh z}{\cosh z} = \frac{\exp z - \exp(-z)}{\exp z + \exp(-z)}.$$

Thus $f(z)$ has a singularity when $\exp z = -\exp(-z)$ or, equivalently, when

$$\exp z = \exp[i(2n + 1)\pi] \exp(-z),$$

where $n$ is any integer. Equating the arguments of the exponentials we find $z = (n + \tfrac{1}{2})\pi i$,
for integer $n$.

Furthermore, using l'Hôpital's rule (see chapter 4) we have

$$\lim_{z \to (n + \frac{1}{2})\pi i} \left\{ \frac{[z - (n + \tfrac{1}{2})\pi i] \sinh z}{\cosh z} \right\} = \lim_{z \to (n + \frac{1}{2})\pi i} \left\{ \frac{[z - (n + \tfrac{1}{2})\pi i] \cosh z + \sinh z}{\sinh z} \right\} = 1.$$

Therefore, from (20.24), each singularity is a simple pole. ◄
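The limit in (20.24) can be estimated numerically by evaluating $(z - z_0)^n f(z)$ at a point very close to $z_0$. The sketch below does this for the two functions of the example; the helper name `scaled_value`, the offset `eps` and the direction of approach are arbitrary assumptions, and a finite-offset evaluation only suggests, rather than proves, the value of the limit.

```python
import cmath
import math


def scaled_value(f, z0, order, eps=1e-6):
    """Evaluate (z - z0)**order * f(z) at a point a distance eps from z0."""
    z = z0 + eps * cmath.exp(0.7j)   # arbitrary direction of approach
    return (z - z0) ** order * f(z)


def f1(z):
    return 1 / (1 - z) - 1 / (1 + z)


print(scaled_value(f1, 1 + 0j, 1))                   # close to -1: simple pole at z = 1
print(scaled_value(cmath.tanh, 0.5j * math.pi, 1))   # close to 1: simple pole
print(scaled_value(cmath.tanh, 1.5j * math.pi, 1))   # close to 1: simple pole
print(abs(scaled_value(f1, 1 + 0j, 2)))              # tiny: the order is less than 2
```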

Another type of singularity exists at points for which the value of $f(z)$ takes
an indeterminate form such as 0/0 but $\lim_{z \to z_0} f(z)$ exists and is independent
of the direction from which $z_0$ is approached. Such points are called removable
singularities.

† Although perhaps intuitively obvious this result really requires formal demonstration by analysis.

►Show that $f(z) = (\sin z)/z$ has a removable singularity at $z = 0$.

It is clear that $f(z)$ takes the indeterminate form 0/0 at $z = 0$. However, by expanding
$\sin z$ as a power series in $z$, we find

$$f(z) = \frac{1}{z} \left( z - \frac{z^3}{3!} + \frac{z^5}{5!} - \cdots \right) = 1 - \frac{z^2}{3!} + \frac{z^4}{5!} - \cdots.$$

Thus $\lim_{z \to 0} f(z) = 1$ independently of the way in which $z \to 0$, and so $f(z)$ has a removable
singularity at $z = 0$. ◄

An expression common in mathematics, but which we have so far avoided
using explicitly in this chapter, is 'z tends to infinity'. For a real variable such
as $|z|$ or $R$, 'tending to infinity' has a reasonably well-defined meaning. For a
complex variable needing a two-dimensional plane to represent it, the meaning is
not intrinsically well defined. However, it is convenient to have a unique meaning
and this is provided by the following definition: the behaviour of $f(z)$ at infinity
is given by that of $f(1/\xi)$ at $\xi = 0$, where $\xi = 1/z$.

►Find the behaviour at infinity of (i) $f(z) = a + bz^{-2}$, (ii) $f(z) = z(1 + z^2)$ and (iii)
$f(z) = \exp z$.

(i) $f(z) = a + bz^{-2}$: on putting $z = 1/\xi$, $f(1/\xi) = a + b\xi^2$, which is analytic at $\xi = 0$; thus $f$
is analytic at $z = \infty$. (ii) $f(z) = z(1 + z^2)$: $f(1/\xi) = 1/\xi + 1/\xi^3$; thus $f$ has a pole of order
3 at $z = \infty$. (iii) $f(z) = \exp z$: $f(1/\xi) = \sum_{n=0}^{\infty} (n!)^{-1} \xi^{-n}$; thus $f$ has an essential singularity
at $z = \infty$. ◄

We conclude this section by briefly mentioning the zeroes of a complex function.
As the name suggests, if $f(z_0) = 0$ then $z = z_0$ is called a zero of the function
$f(z)$. Zeroes are classified in a similar way to poles, in that if

$$f(z) = (z - z_0)^n g(z),$$

where $n$ is a positive integer and $g(z_0) \neq 0$, then $z = z_0$ is called a zero of order
$n$ of $f(z)$. If $n = 1$ then $z = z_0$ is called a simple zero. It may further be shown
that if $z = z_0$ is a zero of order $n$ of $f(z)$ then it is also a pole of order $n$ of the
function $1/f(z)$.

We will return in section 20.13 to the classification of zeroes and poles in terms
of their series expansions.


20.7 Complex potentials 

Towards the end of section 20.2 it was shown that the real and the imaginary 
parts of an analytic function of z are separately solutions of Laplace’s equation 
in two dimensions. Analytic functions thus offer a possible way of solving some
two-dimensional physical problems describable by a potential satisfying $\nabla^2 \phi = 0$.
The general method is known as that of complex potentials.

Figure 20.3 The equipotentials (broken) and field lines (solid) for a line
charge perpendicular to the $z$-plane.

We found also that if $f = u + iv$ is an analytic function of $z$ then any curve
$u$ = constant intersects any curve $v$ = constant at right angles. In the context of
solutions of Laplace's equation, this result implies that the real and imaginary
parts of $f(z)$ have an additional connection between them, for if the set of
contours on which one of them is a constant represents the equipotentials of a
system then the contours on which the other is constant, being orthogonal to
each of the first set, must represent the corresponding field lines or stream lines,
depending on the context. The analytic function $f$ is the complex potential. It
is conventional to use $\phi$ and $\psi$ (rather than $u$ and $v$) to denote the real and
imaginary parts of a complex potential, so that $f = \phi + i\psi$.

As an example consider the function

$$f(z) = -\frac{q}{2\pi\epsilon_0} \ln z, \tag{20.25}$$

in connection with the physical situation of a line charge of strength $q$ per unit
length passing through the origin, perpendicular to the $z$-plane (figure 20.3). Its
real and imaginary parts are

$$\phi = -\frac{q}{2\pi\epsilon_0} \ln |z|, \qquad \psi = -\frac{q}{2\pi\epsilon_0} \arg z. \tag{20.26}$$

The contours in the $z$-plane of $\phi$ = constant are concentric circles and of $\psi$ =
constant are radial lines. As expected these are orthogonal sets, but in addition
they are respectively the equipotentials and electric field lines appropriate to the
field produced by the line charge (the minus sign is needed in (20.25) because the
value of $\phi$ must decrease with increasing distance from the origin).

Suppose we make the choice that the real part $\phi$ of the analytic function $f$
gives the conventional potential function; $\psi$ could equally well be selected. Then
we may consider how the direction and magnitude of the field are related to $f$.


►Show that for any complex (electrostatic) potential $f(z)$ the strength of the electric field
is given by $E = |f'(z)|$ and that its direction makes an angle of $\pi - \arg[f'(z)]$ with the
$x$-axis.

Because $\phi$ = constant is an equipotential, the field has components

$$E_x = -\frac{\partial\phi}{\partial x} \quad \text{and} \quad E_y = -\frac{\partial\phi}{\partial y}. \tag{20.27}$$

Since $f$ is analytic, (i) we may use the Cauchy-Riemann relations (20.5) to change the
second of these, obtaining

$$E_x = -\frac{\partial\phi}{\partial x} \quad \text{and} \quad E_y = \frac{\partial\psi}{\partial x}; \tag{20.28}$$

(ii) the direction of differentiation at a point is immaterial and so

$$\frac{df}{dz} = \frac{\partial f}{\partial x} = \frac{\partial\phi}{\partial x} + i \frac{\partial\psi}{\partial x} = -E_x + iE_y. \tag{20.29}$$

From these it can be seen that the field at a point is given in magnitude by $E = |f'(z)|$
and that it makes an angle with the $x$-axis given by $\pi - \arg[f'(z)]$. ◄

It will be apparent from the above that much of physical interest can be
calculated by working directly in terms of $f$ and $z$. In particular, the electric field
vector $\mathbf{E}$ may be represented, using (20.29) above, by the quantity

$$\mathcal{E} = E_x + iE_y = -[f'(z)]^*.$$
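The representation of the field as minus the complex conjugate of $f'(z)$ lends itself to a quick numerical check for the line charge (20.25). In the sketch below the derivative is estimated by a central difference; the function names, the step `h`, the sample point and the choice of units with $q/(2\pi\epsilon_0) = 1$ are all assumptions made for illustration.

```python
import cmath


def derivative(f, z, h=1e-6):
    """Central-difference estimate of f'(z); valid because f is analytic at z."""
    return (f(z + h) - f(z - h)) / (2 * h)


def field(f, z):
    """Complex field E_x + i E_y = -[f'(z)]* for a complex potential f."""
    return -derivative(f, z).conjugate()


def line_charge(z):
    """Potential (20.25) in units where q / (2 pi epsilon_0) = 1 (an assumed choice)."""
    return -cmath.log(z)


z = 3 + 4j
E = field(line_charge, z)
# The field should point radially outwards with magnitude 1/|z|.
print(E, abs(E), cmath.phase(E) - cmath.phase(z))
```

Because the principal logarithm is used, the sample point should be kept away from the negative real axis, where the library function has its branch cut.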


Complex potentials can be used in two-dimensional fluid mechanics problems
in a similar way. If the flow is stationary (i.e. the velocity of the fluid does not
depend on time) and irrotational, and the fluid is both incompressible and non-viscous,
then the velocity of the fluid can be described by $\mathbf{V} = \nabla\phi$, where $\phi$ is the
velocity potential and satisfies $\nabla^2\phi = 0$. If, for a complex potential $f = \phi + i\psi$,
the real part is taken to represent the velocity potential then the curves $\psi$ =
constant will be the streamlines of the flow. In a direct parallel with the electric
field, the velocity may be represented in terms of the complex potential by

$$\mathcal{V} = V_x + iV_y = [f'(z)]^*,$$

the difference of a minus sign reflecting the same difference between the definitions
of $\mathbf{E}$ and $\mathbf{V}$. The speed of the flow is equal to $|f'(z)|$. Points where $f'(z) = 0$, and
so the velocity is zero, are called stagnation points of the flow.

Analogously to the electrostatic case, a line source of fluid at $z = z_0$, perpendicular
to the $z$-plane (i.e. a point from which fluid is emerging at a constant rate)
is described by the complex potential

$$f(z) = k \ln(z - z_0),$$

where $k$ is the strength of the source. A sink is similarly represented, but with $k$
replaced by $-k$. Other simple examples are as follows.

(i) The flow of a fluid at a constant speed $V_0$ and at an angle $\alpha$ to the $x$-axis
is described by $f(z) = V_0 [\exp(-i\alpha)] z$, since then $[f'(z)]^* = V_0 \exp i\alpha$.

(ii) Vortex flow, in which fluid flows azimuthally in an anticlockwise direction
around some point $z_0$, the speed of the flow being inversely proportional
to the distance from $z_0$, is described by $f(z) = -ik \ln(z - z_0)$, where $k$ is
the strength of the vortex. For a clockwise vortex $k$ is replaced by $-k$.


►Verify that the complex potential

$$f(z) = V_0 \left( z + \frac{a^2}{z} \right)$$

is appropriate to a circular cylinder of radius $a$ placed so that it is perpendicular to a
uniform fluid flow of speed $V_0$ parallel to the $x$-axis.

Firstly, since $f(z)$ is analytic except at $z = 0$, both its real and imaginary parts satisfy
Laplace's equation in the region exterior to the cylinder. Also $f(z) \to V_0 z$ as $z \to \infty$, so
that Re $f(z) \to V_0 x$, which is appropriate to a uniform flow of speed $V_0$ in the $x$-direction
far from the cylinder.

Writing $z = r \exp i\theta$ and using de Moivre's theorem we have

$$f(z) = V_0 \left[ r \exp i\theta + \frac{a^2}{r} \exp(-i\theta) \right] = V_0 \left( r + \frac{a^2}{r} \right) \cos\theta + iV_0 \left( r - \frac{a^2}{r} \right) \sin\theta.$$

Thus we see that the streamlines of the flow described by $f(z)$ are given by

$$\psi = V_0 \left( r - \frac{a^2}{r} \right) \sin\theta = \text{constant}.$$

In particular, $\psi = 0$ on $r = a$, independently of the value of $\theta$, and so $r = a$ must be a
streamline. Since there can be no flow of fluid across streamlines, $r = a$ must correspond
to a boundary along which the fluid flows tangentially. Thus $f(z)$ is a solution of Laplace's
equation that satisfies all the physical boundary conditions of the problem, and so it is the
appropriate complex potential. ◄
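The properties verified in this example can also be checked numerically: the imaginary part of $f$ vanishes on the cylinder, the velocity there has no radial component, and the flow tends to the uniform stream far away. The following is a minimal sketch under stated assumptions; the values of `V0` and `a`, the function names and the sample angles are illustrative, and the velocity is taken as the complex conjugate of $f'(z)$ as described earlier in the section.

```python
import cmath
import math

V0, a = 1.0, 2.0   # assumed illustrative values


def f(z):
    """Complex potential for uniform flow past a cylinder of radius a."""
    return V0 * (z + a * a / z)


def velocity(z):
    """V_x + i V_y = [f'(z)]*, with f'(z) = V0 (1 - a**2 / z**2)."""
    return (V0 * (1 - a * a / (z * z))).conjugate()


for theta in (0.3, 1.0, 2.5):
    z = a * cmath.exp(1j * theta)
    radial = (velocity(z) * cmath.exp(-1j * theta)).real
    print(f(z).imag, radial)      # both vanish on the surface r = a

print(abs(velocity(a + 0j)), abs(velocity(-a + 0j)))   # stagnation points at z = a and z = -a
print(velocity(1e6 + 0j))                             # tends to V0 far from the cylinder
```

The two points $z = \pm a$ at which the printed speed vanishes are the stagnation points, where $f'(z) = 0$.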


By a similar argument the complex potential $f(z) = -E(z - a^2/z)$ (note the
minus signs) is appropriate to a conducting circular cylinder of radius $a$ placed
perpendicular to a uniform electric field $E$ in the $x$-direction.

The real and imaginary parts of a complex potential $f = \phi + i\psi$ have another
interesting relationship in the context of Laplace's equation in electrostatics or
fluid mechanics. Let us choose $\phi$ as the conventional potential, so that $\psi$ represents
the stream function (or electric field, depending on the application), and consider
the difference in the values of $\psi$ at any two points $P$ and $Q$ connected by some
path $C$, as shown in figure 20.4. This difference is given by

$$\psi(Q) - \psi(P) = \int_P^Q d\psi = \int_P^Q \left( \frac{\partial\psi}{\partial x}\,dx + \frac{\partial\psi}{\partial y}\,dy \right),$$

which, on using the Cauchy-Riemann relations, becomes

$$\psi(Q) - \psi(P) = \int_P^Q \left( -\frac{\partial\phi}{\partial y}\,dx + \frac{\partial\phi}{\partial x}\,dy \right) = \int_P^Q \nabla\phi \cdot \hat{\mathbf{n}}\,ds = \int_P^Q \frac{\partial\phi}{\partial n}\,ds,$$

where $\hat{\mathbf{n}}$ is the vector unit normal to the path $C$ and $s$ is the arc length along the
path; the last equality is written in terms of the normal derivative $\partial\phi/\partial n \equiv \nabla\phi \cdot \hat{\mathbf{n}}$.

Figure 20.4 A curve joining the points $P$ and $Q$. Also shown is $\hat{\mathbf{n}}$, the unit
vector normal to the curve.

Now suppose that in an electrostatics application, the path $C$ is the surface of
a conductor; then

$$\frac{\partial\phi}{\partial n} = -\frac{\sigma}{\epsilon_0},$$

where $\sigma$ is the surface charge density per unit length normal to the $xy$-plane.
Therefore $-\epsilon_0[\psi(Q) - \psi(P)]$ is equal to the charge per unit length normal to the
$xy$-plane on the surface of the conductor between the points $P$ and $Q$. Similarly,
in fluid mechanics applications, if the density of the fluid is $\rho$ and its velocity $\mathbf{V}$
then

$$\rho[\psi(Q) - \psi(P)] = \rho \int_P^Q \nabla\phi \cdot \hat{\mathbf{n}}\,ds = \rho \int_P^Q \mathbf{V} \cdot \hat{\mathbf{n}}\,ds$$

is equal to the mass flux between $P$ and $Q$ per unit length perpendicular to the
$xy$-plane.




►A conducting circular cylinder of radius $a$ is placed with its centre line passing through
the origin and perpendicular to a uniform electric field $E$ in the $x$-direction. Find the charge
per unit length induced on the half of the cylinder that lies in the region $x < 0$.

As mentioned after the previous example, the appropriate complex potential for this
problem is $f(z) = -E(z - a^2/z)$. Writing $z = r \exp i\theta$ this becomes

$$f(z) = -E \left[ r \exp i\theta - \frac{a^2}{r} \exp(-i\theta) \right] = -E \left( r - \frac{a^2}{r} \right) \cos\theta - iE \left( r + \frac{a^2}{r} \right) \sin\theta,$$

so that on $r = a$ the imaginary part of $f$ is given by

$$\psi = -2Ea \sin\theta.$$

Therefore the induced charge $q$ per unit length on the left half of the cylinder, between
$\theta = \pi/2$ and $\theta = 3\pi/2$, is given by

$$q = 2\epsilon_0 Ea[\sin(3\pi/2) - \sin(\pi/2)] = -4\epsilon_0 Ea.$$ ◄
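The result of this example can be cross-checked by integrating the surface charge density directly. For the potential used here, the real part is $\phi = -E(r - a^2/r)\cos\theta$, so the outward normal derivative on $r = a$ is $-2E\cos\theta$ and the surface density is $2\epsilon_0 E \cos\theta$; this intermediate expression is derived for the sketch and is not quoted from the text. The function names, the midpoint rule and the unit values of $E$, $a$ and $\epsilon_0$ are likewise assumptions.

```python
import math


def charge_by_integration(E, a, eps0, theta1, theta2, steps=10000):
    """Midpoint-rule integral of sigma * a dtheta along the surface r = a."""
    d_theta = (theta2 - theta1) / steps
    total = 0.0
    for k in range(steps):
        theta = theta1 + (k + 0.5) * d_theta
        sigma = 2 * eps0 * E * math.cos(theta)   # sigma = -eps0 * d(phi)/dr at r = a
        total += sigma * a * d_theta
    return total


def charge_from_psi(E, a, eps0, theta1, theta2):
    """Charge per unit length from -eps0 [psi(Q) - psi(P)], with psi = -2 E a sin(theta)."""
    def psi(theta):
        return -2 * E * a * math.sin(theta)
    return -eps0 * (psi(theta2) - psi(theta1))


E, a, eps0 = 1.0, 1.0, 1.0   # assumed units
print(charge_by_integration(E, a, eps0, math.pi / 2, 3 * math.pi / 2))
print(charge_from_psi(E, a, eps0, math.pi / 2, 3 * math.pi / 2))
```

Both routes should agree with $-4\epsilon_0 E a$, the first only to within the accuracy of the numerical integration.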


20.8 Conformal transformations 

We now turn our attention to the subject of transformations, by which we mean 
a change of coordinates from the complex variable z = x + iy to another, say 
w = r + is, by means of a prescribed formula : 

w = g(z) = r(x,y) + is(x, y). 

Under such a transformation, or mapping, the Argand diagram for the $z$-variable
is transformed into one for the $w$-variable, although the complete $z$-plane might
be mapped onto only a part of the $w$-plane, or onto the whole of the $w$-plane, or
onto some or all of the $w$-plane covered more than once.

We shall consider only those mappings for which $w$ and $z$ are related by a
function $w = g(z)$ and its inverse $z = h(w)$ that are analytic, except possibly
at a few isolated points; such mappings are called conformal. Their important
properties are that, except at points at which $g'(z)$, and hence $h'(z)$, is zero or
infinite:

(i) continuous lines in the $z$-plane transform into continuous lines in the
$w$-plane;

(ii) the angle between two intersecting curves in the z-plane equals the angle 
between the corresponding curves in the w-plane; 

(iii) the magnification, as between the z- and w-plane, of a small line element in 
the neighbourhood of any particular point is independent of the direction 
of the element; 

(iv) any analytic function of z transforms to an analytic function of w and 
vice versa. 
Figure 20.5 Two curves $C_1$ and $C_2$ in the $z$-plane, which are mapped onto
$C_1'$ and $C_2'$ in the $w$-plane.

Result (i) is immediate, and results (ii) and (iii) can be justified by the following
argument. Let two curves $C_1$ and $C_2$ pass through the point $z_0$ in the $z$-plane and
$z_1$ and $z_2$ be two points on their respective tangents at $z_0$, each a distance $\rho$ from
$z_0$. The same prescription with $w$ replacing $z$ describes the transformed situation;
however, the transformed tangents may not be straight lines and the distances
of $w_1$ and $w_2$ from $w_0$ have not yet been shown to be equal. This situation is
illustrated in figure 20.5.

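In the $z$-plane $z_1$ and $z_2$ are given by

$$z_1 - z_0 = \rho \exp i\theta_1 \quad \text{and} \quad z_2 - z_0 = \rho \exp i\theta_2.$$

The corresponding descriptions in the $w$-plane are

$$w_1 - w_0 = \rho_1 \exp i\phi_1 \quad \text{and} \quad w_2 - w_0 = \rho_2 \exp i\phi_2.$$

The angles $\theta_i$ and $\phi_i$ are clear from figure 20.5. Now since $w = g(z)$, where $g$ is
analytic, we have

$$\lim_{z_1 \to z_0} \left( \frac{w_1 - w_0}{z_1 - z_0} \right) = \lim_{z_2 \to z_0} \left( \frac{w_2 - w_0}{z_2 - z_0} \right) = \left. \frac{dg}{dz} \right|_{z = z_0},$$

which may be written as

$$\lim_{\rho \to 0} \left\{ \frac{\rho_1}{\rho} \exp[i(\phi_1 - \theta_1)] \right\} = \lim_{\rho \to 0} \left\{ \frac{\rho_2}{\rho} \exp[i(\phi_2 - \theta_2)] \right\} = g'(z_0). \tag{20.30}$$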


Comparing magnitudes and phases (i.e. arguments) in the equalities (20.30)
gives the stated results (ii) and (iii) and adds quantitative information to them,
namely that for small line elements

$$\frac{\rho_1}{\rho} \approx \frac{\rho_2}{\rho} \approx |g'(z_0)|, \tag{20.31}$$

$$\phi_1 - \theta_1 \approx \phi_2 - \theta_2 \approx \arg g'(z_0). \tag{20.32}$$

For strict comparison with result (ii), (20.32) must be written as $\theta_1 - \theta_2 = \phi_1 - \phi_2$,
with an ordinary equality sign, since the angles are only defined in the limit
$\rho \to 0$ when (20.32) becomes a true identity. We also see from (20.31) that
the linear magnification factor is $|g'(z_0)|$; similarly, small areas are magnified by
$|g'(z_0)|^2$.

Since in the neighbourhoods of corresponding points in a transformation angles 
are preserved and magnifications are independent of direction, it follows that small 
plane figures are transformed into figures of the same shape, but, in general, ones 
that are magnified and rotated (though not distorted). However, we also note 
that at any point where g'(z) = 0, the angle arg g'(z) through which line elements 
are rotated is undefined; these are called critical points of the transformation. 
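The preservation of angles and the direction-independent magnification can be observed numerically by mapping two short line elements through a given $g$. The sketch below uses $g(z) = z^2$ as an assumed example; the function name `image_angle_and_scales`, the element length `rho` and the sample points are illustrative choices only.

```python
import cmath


def image_angle_and_scales(g, z0, theta1, theta2, rho=1e-6):
    """Map two short elements from z0 and return (angle between images, scale 1, scale 2)."""
    w0 = g(z0)
    d1 = g(z0 + rho * cmath.exp(1j * theta1)) - w0
    d2 = g(z0 + rho * cmath.exp(1j * theta2)) - w0
    return cmath.phase(d2 / d1), abs(d1) / rho, abs(d2) / rho


def g(z):
    return z * z


# Away from critical points the angle 0.8 is preserved and both scales are near |g'(z0)| = |2 z0|.
print(image_angle_and_scales(g, 1 + 1j, 0.2, 1.0), abs(2 * (1 + 1j)))
# At the critical point z0 = 0, where g'(z0) = 0, the angle is doubled instead.
print(image_angle_and_scales(g, 0j, 0.2, 1.0)[0])
```

The second call illustrates why points with $g'(z) = 0$ have to be excluded from the statement of properties (ii) and (iii).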

The final result (iv) is perhaps the most important property of conformal
transformations. If $f(z)$ is an analytic function of $z$ and $z = h(w)$ is also analytic,
then $F(w) = f(h(w))$ is analytic in $w$. Its importance lies in the further conclusions
it allows us to draw from the fact that, since $f$ is analytic, the real and imaginary
parts of $f = \phi + i\psi$ are necessarily solutions of

$$\frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial y^2} = 0 \quad \text{and} \quad \frac{\partial^2\psi}{\partial x^2} + \frac{\partial^2\psi}{\partial y^2} = 0. \tag{20.33}$$

Since the transformation property ensures that $F = \Phi + i\Psi$ is also analytic, we
can conclude that its real and imaginary parts must themselves satisfy Laplace's
equation in the $w$-plane,

$$\frac{\partial^2\Phi}{\partial r^2} + \frac{\partial^2\Phi}{\partial s^2} = 0 \quad \text{and} \quad \frac{\partial^2\Psi}{\partial r^2} + \frac{\partial^2\Psi}{\partial s^2} = 0. \tag{20.34}$$

Further, suppose that (say) Re $f(z) = \phi$ is constant over a boundary $C$ in the
$z$-plane; then Re $F(w) = \Phi$ is constant over $C$ in the $z$-plane. But this is the same
as saying that Re $F(w)$ is constant over the boundary $C'$ in the $w$-plane, $C'$ being
the curve into which $C$ is transformed by the conformal transformation $w = g(z)$.
This is discussed further in the next section.

Examples of useful conformal transformations are numerous. For instance,
$w = z + b$, $w = (\exp i\theta)z$ and $w = az$ correspond respectively to a translation by
$b$, a rotation through an angle $\theta$ and a stretching (or contraction) in the radial
direction (for $a$ real). These three examples can be combined into the general
linear transformation $w = az + b$, where in general $a$ and $b$ are complex. Another
example is the inversion mapping $w = 1/z$, which maps the interior of the unit
circle to the exterior and vice versa. Other, more complicated, examples also exist.

Figure 20.6 Transforming the upper half of the $z$-plane into the interior of
the unit circle in the $w$-plane, in such a way that $z = i$ is mapped onto $w = 0$
and the points $x = \pm\infty$ are mapped onto $w = 1$.


►Show that, if the point $z_0$ lies in the upper half of the $z$-plane then the transformation

$$w = (\exp i\phi)\, \frac{z - z_0}{z - z_0^*}$$

maps the upper half of the $z$-plane into the interior of the unit circle in the $w$-plane. Hence
find a similar transformation that maps the point $z = i$ onto $w = 0$ and the points $x = \pm\infty$
onto $w = 1$.

Taking the modulus of $w$, we have

$$|w| = \left| (\exp i\phi)\, \frac{z - z_0}{z - z_0^*} \right| = \frac{|z - z_0|}{|z - z_0^*|}.$$

However, since the complex conjugate $z_0^*$ is the reflection of $z_0$ in the real axis, if $z$ and $z_0$
both lie in the upper half of the $z$-plane then $|z - z_0| \le |z - z_0^*|$; thus $|w| \le 1$ as required.
We also note that (i) the equality holds only when $z$ lies on the real axis, and so this axis
is mapped onto the boundary of the unit circle in the $w$-plane; (ii) the point $z_0$ is mapped
onto $w = 0$, the origin of the $w$-plane.

By fixing the images of two points in the $z$-plane, the constants $z_0$ and $\phi$ can also be
fixed. Since we require the point $z = i$ to be mapped onto $w = 0$, we have immediately
$z_0 = i$. By further requiring $z = \pm\infty$ to be mapped onto $w = 1$, we find $1 = w = \exp i\phi$
and so $\phi = 0$. The required transformation is therefore

$$w = \frac{z - i}{z + i},$$

and is illustrated in figure 20.6. ◄
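A few sample points are enough to illustrate the behaviour of this transformation. In the sketch below, the function name `to_disc`, the default arguments (corresponding to the final result of the example) and the particular test points are assumptions made for illustration.

```python
import cmath


def to_disc(z, z0=1j, phi=0.0):
    """w = exp(i phi) (z - z0) / (z - z0*), with z0 in the upper half-plane."""
    return cmath.exp(1j * phi) * (z - z0) / (z - z0.conjugate())


print(to_disc(1j))                       # z = i is sent to the origin
for z in (0.5j, 2 + 3j, -7 + 0.1j):
    print(z, abs(to_disc(z)) < 1)        # upper half-plane points land inside the unit circle
for x in (-4.0, 0.0, 2.5):
    print(x, abs(to_disc(x)))            # real-axis points land on the unit circle
print(to_disc(1e9), to_disc(-1e9))       # both ends of the real axis approach w = 1
```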


We conclude this section by mentioning the rather curious Schwarz-Christoffel
transformation.† Suppose, as shown in figure 20.7, that we are interested in a
(finite) number of points $x_1, x_2, \ldots, x_n$ on the real axis in the $z$-plane. Then by
means of the transformation

$$w = \left\{ A \int_0^z (\xi - x_1)^{(\phi_1/\pi) - 1} (\xi - x_2)^{(\phi_2/\pi) - 1} \cdots (\xi - x_n)^{(\phi_n/\pi) - 1}\, d\xi \right\} + B, \tag{20.35}$$

we may map the upper half of the $z$-plane onto the interior of a closed polygon in
the $w$-plane having $n$ vertices $w_1, w_2, \ldots, w_n$ (which are the images of $x_1, x_2, \ldots, x_n$)
with corresponding interior angles $\phi_1, \phi_2, \ldots, \phi_n$, as shown in figure 20.7. The
real axis in the $z$-plane is transformed into the boundary of the polygon itself.
The constants $A$ and $B$ are complex in general and determine the position,
size and orientation of the polygon. It is clear from (20.35) that $dw/dz = 0$ at
$x = x_1, x_2, \ldots, x_n$, and so the transformation is not conformal at these points.

† Strictly speaking the use of this transformation requires an understanding of complex integrals,
which are discussed in section 20.10 below.

Figure 20.7 Transforming the upper half of the $z$-plane into the interior
of a polygon in the $w$-plane, in such a way that the points $x_1, x_2, \ldots, x_n$ are
mapped onto the vertices $w_1, w_2, \ldots, w_n$ of the polygon with interior angles
$\phi_1, \phi_2, \ldots, \phi_n$.

There are various subtleties associated with the use of the Schwarz-Christoffel
transformation. For example, if one of the points on the real axis in the $z$-plane
(usually $x_n$) is taken at infinity then the corresponding factor in (20.35) (i.e. the
one involving $x_n$) is not present. In this case, the point(s) $x = \pm\infty$ are considered
as one point, since they transform to a single vertex of the polygon in the $w$-plane.

We can also map the upper half of the $z$-plane onto an infinite open polygon
by considering it as the limiting case of some closed polygon.

►Find a transformation that maps the upper half of the z-plane onto the triangular region 
shown in figure 20.8 in such a way that the points $x_1 = -1$ and $x_2 = 1$ are mapped 
onto the points $w = -a$ and $w = a$ respectively, and the point $x_3 = \pm\infty$ is mapped onto 
$w = ib$. Hence find a transformation that maps the upper half of the z-plane into the region 
$-a < r < a$, $s > 0$ of the w-plane, as shown in figure 20.9. 


Let us denote the angles at $w_1$ and $w_2$ in the w-plane by $\phi_1 = \phi_2 = \phi$, where $\phi = \tan^{-1}(b/a)$. 
Since $x_3$ is taken at infinity we may omit the corresponding factor in (20.35) to obtain 

\[ w = A \int_0^z (\xi + 1)^{(\phi/\pi)-1} (\xi - 1)^{(\phi/\pi)-1} \, d\xi + B = A \int_0^z (\xi^2 - 1)^{(\phi/\pi)-1} \, d\xi + B. \tag{20.36} \]

The required transformation may then be found by fixing the constants A and B as 
follows. Since the point z = 0 lies on the line segment $x_1x_2$ it will be mapped onto the line 

Figure 20.8 Transforming the upper half of the z-plane into the interior of 
a triangle in the w-plane. 

Figure 20.9 Transforming the upper half of the z-plane into the interior of 
the region $-a < r < a$, $s > 0$ in the w-plane. 


segment $w_1w_2$ in the w-plane, and by symmetry must be mapped onto the point w = 0. 
Thus setting z = 0 and w = 0 in (20.36) we obtain B = 0. An expression for A can be 
found in the form of an integral by setting (for example) z = 1 and w = a in (20.36). 

We may consider the region in the w-plane in figure 20.9 to be the limiting case of the 
triangular region in figure 20.8 with the vertex $w_3$ at infinity. Thus we may use the above, 
but with the angles at $w_1$ and $w_2$ set to $\phi = \pi/2$. From (20.36), we obtain 

\[ w = A \int_0^z \frac{d\xi}{\sqrt{\xi^2 - 1}} = iA \sin^{-1} z. \]

By setting z = 1 and w = a, we find $iA = 2a/\pi$, so the required transformation is 

\[ w = \frac{2a}{\pi} \sin^{-1} z. \] ◄
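The mapping just obtained can be spot-checked numerically. The following is a minimal sketch, not part of the original text: the function name `strip_map`, the choice a = 2 and the sample points are illustrative assumptions, and the principal branch of the inverse sine supplied by Python's `cmath.asin` is assumed to be the appropriate one for Im z > 0.

```python
import cmath
import math

def strip_map(z, a):
    """Return w = (2a/pi) * arcsin(z) using the principal branch of arcsin."""
    return (2 * a / math.pi) * cmath.asin(z)

a = 2.0  # illustrative half-width of the strip (an assumption)

# The points x_1 = -1 and x_2 = 1 should be carried to w = -a and w = a.
corners = [strip_map(-1, a), strip_map(1, a)]

# Sample points with Im z > 0 should land in the region -a < r < a, s > 0.
samples = [complex(x, y) for x in (-5.0, -0.5, 0.0, 0.7, 4.0)
           for y in (0.1, 1.0, 10.0)]
images = [strip_map(z, a) for z in samples]
inside = all(-a < w.real < a and w.imag > 0 for w in images)
```

A finite sample of this kind illustrates, but of course does not prove, that the upper half-plane is carried into the semi-infinite strip.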


20.9 Applications of conformal transformations 

In the previous section it was shown that, under a conformal transformation 
w = g(z) from z = x + iy to a new variable w = r + is, if a solution of Laplace’s 
equation in some region R of the xy-plane can be found as the real or imaginary 
part of an analytic function† of z then the same expression put in terms of r and 
s will be a solution of Laplace’s equation in the corresponding region R' of the 
w-plane, and vice versa. In addition, if the solution is constant over the boundary 
C of the region R in the xy-plane then the solution in the w-plane will take the 
same constant value over the corresponding curve C' that bounds R'. 

Thus, from any two-dimensional solution of Laplace’s equation for a particular 
geometry, further solutions for other geometries can be obtained by making 
conformal transformations. From the physical point of view the given geometry 
is usually complicated and so the solution is sought by transforming to a simpler 
one. However, working from simpler to more complicated situations can provide 
useful experience, and make it more likely that the reverse procedure can be 
tackled successfully. 


►Find the complex electrostatic potential associated with an infinite charged conducting 
plate y = 0, and thus obtain those associated with 

(i) a semi-infinite charged conducting plate (r > 0, s = 0), 

(ii) the inside of a right-angled charged conducting wedge (r > 0, s = 0 and 
r = 0, s > 0). 


Figure 20.10(a) shows the equipotentials (broken lines) and field lines (solid) for the 
infinite charged conducting plane y = 0. Suppose that we elect to make the real part of 
the complex potential coincide with the conventional electrostatic potential. If the plate is 
charged to a potential V then clearly 

\[ \phi(x, y) = V - ky, \tag{20.37} \]

where k is related to the charge density $\sigma$ by $k = \sigma/\epsilon_0$, since physically the electric field $\mathbf{E}$ 
has components $(0, \sigma/\epsilon_0)$ and $\mathbf{E} = -\nabla\phi$. 

Thus what is needed is an analytic function of z of which the real part is $V - ky$. This 
can be obtained by inspection, but we may proceed formally and use the Cauchy-Riemann 
relations to obtain the imaginary part $\psi(x,y)$ thus: 

\[ \frac{\partial\psi}{\partial y} = \frac{\partial\phi}{\partial x} = 0 \quad \text{and} \quad \frac{\partial\psi}{\partial x} = -\frac{\partial\phi}{\partial y} = k. \]

Hence $\psi = kx + c$ and, absorbing c into V, the required complex potential is 

\[ f(z) = V - ky + ikx = V + ikz. \tag{20.38} \]


(i) Now consider the transformation 

\[ w = g(z) = z^2. \tag{20.39} \]

This satisfies the criteria for a conformal mapping (except at z = 0) and carries the upper 
half of the z-plane into the entire w-plane; the equipotential plane y = 0 goes into the 
half-plane r > 0, s = 0. 

By the general results proved, f(z) when expressed in terms of r and s will give a 
complex potential of which the real part will be constant on the half-plane in question; 


† In fact, the original solution in the xy-plane need not be given explicitly as the real or imaginary 
part of an analytic function. Any solution of $\nabla^2\phi = 0$ in the xy-plane is carried over into another 
solution of $\nabla^2\Phi = 0$ in the new variables by a conformal transformation, and vice versa. 

(a) z-plane (b) w-plane (c) w-plane 

Figure 20.10 (a) The equipotential lines (broken) and field lines (solid) for 
an infinite charged conducting plane at y = 0, where z = x + iy; (b), (c) after 
the transformations $w = z^2$, $w = z^{1/2}$ of the situation shown in (a). 


we deduce that 

\[ F(w) = f(z) = V + ikz = V + ikw^{1/2} \tag{20.40} \]

is the required potential. Expressed in terms of r, s and $\rho = (r^2 + s^2)^{1/2}$, $w^{1/2}$ is given by 

\[ w^{1/2} = \rho^{1/2} \left[ \left( \frac{\rho + r}{2\rho} \right)^{1/2} + i \left( \frac{\rho - r}{2\rho} \right)^{1/2} \right], \tag{20.41} \]

and, in particular, the electrostatic potential is given by 

\[ \Phi(r,s) = \operatorname{Re} F(w) = V - \frac{k}{\sqrt{2}} \left[ (r^2 + s^2)^{1/2} - r \right]^{1/2}. \tag{20.42} \]

The corresponding equipotentials and field lines are shown in figure 20.10(b). Using results 
(20.27)–(20.29), the magnitude of the electric field is 

\[ |\mathbf{E}| = |F'(w)| = \left| \tfrac{1}{2} ikw^{-1/2} \right| = \tfrac{1}{2} k (r^2 + s^2)^{-1/4}. \]
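As a check on (20.42), the real part of $V + ikw^{1/2}$ can be compared numerically with the closed form. This is only a sketch under stated assumptions: the function names and the values of V and k are invented for illustration, and the square root is taken on the branch with non-negative imaginary part, since z must lie in the upper half-plane.

```python
import cmath
import math

def potential_via_map(r, s, V, k):
    """Re F(w) for F(w) = V + i k w^(1/2), taking the root with Im >= 0."""
    z = cmath.sqrt(complex(r, s))
    if z.imag < 0:  # choose the root lying in the upper half of the z-plane
        z = -z
    return (V + 1j * k * z).real

def potential_closed_form(r, s, V, k):
    """Closed form (20.42): V - (k/sqrt 2) * [(r^2 + s^2)^(1/2) - r]^(1/2)."""
    rho = math.hypot(r, s)
    return V - (k / math.sqrt(2)) * math.sqrt(max(rho - r, 0.0))

V, k = 10.0, 3.0  # illustrative values (assumptions)
points = [(1.0, 0.0), (-1.0, 2.0), (0.5, -3.0), (-2.0, -0.1)]
differences = [abs(potential_via_map(r, s, V, k)
                   - potential_closed_form(r, s, V, k)) for r, s in points]
```

On the plate itself (r > 0, s = 0) both functions return V, as required of an equipotential.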

(ii) A transformation ‘converse’ to that used in (i), 

\[ w = g(z) = z^{1/2}, \]

has the effect of mapping the upper half of the z-plane into the first quadrant of the 
w-plane and the conducting plane y = 0 into the wedge r > 0, s = 0 and r = 0, s > 0. 

The complex potential now becomes 

\[ F(w) = V + ikw^2 = V + ik[(r^2 - s^2) + 2irs], \tag{20.43} \]

showing that the electrostatic potential is $V - 2krs$ and the electric field has components 

\[ \mathbf{E} = (2ks, 2kr). \tag{20.44} \]


Figure 20.10(c) indicates the approximate equipotentials and field lines. (Note that, in 
both transformations, g'(z) is either 0 or $\infty$ at the origin and so neither transformation 
is conformal there. Consequently there is no violation of result (ii), given at the start of 
section 20.8, concerning the angles between intersecting lines.) ◄ 

Figure 20.11 (a) An infinite conducting wedge with interior angle $\pi/\alpha$ and a 
line charge at $z = z_0$; (b) after the transformation $w = z^\alpha$, with an additional 
image charge placed at $w = w_0^*$. 


The method of images discussed in section 19.5 can also be used in conjunction 
with conformal transformations to solve Laplace’s equation in two dimensions. 


►A wedge of angle $\pi/\alpha$ with its vertex at z = 0 is formed by two semi-infinite conducting 
plates, as shown in figure 20.11(a). A line charge of strength q per unit length is positioned 
at $z = z_0$, perpendicular to the z-plane. By considering the transformation $w = z^\alpha$, find the 
complex electrostatic potential for this situation. 


Let us consider the action of the transformation $w = z^\alpha$ on the lines defining the positions 
of the conducting plates. The plate that lies along the positive x-axis is mapped onto the 
positive r-axis in the w-plane, whereas the plate that lies along the direction $\exp(i\pi/\alpha)$ is 
mapped into the negative r-axis, as shown in figure 20.11(b). Similarly the line charge at 
$z_0$ is mapped onto the point $w_0 = z_0^\alpha$. 

From figure 20.11(b), we see that in the w-plane the problem can be solved by introducing 
a second line charge of opposite sign at the point $w_0^*$, so that the potential $\Phi = 0$ along 
the r-axis. The complex potential for such an arrangement is simply 

\[ F(w) = -\frac{q}{2\pi\epsilon_0} \ln(w - w_0) + \frac{q}{2\pi\epsilon_0} \ln(w - w_0^*). \]

Substituting $w = z^\alpha$ into the above shows that the required complex potential in the 
original z-plane is 

\[ f(z) = \frac{q}{2\pi\epsilon_0} \ln\left( \frac{z^\alpha - z_0^{*\alpha}}{z^\alpha - z_0^\alpha} \right). \] ◄
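That the real part of this complex potential vanishes on both plates can be verified numerically. The sketch below is illustrative only: the function name `wedge_potential`, the wedge parameter $\alpha = 3$, the charge position and the charge strength are all assumptions, and the principal branch of the complex power is taken, which is adequate for points inside the wedge when $\alpha \geq 1$.

```python
import cmath
import math

EPS0 = 8.8541878128e-12  # permittivity of free space in F/m

def wedge_potential(z, z0, alpha, q):
    """Complex potential of a line charge q at z0 in a wedge of angle pi/alpha."""
    w = z ** alpha    # image of the field point under w = z^alpha
    w0 = z0 ** alpha  # image of the line charge
    ratio = (w - w0.conjugate()) / (w - w0)
    return (q / (2 * math.pi * EPS0)) * cmath.log(ratio)

alpha, q = 3.0, 1e-9        # wedge of angle pi/3; illustrative charge strength
z0 = cmath.rect(1.0, 0.4)   # line charge placed inside the wedge (assumption)

# Real part of the potential at points on the two plates.
on_lower_plate = [wedge_potential(complex(t, 0.0), z0, alpha, q).real
                  for t in (0.3, 1.0, 2.5)]
on_upper_plate = [wedge_potential(cmath.rect(t, math.pi / alpha), z0, alpha, q).real
                  for t in (0.3, 1.0, 2.5)]
# Real part at an interior point, which should not vanish.
interior = wedge_potential(cmath.rect(0.5, 0.4), z0, alpha, q).real
```

Both lists should contain values that are zero to within rounding error, whilst the interior value is not.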


20.10 Complex integrals 

Corresponding to integration with respect to a real variable, it is possible to 
define integration with respect to a complex variable between two complex limits. 
Since the z-plane is two-dimensional there is clearly greater freedom and hence 
ambiguity in what is meant by a complex integral. If a complex function f(z) is 
single-valued and continuous in some region R in the complex plane, then we can 
define the complex integral of f(z) between two points A and B along some curve 

Figure 20.12 Alternative paths for the integral of a function f(z) between A 
and B. 


in R; its value will depend, in general, upon the path taken between A and B 
(see figure 20.12). However, we will find that for some paths that are different 
but bear a particular relationship to each other, the value of the integral does not 
depend upon which of the paths is adopted. 

Let a particular path C be described by a continuous (real) parameter t 
($\alpha \leq t \leq \beta$) that gives successive positions on C by means of the equations 

\[ x = x(t), \quad y = y(t), \tag{20.45} \]

with $t = \alpha$ and $t = \beta$ corresponding to the points A and B respectively. Then the 
integral along path C of a continuous function f(z) is written 

\[ \int_C f(z) \, dz \tag{20.46} \]

and can be given explicitly as a sum of real integrals as follows: 

\[ \int_C f(z) \, dz = \int_C (u + iv)(dx + i \, dy) = \int_C u \, dx - \int_C v \, dy + i \int_C u \, dy + i \int_C v \, dx \]
\[ = \int_\alpha^\beta u \frac{dx}{dt} \, dt - \int_\alpha^\beta v \frac{dy}{dt} \, dt + i \int_\alpha^\beta u \frac{dy}{dt} \, dt + i \int_\alpha^\beta v \frac{dx}{dt} \, dt. \tag{20.47} \]

The question of when such an integral exists will not be pursued, except to state 
that a sufficient condition is that dx/dt and dy/dt are continuous. 

(a) (b) (c) 

Figure 20.13 Different paths for an integral of $f(z) = z^{-1}$. See the text for 
details. 


►Evaluate the complex integral of $f(z) = z^{-1}$ along the circle |z| = R, starting and finishing 
at z = R. 


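The path $C_1$ is parameterised as follows (figure 20.13(a)): 

\[ z(t) = R\cos t + iR\sin t, \quad 0 \leq t \leq 2\pi, \]

whilst f(z) is given by 

\[ f(z) = \frac{1}{x + iy} = \frac{x - iy}{x^2 + y^2}. \]

Thus the real and imaginary parts of f(z) are 

\[ u = \frac{x}{x^2 + y^2} = \frac{R\cos t}{R^2} \quad \text{and} \quad v = \frac{-y}{x^2 + y^2} = -\frac{R\sin t}{R^2}. \]

Hence, using expression (20.47), 

\[ \int_{C_1} \frac{1}{z} \, dz = \int_0^{2\pi} \frac{\cos t}{R} (-R\sin t) \, dt - \int_0^{2\pi} \left( \frac{-\sin t}{R} \right) R\cos t \, dt \]
\[ \qquad + i \int_0^{2\pi} \frac{\cos t}{R} R\cos t \, dt + i \int_0^{2\pi} \left( \frac{-\sin t}{R} \right) (-R\sin t) \, dt \]
\[ = 0 + 0 + i\pi + i\pi = 2\pi i. \tag{20.48} \] ◄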


With a bit of experience, the reader may be able to evaluate integrals like 
the LHS of (20.48) directly without having to write them as four separate real 
integrals. In the present case, 

\[ \int_{C_1} \frac{dz}{z} = \int_0^{2\pi} \frac{-R\sin t + iR\cos t}{R\cos t + iR\sin t} \, dt = \int_0^{2\pi} i \, dt = 2\pi i. \tag{20.49} \]


This very important result will be used many times later, and the following should 
be carefully noted: (i) its value, (ii) that this value is independent of R. 
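
Both observations can be reproduced with a simple numerical quadrature. The following sketch is not from the original text: it approximates the parameterised integral by a midpoint rule, and the helper names `contour_integral` and `circle_integral`, as well as the number of subintervals, are assumptions made for illustration.

```python
import cmath
import math

def contour_integral(f, z, dz, a, b, n=20000):
    """Midpoint-rule estimate of the integral of f(z(t)) z'(t) dt for a <= t <= b."""
    h = (b - a) / n
    total = 0j
    for k in range(n):
        t = a + (k + 0.5) * h
        total += f(z(t)) * dz(t)
    return total * h

def circle_integral(R):
    """Integral of 1/z once anticlockwise around the circle |z| = R."""
    z = lambda t: R * cmath.exp(1j * t)        # z(t) = R cos t + i R sin t
    dz = lambda t: 1j * R * cmath.exp(1j * t)  # dz/dt
    return contour_integral(lambda w: 1 / w, z, dz, 0.0, 2 * math.pi)

values = {R: circle_integral(R) for R in (0.5, 1.0, 7.0)}
```

Each entry of `values` should agree with $2\pi i$ to within rounding error, whatever the radius chosen.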

In the above example the contour was closed, and so it began and ended at 
the same point in the Argand diagram. We can evaluate complex integrals along 
open paths in a similar way. 


►Evaluate the complex integral of $f(z) = z^{-1}$ along 

(i) the contour $C_2$ consisting of the semicircle |z| = R in the half-plane $y \geq 0$ 
(see figure 20.13(b)), 

(ii) the contour $C_3$ made up of the two straight lines $C_{3a}$ and $C_{3b}$ (see 
figure 20.13(c)). 


(i) This is just as in the previous example, except that now $0 \leq t \leq \pi$. With this change we 
have from (20.48) or (20.49) that 

\[ \int_{C_2} \frac{dz}{z} = \pi i. \tag{20.50} \]


(ii) The straight lines that make up the contour $C_3$ may be parameterised as follows: 

$C_{3a}$, $z = (1 - t)R + itR$ for $0 \leq t \leq 1$; 

$C_{3b}$, $z = -sR + i(1 - s)R$ for $0 \leq s \leq 1$. 

With these parameterisations the required integrals may be written 

\[ \int_{C_3} \frac{dz}{z} = \int_0^1 \frac{-R + iR}{R + t(-R + iR)} \, dt + \int_0^1 \frac{-R - iR}{iR + s(-R - iR)} \, ds. \tag{20.51} \]

If we could take over from real-variable theory that, for real t, $\int (a + bt)^{-1} \, dt = b^{-1} \ln(a + bt)$ 
even if a and b are complex, then these integrals could be evaluated immediately. However, 
to do this would be presuming to some extent what we wish to show, and so the evaluation 
must be made in terms of entirely real integrals. For example, the first is 

\[ \int_0^1 \frac{-R + iR}{R(1 - t) + itR} \, dt = \int_0^1 \frac{(-1 + i)(1 - t - it)}{(1 - t)^2 + t^2} \, dt \]
\[ = \int_0^1 \frac{2t - 1}{1 - 2t + 2t^2} \, dt + i \int_0^1 \frac{1}{1 - 2t + 2t^2} \, dt \]
\[ = \frac{1}{2} \left[ \ln(1 - 2t + 2t^2) \right]_0^1 + \frac{i}{2} \left[ 2 \tan^{-1} \left( \frac{t - \frac{1}{2}}{\frac{1}{2}} \right) \right]_0^1 \]
\[ = 0 + \frac{i}{2} \left[ \frac{\pi}{2} - \left( -\frac{\pi}{2} \right) \right] = \frac{\pi i}{2}. \]

The second integral on the right of (20.51) can also be shown to have the value $\pi i/2$. Thus 

\[ \int_{C_3} \frac{dz}{z} = \pi i. \] ◄


Considering the results of the last two examples, which have common integrands 
and limits, some interesting observations are possible. Firstly, the two 
integrals from z = R to z = -R, along $C_2$ and $C_3$ respectively, have the same 
value even though the paths taken are different. It also follows that if we took a 
closed path $C_4$, given by $C_2$ from R to -R and $C_3$ traversed backwards from -R 
to R, then the integral round $C_4$ of $z^{-1}$ would be zero (both parts contributing 
equal and opposite amounts). This is to be compared with result (20.49), in which 
closed path $C_1$, beginning and ending at the same place as $C_4$, yields a value $2\pi i$. 

It is not true, however, that the integrals along the paths $C_2$ and $C_3$ are equal 
for any function f(z), or, indeed, that their values are independent of R in general. 


►Evaluate the complex integral of f(z) = Re z along the paths $C_1$, $C_2$ and $C_3$ shown in 
figure 20.13. 


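(i) If we take f(z) = Re z and the contour $C_1$ then 

\[ \int_{C_1} \operatorname{Re} z \, dz = \int_0^{2\pi} R\cos t \, (-R\sin t + iR\cos t) \, dt = i\pi R^2. \]

(ii) Using $C_2$ as the contour, 

\[ \int_{C_2} \operatorname{Re} z \, dz = \int_0^{\pi} R\cos t \, (-R\sin t + iR\cos t) \, dt = \tfrac{1}{2} i\pi R^2. \]

(iii) Finally the integral along $C_3 = C_{3a} + C_{3b}$ is given by 

\[ \int_{C_3} \operatorname{Re} z \, dz = \int_0^1 (1 - t)R(-R + iR) \, dt + \int_0^1 (-sR)(-R - iR) \, ds \]
\[ = \tfrac{1}{2} R^2 (-1 + i) + \tfrac{1}{2} R^2 (1 + i) = iR^2. \] ◄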
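
The contrast between the integrals of $z^{-1}$ and of Re z along $C_1$, $C_2$ and $C_3$ can be reproduced numerically. The sketch below is an illustration rather than part of the text: the helper names, the rescaling of every path so that the parameter runs from 0 to 1, and the choice R = 2 are all assumptions.

```python
import cmath
import math

def path_integral(f, z, dz, n=20000):
    """Midpoint rule for the integral of f(z(t)) z'(t) dt over 0 <= t <= 1."""
    h = 1.0 / n
    return sum(f(z((k + 0.5) * h)) * dz((k + 0.5) * h) for k in range(n)) * h

def arc(R, angle):
    """Arc of |z| = R from argument 0 to `angle`; returns (z(t), z'(t))."""
    return (lambda t: R * cmath.exp(1j * angle * t),
            lambda t: 1j * angle * R * cmath.exp(1j * angle * t))

def segment(p, q):
    """Straight line from p to q; returns (z(t), z'(t))."""
    return (lambda t: p + t * (q - p), lambda t: q - p)

def integrals(f, R):
    """Integrals of f along C1 (circle), C2 (semicircle) and C3 (two lines)."""
    c1 = path_integral(f, *arc(R, 2 * math.pi))
    c2 = path_integral(f, *arc(R, math.pi))
    c3 = (path_integral(f, *segment(R, 1j * R))       # C3a: R to iR
          + path_integral(f, *segment(1j * R, -R)))   # C3b: iR to -R
    return c1, c2, c3

R = 2.0  # illustrative radius
inverse_results = integrals(lambda z: 1 / z, R)
real_part_results = integrals(lambda z: z.real, R)
```

For $z^{-1}$ the values along $C_2$ and $C_3$ coincide, whereas for Re z they differ and depend on R, in line with the results obtained above.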


The results of this section demonstrate that the value of an integral between 
the same two points may depend upon the path that is taken between them but, 
at the same time, suggest that under some circumstances it is independent of the 
path. The general situation is summarised in the result of the next section, namely 
Cauchy’s theorem, which is the cornerstone of the integral calculus of complex 
variables. 

Before discussing Cauchy’s theorem, however, we note an important result 
concerning complex integrals that will be of some use later. Let us consider the 
integral of a function f(z) along some path C. If M is an upper bound on the 
value of |f(z)| on the path, i.e. $|f(z)| \leq M$ on C, and L is the length of the path C, 
then 

\[ \left| \int_C f(z) \, dz \right| \leq \int_C |f(z)| \, |dz| \leq M \int_C dl = ML. \tag{20.52} \]


It is straightforward to verify that this result does indeed hold for the complex 
integrals considered earlier in this section. 


20.11 Cauchy’s theorem 

Cauchy’s theorem states that if f(z) is an analytic function, and f'(z) is continuous 
at each point within and on a closed contour C, then 

\[ \oint_C f(z) \, dz = 0. \tag{20.53} \]

In this statement and from now on we denote an integral around a closed contour 
by $\oint_C$. 

To prove this theorem we will need the two-dimensional form of the divergence 
theorem, known as Green’s theorem in a plane (see section 11.3). This says that 
if p and q are two functions with continuous first derivatives within and on a 
closed contour C (bounding a domain R) in the xy-plane, then 

\[ \iint_R \left( \frac{\partial p}{\partial x} + \frac{\partial q}{\partial y} \right) dx \, dy = \oint_C (p \, dy - q \, dx). \tag{20.54} \]

With f(z) = u + iv and dz = dx + i dy, this can be applied to 

\[ I = \oint_C f(z) \, dz = \oint_C (u \, dx - v \, dy) + i \oint_C (v \, dx + u \, dy) \]

to give 

\[ I = \iint_R \left[ \frac{\partial(-u)}{\partial y} + \frac{\partial(-v)}{\partial x} \right] dx \, dy + i \iint_R \left[ \frac{\partial(-v)}{\partial y} + \frac{\partial u}{\partial x} \right] dx \, dy. \tag{20.55} \]


Now, recalling that f(z) is analytic and therefore that the Cauchy-Riemann 
relations (20.5) apply, we see that each integrand is identically zero and thus I is 
also zero; this proves Cauchy’s theorem. 

In fact the conditions of the above proof are more stringent than are needed. 
The continuity of f'(z) is not necessary for the proof of Cauchy’s theorem, 
analyticity of f(z) within and on C being sufficient. However, the proof then 
becomes more complicated and is too long to be given here.† 

The connection between Cauchy’s theorem and the zero value of the integral 
of $z^{-1}$ around the composite path $C_4$ discussed towards the end of the previous 
section is apparent: the function $z^{-1}$ is analytic in the two regions of the z-plane 
enclosed by contours ($C_2$ and $C_{3a}$) and ($C_2$ and $C_{3b}$). 


►Suppose two points A and B in the complex plane are joined by two different paths $C_1$ 
and $C_2$. Show that if f(z) is an analytic function on each path and in the region enclosed 
by the two paths then the integral of f(z) is the same along $C_1$ and $C_2$. 


The situation is shown in figure 20.14. Since f(z) is analytic in R it follows from Cauchy’s 
theorem that we have 

\[ \int_{C_1} f(z) \, dz - \int_{C_2} f(z) \, dz = \oint_{C_1 - C_2} f(z) \, dz = 0, \]

since $C_1 - C_2$ forms a closed contour enclosing R. Thus we immediately obtain 

\[ \int_{C_1} f(z) \, dz = \int_{C_2} f(z) \, dz, \]

and so the values of the integrals along $C_1$ and $C_2$ are equal. ◄ 

An important application of Cauchy’s theorem is in proving that in some cases it 


† The reader may refer to almost any book devoted to complex variables and the theory of functions. 

Figure 20.14 Two paths $C_1$ and $C_2$ enclosing a region R. 


is possible to deform a closed contour C into another contour $\gamma$, in such a way that 
the integrals of a function f(z) around each of the contours have the same value. 


►Consider two closed contours C and $\gamma$ in the Argand diagram, $\gamma$ being sufficiently small 
that it lies completely within C. Show that if the function f(z) is analytic in the region 
between the two contours then 

\[ \oint_C f(z) \, dz = \oint_\gamma f(z) \, dz. \tag{20.56} \]


To prove this result we consider a contour as shown in figure 20.15. The two close parallel 
lines $C_1$ and $C_2$ join $\gamma$ and C, which are ‘cut’ to accommodate them. The new contour $\Gamma$ 
so formed consists of C, $C_1$, $\gamma$ and $C_2$. 

Within the area bounded by $\Gamma$ the function f(z) is analytic and therefore, by Cauchy’s 
theorem (20.53), 

\[ \oint_\Gamma f(z) \, dz = 0. \tag{20.57} \]

Now the parts $C_1$ and $C_2$ of $\Gamma$ are traversed in opposite directions, and in the limit lie on 
top of each other, and so their contributions to (20.57) cancel. Thus 

\[ \oint_C f(z) \, dz + \oint_\gamma f(z) \, dz = 0. \tag{20.58} \]

The sense of the integral round $\gamma$ is opposite to the conventional (anticlockwise) one, and 
so by traversing $\gamma$ in the usual sense, we establish the result (20.56). ◄ 

A sort of converse of Cauchy’s theorem is known as Morera’s theorem, which 
states that if f(z) is a continuous function of z in a closed domain R bounded by 
a curve C and, further, $\oint_C f(z) \, dz = 0$, then f(z) is analytic in R. 

Figure 20.15 The contour used to prove the result (20.56). 


20.12 Cauchy’s integral formula 

Another very important theorem in the theory of complex variables is Cauchy’s 
integral formula, which states that if f(z) is analytic within and on a closed 
contour C and $z_0$ is a point within C then 

\[ f(z_0) = \frac{1}{2\pi i} \oint_C \frac{f(z)}{z - z_0} \, dz. \tag{20.59} \]

This formula is saying that the value of an analytic function anywhere inside 
a closed contour is uniquely determined by its values on the contour† and that 
the specific expression (20.59) can be given for the value at the interior point. 

We may prove Cauchy’s integral formula by using (20.56) and taking $\gamma$ to be a 
circle centred on the point $z = z_0$, of small enough radius $\rho$ that it all lies inside 
C. Then, since f(z) is analytic inside C, the integrand $f(z)/(z - z_0)$ is analytic 
in the space between C and $\gamma$. Thus, from (20.56), the integral around $\gamma$ has the 
same value as that around C. 

We then use the fact that any point z on $\gamma$ is given by $z = z_0 + \rho \exp i\theta$ (and 
so $dz = i\rho \exp i\theta \, d\theta$). Thus the value of the integral around $\gamma$ is given by 

\[ I = \oint_\gamma \frac{f(z)}{z - z_0} \, dz = \int_0^{2\pi} \frac{f(z_0 + \rho \exp i\theta)}{\rho \exp i\theta} \, i\rho \exp i\theta \, d\theta = i \int_0^{2\pi} f(z_0 + \rho \exp i\theta) \, d\theta. \]


† The similarity between this and the uniqueness theorem for Dirichlet boundary conditions (see 
chapter 18) is apparent. 

If the radius of the circle $\gamma$ is now shrunk to zero, i.e. $\rho \to 0$, then $I \to 2\pi i f(z_0)$, 
thus establishing the result (20.59). 

An extension to Cauchy’s integral formula can be made, yielding an integral 
expression for $f'(z_0)$: 

\[ f'(z_0) = \frac{1}{2\pi i} \oint_C \frac{f(z)}{(z - z_0)^2} \, dz, \tag{20.60} \]

under the same conditions as previously stated. 


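►Prove Cauchy’s integral formula for $f'(z_0)$ given in (20.60). 

To show this, we use the definition of a derivative and (20.59) itself to evaluate 

\[ f'(z_0) = \lim_{h \to 0} \frac{f(z_0 + h) - f(z_0)}{h} \]
\[ = \lim_{h \to 0} \left[ \frac{1}{2\pi i} \oint_C \frac{f(z)}{h} \left( \frac{1}{z - z_0 - h} - \frac{1}{z - z_0} \right) dz \right] \]
\[ = \lim_{h \to 0} \left[ \frac{1}{2\pi i} \oint_C \frac{f(z)}{(z - z_0 - h)(z - z_0)} \, dz \right] \]
\[ = \frac{1}{2\pi i} \oint_C \frac{f(z)}{(z - z_0)^2} \, dz, \]

which establishes the result (20.60). ◄ 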

Further, it may be proved by induction that the nth derivative of f(z) is also 
given by a Cauchy integral, 

\[ f^{(n)}(z_0) = \frac{n!}{2\pi i} \oint_C \frac{f(z) \, dz}{(z - z_0)^{n+1}}. \tag{20.61} \]


Thus, if the value of the analytic function is known on C then not only may the 
value of the function at any interior point be calculated, but also the values of 
all its derivatives. 
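
Formula (20.61) lends itself to a direct numerical check. The following is a minimal sketch under stated assumptions: the contour is taken to be a circle sampled at equally spaced points, the function name `cauchy_derivative` and its default arguments are invented for illustration, and the test function exp z is chosen because every one of its derivatives equals the function itself.

```python
import cmath
import math

def cauchy_derivative(f, z0, n, centre=0j, radius=1.0, m=400):
    """Estimate f^(n)(z0) from n!/(2 pi i) times the integral of
    f(z)/(z - z0)^(n+1) around the circle |z - centre| = radius,
    which must enclose z0 (equally spaced sample points)."""
    total = 0j
    for k in range(m):
        theta = 2 * math.pi * k / m
        z = centre + radius * cmath.exp(1j * theta)
        dz = 1j * radius * cmath.exp(1j * theta) * (2 * math.pi / m)
        total += f(z) / (z - z0) ** (n + 1) * dz
    return math.factorial(n) * total / (2j * math.pi)

z0 = 0.3 + 0.2j  # an interior point of the unit circle (assumption)
estimates = [cauchy_derivative(cmath.exp, z0, n) for n in range(5)]
```

Only values of the function on the contour enter the calculation, which is the content of the remark above.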

The observant reader will notice that (20.61) may also be obtained by the 
formal device of differentiating under the integral sign with respect to $z_0$ in 
Cauchy’s integral formula (20.59), 

\[ f^{(n)}(z_0) = \frac{1}{2\pi i} \oint_C f(z) \frac{\partial^n}{\partial z_0^n} \left[ \frac{1}{z - z_0} \right] dz = \frac{n!}{2\pi i} \oint_C \frac{f(z) \, dz}{(z - z_0)^{n+1}}. \]

►Suppose that f(z) is analytic inside and on a circle C of radius R centred on the point 
$z = z_0$. If $|f(z)| \leq M$ on the circle, where M is some constant, show that 

\[ |f^{(n)}(z_0)| \leq \frac{Mn!}{R^n}. \tag{20.62} \]

From (20.61) we have 

\[ |f^{(n)}(z_0)| = \frac{n!}{2\pi} \left| \oint_C \frac{f(z) \, dz}{(z - z_0)^{n+1}} \right|, \]

and using (20.52) this becomes 

\[ |f^{(n)}(z_0)| \leq \frac{n!}{2\pi} \frac{M}{R^{n+1}} \, 2\pi R = \frac{Mn!}{R^n}. \]

This result is known as Cauchy’s inequality. ◄ 


We may use Cauchy’s inequality to prove Liouville’s theorem, which states that 
if f(z) is analytic and bounded for all z then f is a constant. Setting n = 1 in 
(20.62) and letting $R \to \infty$ we find $|f'(z_0)| = 0$ and hence $f'(z_0) = 0$. Since f(z) is 
analytic for all z we may take $z_0$ as any point in the z-plane and thus f'(z) = 0 
for all z; this implies f(z) = constant. Liouville’s theorem may be used in turn to 
prove the fundamental theorem of algebra (see exercise 20.12). 


20.13 Taylor and Laurent series 

Following on from (20.61), we may establish Taylor’s theorem for functions of a 
complex variable. If f(z) is analytic inside and on a circle C of radius R centred 
on the point $z = z_0$, and z is a point inside C, then 

\[ f(z) = \sum_{n=0}^{\infty} a_n (z - z_0)^n, \tag{20.63} \]

where $a_n$ is given by $f^{(n)}(z_0)/n!$. The Taylor expansion is valid inside the region 
of analyticity and, for any particular $z_0$, can be shown to be unique. 

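To prove Taylor’s theorem (20.63), we note that, since f(z) is analytic inside 
and on C, we may use Cauchy’s formula to write f(z) as 

\[ f(z) = \frac{1}{2\pi i} \oint_C \frac{f(\xi)}{\xi - z} \, d\xi, \tag{20.64} \]

where $\xi$ lies on C. Now we may expand the factor $(\xi - z)^{-1}$ as a geometric series 
in $(z - z_0)/(\xi - z_0)$, 

\[ \frac{1}{\xi - z} = \frac{1}{\xi - z_0} \sum_{n=0}^{\infty} \left( \frac{z - z_0}{\xi - z_0} \right)^n, \]

so (20.64) becomes 

\[ f(z) = \frac{1}{2\pi i} \oint_C \frac{f(\xi)}{\xi - z_0} \sum_{n=0}^{\infty} \left( \frac{z - z_0}{\xi - z_0} \right)^n d\xi \]
\[ = \frac{1}{2\pi i} \sum_{n=0}^{\infty} (z - z_0)^n \oint_C \frac{f(\xi)}{(\xi - z_0)^{n+1}} \, d\xi \]
\[ = \frac{1}{2\pi i} \sum_{n=0}^{\infty} (z - z_0)^n \, \frac{2\pi i f^{(n)}(z_0)}{n!}, \tag{20.65} \]

where we have used Cauchy’s integral formula (20.61) for the derivatives of 
f(z). Cancelling the factors of $2\pi i$, we thus establish the result (20.63) with 
$a_n = f^{(n)}(z_0)/n!$. 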


►Show that if f(z) and g(z) are analytic in some region R, and f(z) = g(z) within some 
subregion S of R, then f(z) = g(z) throughout R. 


It is simpler to consider the (analytic) function h(z) = f(z) - g(z), and to show that because 
h(z) = 0 in S it follows that h(z) = 0 throughout R. 

If we choose a point $z = z_0$ in S then we can expand h(z) in a Taylor series about $z_0$, 

\[ h(z) = h(z_0) + h'(z_0)(z - z_0) + \tfrac{1}{2} h''(z_0)(z - z_0)^2 + \cdots, \]

which will converge inside some circle C that extends at least as far as the nearest part of 
the boundary of R, since h(z) is analytic in R. But since $z_0$ lies in S, we have 

\[ h(z_0) = h'(z_0) = h''(z_0) = \cdots = 0, \]

and so h(z) = 0 inside C. We may now expand about a new point, which can lie anywhere 
within C, and repeat the process. By continuing this procedure we may show that h(z) = 0 
throughout R. 

This result is called the identity theorem and, in fact, the equality of f(z) and g(z) 
throughout R follows from their equality along as little as some curve in R, or even at a 
countably infinite number of points in R. ◄ 

So far we have assumed that f(z) is analytic inside and on the (circular) 
contour C. If, however, f(z) has a singularity inside C at the point $z = z_0$, then it 
cannot be expanded in a Taylor series. Nevertheless, suppose that f(z) has a pole 
of order p at $z = z_0$ but is analytic at every other point inside and on C. Then 
the function $g(z) = (z - z_0)^p f(z)$ is analytic at $z = z_0$, and so may be expanded 
as a Taylor series about $z = z_0$, 

\[ g(z) = \sum_{n=0}^{\infty} b_n (z - z_0)^n. \tag{20.66} \]

Thus, for all z inside C, f(z) will have a power series representation of the form 

\[ f(z) = \frac{a_{-p}}{(z - z_0)^p} + \cdots + \frac{a_{-1}}{z - z_0} + a_0 + a_1 (z - z_0) + a_2 (z - z_0)^2 + \cdots, \tag{20.67} \]

with $a_{-p} \neq 0$. Such a series, which is an extension of the Taylor expansion, is 

Figure 20.16 The region of convergence R for a Laurent series of f(z) about 
a point $z = z_0$ where f(z) has a singularity. 

called a Laurent series. By comparing the coefficients in (20.66) and (20.67), we 
see that $a_n = b_{n+p}$. Now, the coefficients $b_n$ in the Taylor expansion of g(z) are 
seen from (20.65) to be given by 

\[ b_n = \frac{g^{(n)}(z_0)}{n!} = \frac{1}{2\pi i} \oint \frac{g(z)}{(z - z_0)^{n+1}} \, dz, \]

and so for the coefficients $a_n$ in (20.67) we have 

\[ a_n = \frac{1}{2\pi i} \oint \frac{g(z)}{(z - z_0)^{n+1+p}} \, dz = \frac{1}{2\pi i} \oint \frac{f(z)}{(z - z_0)^{n+1}} \, dz, \]

an expression that is valid for both positive and negative n. 

The terms in the Laurent series with $n \geq 0$ are collectively called the analytic 
part, whilst the remainder of the series, consisting of terms in inverse powers of 
$z - z_0$, is called the principal part. Depending on the nature of the point $z = z_0$, 
the principal part may contain an infinite number of terms, so that 

\[ f(z) = \sum_{n=-\infty}^{+\infty} a_n (z - z_0)^n. \tag{20.68} \]

In this case we would expect the principal part to converge only for $|(z - z_0)^{-1}|$ 
less than some constant, i.e. outside some circle centred on $z_0$. However, the 
analytic part will converge inside some (different) circle also centred on $z_0$. If the 
latter circle has the greater radius then the Laurent series will converge in the 
region R between the two circles (see figure 20.16); otherwise it does not converge 
at all. 

In fact, it may be shown that any function f(z) that is analytic in a region 
R between two such circles $C_1$ and $C_2$ centred on $z = z_0$ can be expressed as 
a Laurent series about $z_0$ that converges in R. We note that, depending on the 
nature of the point $z = z_0$, the inner circle may be a point (when the principal 
part contains only a finite number of terms) and the outer circle may have an 
infinite radius. 

We may use the Laurent series of a function f(z) about any point $z = z_0$ to 
classify the nature of that point. If f(z) is actually analytic at $z = z_0$ then in 
(20.68) all $a_n$ for n < 0 must be zero. It may happen that not only are all $a_n$ 
zero for n < 0 but $a_0, a_1, \ldots, a_{m-1}$ are all zero as well. In this case the first 
non-vanishing term in (20.68) is $a_m (z - z_0)^m$ with m > 0, and f(z) is then said to 
have a zero of order m at $z = z_0$. 

If f(z) is not analytic at $z = z_0$ then two cases arise, as discussed above (p is 
here taken as positive): 


(i) it is possible to find an integer p such that $a_{-p} \neq 0$ but $a_{-p-k} = 0$ for all 
integers k > 0; 

(ii) it is not possible to find such a lowest value of $-p$. 

In case (i), f(z) is of the form (20.67) and is described as having a pole of order 
p at $z = z_0$; the value of $a_{-1}$ (not $a_{-p}$) is called the residue of f(z) at the pole 
$z = z_0$, and will play an important part in later applications. 

For case (ii), in which the negatively decreasing powers of $z - z_0$ do not 
terminate, f(z) is said to have an essential singularity. These definitions should be 
compared with those given in section 20.6. 


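►Find the Laurent series of 

\[ f(z) = \frac{1}{z(z - 2)^3} \]

about the singularities z = 0 and z = 2 (separately). Hence verify that z = 0 is a pole of 
order 1 and z = 2 is a pole of order 3, and find the residue of f(z) at each pole. 


To obtain the Laurent series about z = 0, we simply write 

\[ f(z) = -\frac{1}{8z(1 - z/2)^3} \]
\[ = -\frac{1}{8z} \left[ 1 + (-3)\left( -\frac{z}{2} \right) + \frac{(-3)(-4)}{2!} \left( -\frac{z}{2} \right)^2 + \frac{(-3)(-4)(-5)}{3!} \left( -\frac{z}{2} \right)^3 + \cdots \right] \]
\[ = -\frac{1}{8z} - \frac{3}{16} - \frac{3z}{16} - \frac{5z^2}{32} - \cdots. \]

Since the lowest power of z is -1, the point z = 0 is a pole of order 1. The residue of f(z) 
at z = 0 is simply the coefficient of $z^{-1}$ in the Laurent expansion about that point and is 
equal to -1/8. 

The Laurent series about z = 2 is most easily found by letting $z = 2 + \xi$ (or $z - 2 = \xi$) 
and substituting into the expression for f(z) to obtain 

\[ f(z) = \frac{1}{(2 + \xi)\xi^3} = \frac{1}{2\xi^3 (1 + \xi/2)} \]
\[ = \frac{1}{2\xi^3} \left[ 1 - \frac{\xi}{2} + \left( \frac{\xi}{2} \right)^2 - \left( \frac{\xi}{2} \right)^3 + \left( \frac{\xi}{2} \right)^4 - \cdots \right] \]
\[ = \frac{1}{2\xi^3} - \frac{1}{4\xi^2} + \frac{1}{8\xi} - \frac{1}{16} + \frac{\xi}{32} - \cdots \]
\[ = \frac{1}{2(z - 2)^3} - \frac{1}{4(z - 2)^2} + \frac{1}{8(z - 2)} - \frac{1}{16} + \frac{z - 2}{32} - \cdots. \]

From this series we see that z = 2 is a pole of order 3 and that the residue of f(z) at z = 2 
is 1/8. ◄ 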

As we shall see in the next few sections, finding the residue of a function 
at a singularity is of crucial importance in the evaluation of complex integrals. 
Specifically, formulae exist for calculating the residue of a function at a particular 
(singular) point $z = z_0$ without having to expand the function explicitly as a 
Laurent series about $z_0$ and identify the coefficient of $(z - z_0)^{-1}$. The type of 
formula generally depends on the nature of the singularity at which the residue 
is required. 


►Suppose that f(z) has a pole of order m at the point $z = z_0$. By considering the Laurent 
series of f(z) about $z_0$, derive a general expression for the residue $R(z_0)$ of f(z) at $z = z_0$. 
Hence evaluate the residue of the function 

\[ f(z) = \frac{\exp iz}{(z^2 + 1)^2} \]

at the point z = i. 


If /(z) has a pole of order m at z = zg then its Laurent series about this point has the 
form 

f( z ) = 7 ~~\m "*■ + 7 + a 0 + a l( z — z o) + fll(z — Zo) 2 H , 

(Z Zg) m ( Z-Zg ) 

which, on multiplying both sides of the equation by (z — z 0 ) m , gives 

(z - zg) m f (z ) = a-m + a_ m+ i(z — z 0 ) H b O-l (z - zo)”- 1 H . 

Differentiating both sides m — 1 times, we obtain 

,71)1—1 00 

[(z - zg) m f(z)] = (m - 1) ! h n( z “ z 0 )". 

n= 1 

for some coefficients b„. In the limit z — > zg, however, the terms in the sum disappear and 
after rearranging we obtain the formula 

f 1 (7 m_1 ) 

R(z 0 ) = = lim l — r [(z - z 0 ) m /(z)] \ , (20.69) 

( (m — 1) ! dz m 1 J 

which gives the value of the residue of /(z) at the point z = z 0 . 

If we now consider the function

$$f(z) = \frac{\exp iz}{(z^2+1)^2} = \frac{\exp iz}{(z+i)^2(z-i)^2},$$

we see immediately that it has poles of order 2 (double poles) at $z = i$ and $z = -i$. To
calculate the residue at (for example) $z = i$, we may apply the formula (20.69) with $m = 2$.
Performing the required differentiation we obtain

$$\frac{d}{dz}\left[(z-i)^2 f(z)\right] = \frac{d}{dz}\left[\frac{\exp iz}{(z+i)^2}\right]
= \frac{1}{(z+i)^4}\left[(z+i)^2\, i\exp iz - 2(\exp iz)(z+i)\right].$$

Setting $z = i$ we find the residue is given by

$$R(i) = \frac{1}{1!}\,\frac{1}{16}\left(-4ie^{-1} - 4ie^{-1}\right) = -\frac{i}{2e}.$$ ◄
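As a numerical cross-check of the worked example above, the following sketch estimates the residue at $z = i$ directly from its definition as $(2\pi i)^{-1}$ times the integral of $f(z)$ round a small circle enclosing only that pole. The helper name `residue_by_contour`, the circle radius and the number of sample points are illustrative choices, not part of the text.

```python
import cmath
import math

def residue_by_contour(f, z0, radius=0.5, n=2000):
    """Estimate the residue of f at z0 as (1/(2*pi*i)) times the integral of f
    round a circle of the given radius centred on z0 (trapezoidal rule in the angle)."""
    step = 2 * math.pi / n
    total = 0j
    for k in range(n):
        offset = radius * cmath.exp(1j * k * step)   # z - z0 on the circle
        total += f(z0 + offset) * 1j * offset * step  # f(z) dz
    return total / (2j * math.pi)

def f(z):
    # The function of the example: exp(iz) / (z^2 + 1)^2
    return cmath.exp(1j * z) / (z * z + 1) ** 2

numeric = residue_by_contour(f, 1j)   # the circle excludes the other pole at z = -i
exact = -1j / (2 * math.e)            # value given by formula (20.69) with m = 2
print(numeric, exact)
```

The two printed values should agree to many decimal places, since the trapezoidal rule converges very rapidly for a smooth periodic integrand.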


An important special case of (20.69) occurs when $f(z)$ has a simple pole (a pole
of order 1) at $z = z_0$. Then the residue at $z_0$ is given by

$$R(z_0) = \lim_{z\to z_0}\left[(z-z_0) f(z)\right]. \qquad (20.70)$$

If $f(z)$ has a simple pole at $z = z_0$ and, as is often the case, has the form
$g(z)/h(z)$, where $g(z)$ is analytic and non-zero at $z_0$ and $h(z_0) = 0$, then (20.70)
becomes

$$R(z_0) = \lim_{z\to z_0}\frac{(z-z_0)g(z)}{h(z)} = g(z_0)\lim_{z\to z_0}\frac{(z-z_0)}{h(z)}
= g(z_0)\lim_{z\to z_0}\frac{1}{h'(z)} = \frac{g(z_0)}{h'(z_0)}, \qquad (20.71)$$

where we have used l'Hôpital's rule. This result often provides the simplest way
of determining the residue at a simple pole.
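A small sketch comparing the result $g(z_0)/h'(z_0)$ with a crude evaluation of the limit $(z - z_0)f(z)$ is given below. The test function $e^z/\sin z$, its pole at $z_0 = \pi$ and the step size are assumptions chosen purely for illustration.

```python
import math

def residue_simple_pole(g, h_prime, z0):
    """Residue of g(z)/h(z) at a simple zero z0 of h, using g(z0)/h'(z0)."""
    return g(z0) / h_prime(z0)

def residue_by_limit(f, z0, eps=1e-6):
    """Crude estimate of lim (z - z0) f(z), evaluated a small step eps away from z0."""
    return eps * f(z0 + eps)

z0 = math.pi
# g(z) = exp(z), h(z) = sin(z), so h'(z) = cos(z)
from_formula = residue_simple_pole(math.exp, math.cos, z0)
from_limit = residue_by_limit(lambda z: math.exp(z) / math.sin(z), z0)
print(from_formula, from_limit)
```

The limit estimate differs from the formula only at the level set by the finite step `eps`.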


20.14 Residue theorem 

Having seen from Cauchy's theorem that the value of an integral round a closed 
contour C is zero if the integrand is analytic inside the contour, it is natural to 
ask what value it takes when the integrand is not analytic inside C. The answer 
to this is contained in the residue theorem, which we now discuss. 

Suppose the function $f(z)$ has a pole of order $m$ at the point $z = z_0$, and so
can be written as a Laurent series about $z_0$ of the form

$$f(z) = \sum_{n=-m}^{\infty} a_n (z-z_0)^n. \qquad (20.72)$$

Now consider the integral $I$ of $f(z)$ around a closed contour $C$ that encloses
$z = z_0$, but no other singular points. Using Cauchy's theorem this integral has
the same value as the integral around a circle $\gamma$ of radius $\rho$ centred on $z = z_0$,
since $f(z)$ is analytic in the region between $C$ and $\gamma$. On the circle we have




$z = z_0 + \rho \exp i\theta$ (and $dz = i\rho \exp i\theta\, d\theta$), and so

$$I = \oint_\gamma f(z)\,dz = \sum_{n=-m}^{\infty} a_n \oint_\gamma (z-z_0)^n\,dz
= \sum_{n=-m}^{\infty} a_n \int_0^{2\pi} i\rho^{n+1} \exp[i(n+1)\theta]\,d\theta.$$

For every term in the series with $n \neq -1$, we have

$$\int_0^{2\pi} i\rho^{n+1} \exp[i(n+1)\theta]\,d\theta
= \left[\frac{i\rho^{n+1}\exp[i(n+1)\theta]}{i(n+1)}\right]_0^{2\pi} = 0,$$

but for the $n = -1$ term we obtain

$$\int_0^{2\pi} i\,d\theta = 2\pi i.$$

Therefore only the term in $(z-z_0)^{-1}$ contributes to the value of the integral
around $\gamma$ (and therefore $C$), and $I$ takes the value

$$I = \oint_C f(z)\,dz = 2\pi i\,a_{-1}. \qquad (20.73)$$

Thus the integral around any closed contour containing a single pole of general
order $m$ (or, by extension, an essential singularity) is equal to $2\pi i$ times the residue
of $f(z)$ at $z = z_0$.

If we extend the above argument to the case where $f(z)$ is continuous within
and on a closed contour $C$ and analytic, except for a finite number of poles,
within $C$, then we arrive at the residue theorem

$$\oint_C f(z)\,dz = 2\pi i \sum_j R_j, \qquad (20.74)$$

where $\sum_j R_j$ is the sum of the residues of $f(z)$ at its poles within $C$.

The method of proof is indicated by figure 20.17, in which (a) shows the original
contour $C$ referred to in (20.74) and (b) shows a contour $C'$ giving the same value
to the integral, because $f$ is analytic between $C$ and $C'$. Now the contribution to
the $C'$ integral from the polygon (a triangle for the case illustrated) joining the
small circles is zero, since $f$ is also analytic inside $C'$. Hence the whole value of
the integral comes from the circles and, by result (20.73), each of these contributes
$2\pi i$ times the residue at the pole it encloses. All the circles are traversed in their
positive sense if $C$ is thus traversed and so the residue theorem follows. Formally,
Cauchy's theorem (20.53) is a particular case of (20.74) in which $C$ encloses no
poles.
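The residue theorem can be illustrated numerically. The sketch below integrates a sample function round the circle $|z| = 2$ and compares the result with $2\pi i$ times the sum of the enclosed residues. The function $e^z/[z(z-1)]$, with residues $-1$ at $z = 0$ and $e$ at $z = 1$ (each from the simple-pole formula), and the helper `circle_integral` are assumptions made for the purpose of the example.

```python
import cmath
import math

def circle_integral(f, centre, radius, n=4000):
    """Integral of f round a circle, traversed anticlockwise (trapezoidal rule)."""
    step = 2 * math.pi / n
    total = 0j
    for k in range(n):
        offset = radius * cmath.exp(1j * k * step)
        total += f(centre + offset) * 1j * offset * step
    return total

def f(z):
    # Simple poles at z = 0 and z = 1, both inside |z| = 2
    return cmath.exp(z) / (z * (z - 1))

residues = [-1.0, math.e]               # residues at z = 0 and z = 1
lhs = circle_integral(f, 0j, 2.0)       # contour integral
rhs = 2j * math.pi * sum(residues)      # right-hand side of the residue theorem
print(lhs, rhs)
```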

Figure 20.17 The contours used to prove the residue theorem: (a) the original
contour; (b) the contracted contour encircling each of the poles.

Finally we mention another important result, which we will use later. Suppose
that $f(z)$ has a simple pole at $z = z_0$ and so may be expanded as the Laurent
series

$$f(z) = \phi(z) + a_{-1}(z-z_0)^{-1},$$

where $\phi(z)$ is analytic within some neighbourhood surrounding $z_0$. We wish to
find an expression for the integral $I$ of $f(z)$ along an open contour $C$, which is
the arc of a circle of radius $\rho$ centred on $z = z_0$ given by

$$|z - z_0| = \rho, \qquad \theta_1 \le \arg(z - z_0) \le \theta_2, \qquad (20.75)$$

where $\rho$ is chosen small enough that no singularity of $f$, other than $z = z_0$, lies
within the circle. Then $I$ is given by

$$I = \int_C f(z)\,dz = \int_C \phi(z)\,dz + a_{-1}\int_C (z-z_0)^{-1}\,dz.$$

If the radius of the arc $C$ is now allowed to tend to zero then the first integral
tends to zero, since the path becomes of zero length and $\phi$ is analytic and
therefore continuous along it. On $C$, $z - z_0 = \rho e^{i\theta}$ and hence the required expression
for $I$ is

$$I = \lim_{\rho\to 0}\int_C f(z)\,dz
= \lim_{\rho\to 0}\left(a_{-1}\int_{\theta_1}^{\theta_2}\frac{1}{\rho e^{i\theta}}\,i\rho e^{i\theta}\,d\theta\right)
= i a_{-1}(\theta_2 - \theta_1). \qquad (20.76)$$

We note that result (20.73) is a special case of (20.76) in which $\theta_2$ is equal to
$\theta_1 + 2\pi$.
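The limiting behaviour described by (20.76) can be seen numerically. In the sketch below the function $e^z/z$ (a simple pole at the origin with $a_{-1} = 1$) is integrated along a quarter-circle arc of shrinking radius; the function, the arc and the helper `arc_integral` are illustrative assumptions.

```python
import cmath
import math

def arc_integral(f, z0, rho, theta1, theta2, n=2000):
    """Integral of f along the arc z = z0 + rho*exp(i*theta), theta1 <= theta <= theta2
    (midpoint rule in theta)."""
    h = (theta2 - theta1) / n
    total = 0j
    for k in range(n):
        theta = theta1 + (k + 0.5) * h
        offset = rho * cmath.exp(1j * theta)
        total += f(z0 + offset) * 1j * offset * h   # f(z) dz with dz = i*offset*dtheta
    return total

def f(z):
    return cmath.exp(z) / z   # simple pole at z = 0 with residue a_{-1} = 1

limit_value = 1j * 1.0 * (math.pi / 2 - 0.0)   # i * a_{-1} * (theta2 - theta1)
errors = {}
for rho in (1e-1, 1e-2, 1e-3):
    errors[rho] = abs(arc_integral(f, 0j, rho, 0.0, math.pi / 2) - limit_value)
    print(rho, errors[rho])
```

The discrepancy shrinks roughly in proportion to the radius, reflecting the vanishing contribution of the analytic part $\phi(z)$.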


20.15 Location of zeroes 

An important use of the residue theorem is to locate the zeroes of functions
of a complex variable. The location of such zeroes has a particular application
in electrical network and general oscillation theory, since the complex zeroes of
certain functions give the system parameters (usually frequencies) at which system
instabilities occur. As the basis of a method for locating these zeroes we next
prove three important theorems.

(i) If $f(z)$ has poles as its only singularities inside a closed contour $C$ and is
not zero at any point on $C$ then

$$\oint_C \frac{f'(z)}{f(z)}\,dz = 2\pi i \sum_j (N_j - P_j). \qquad (20.77)$$

Here $N_j$ is the order of the $j$th zero of $f(z)$ enclosed by $C$. Similarly $P_j$ is the
order of the $j$th pole of $f(z)$ inside $C$.

To prove this we note that, at each position $z_j$, $f(z)$ can be written as

$$f(z) = (z - z_j)^{m_j}\phi(z), \qquad (20.78)$$

where $\phi(z)$ is analytic and non-zero at $z = z_j$ and $m_j$ is positive for a zero and
negative for a pole. Then the integrand $f'(z)/f(z)$ takes the form

$$\frac{f'(z)}{f(z)} = \frac{m_j}{z - z_j} + \frac{\phi'(z)}{\phi(z)}. \qquad (20.79)$$

Since $\phi(z_j) \neq 0$, the second term on the right is analytic; thus the integrand
has a simple pole at $z = z_j$, with residue $m_j$. For zeroes $m_j = N_j$ and for poles
$m_j = -P_j$, and thus by the residue theorem (20.77) follows.

(ii) If $f(z)$ is analytic inside $C$ and not zero at any point on it then

$$2\pi \sum_j N_j = \Delta_C[\arg f(z)], \qquad (20.80)$$

where $\Delta_C[x]$ denotes the variation in $x$ around the contour $C$.

Since $f$ is analytic there are no $P_j$; further, since

$$\frac{f'(z)}{f(z)} = \frac{d}{dz}[\operatorname{Ln} f(z)], \qquad (20.81)$$

equation (20.77) can be written

$$2\pi i \sum_j N_j = \oint_C \frac{f'(z)}{f(z)}\,dz = \Delta_C[\operatorname{Ln} f(z)]. \qquad (20.82)$$

However,

$$\Delta_C[\operatorname{Ln} f(z)] = \Delta_C[\ln|f(z)|] + i\Delta_C[\arg f(z)], \qquad (20.83)$$

and, since $C$ is a closed contour, $\ln|f(z)|$ must return to its original value and
so the real term on the RHS is zero. Comparison of (20.82) and (20.83) then
establishes (20.80), which is known as the principle of the argument.

(iii) If $f(z)$ and $g(z)$ are analytic within and on a closed contour $C$ and
$|g(z)| < |f(z)|$ on $C$ then $f(z)$ and $f(z) + g(z)$ have the same number of zeroes
inside $C$; this is Rouché's theorem.

With the conditions given, neither $f(z)$ nor $f(z) + g(z)$ can have a zero on $C$.
So, applying theorem (ii) with an obvious notation,

$$2\pi \sum_j N_j(f+g) = \Delta_C[\arg(f+g)]
= \Delta_C[\arg f] + \Delta_C[\arg(1 + g/f)]
= 2\pi \sum_j N_j(f) + \Delta_C[\arg(1 + g/f)]. \qquad (20.84)$$

Further, since $|g| < |f|$ on $C$, $1 + g/f$ always lies within a unit circle centred
on $z = 1$; thus its argument always lies in the range $-\pi/2 < \arg(1 + g/f) < \pi/2$
and cannot change by any multiple of $2\pi$. It must therefore return to its
original value when $z$ returns to its starting point having traversed $C$. Hence
the second term on the right of (20.84) is zero and the theorem is established.

The importance of Rouché's theorem is that for some functions, in particular
polynomials, only the behaviour of a single term in the function need be considered
if the contour is chosen appropriately. For example, for a polynomial,
treated as $f(z) + g(z)$, only the properties of its largest- (smallest-) power, taken as
$f(z)$, need be investigated, if a circular contour is chosen with radius $R$ sufficiently
large (small) that, on the contour, the magnitude of the largest (smallest) power
term is greater than the sum of the magnitudes of all other terms. Further, if the
zeroes of $f(z) + g(z) = \sum_{i=0}^{N} b_i z^i$ are considered as the roots of $f(z) + g(z) = 0$,
written in the form

$$1 + \frac{g(z)}{f(z)} = 0, \qquad (20.85)$$

then it is apparent that no roots can lie outside (inside) $|z| = R$ and also that
$f(z) = b_N z^N$ (or $b_0$) has $N$ (or 0) zeroes inside $|z| = R$; $f + g$ consequently has
the same number of zeroes inside the same circle.

A weak form of the maximum-modulus theorem may also be deduced. This
states that if $f(z)$ is analytic within and on a simple closed contour $C$ then $|f(z)|$
attains its maximum value on the boundary of $C$.

Let $|f(z)| \le M$ on $C$ with equality at at least one point of $C$. Now suppose
that there is a point $z = a$ inside $C$ such that $|f(a)| > M$. Then the function
$h(z) \equiv f(a)$ is such that $|h(z)| > |-f(z)|$ on $C$, and thus $h(z)$ and $h(z) - f(z)$ have
the same number of zeroes inside $C$. But $h(z)$ $(\equiv f(a))$ has no zeroes inside $C$
and by Rouché's theorem this would imply that $f(a) - f(z)$ has no zeroes in $C$.
However, $f(a) - f(z)$ clearly has a zero at $z = a$, and so we have a contradiction;
the assumption of a point $z = a$ inside $C$ such that $|f(a)| > M$ must be invalid.
This establishes the theorem.

The stronger form of the maximum-modulus theorem, which we do not prove,
states in addition that the maximum value of $|f(z)|$ is not attained at any interior
point except for the case where $f(z)$ is a constant.

Figure 20.18 A contour for locating the zeroes of a polynomial that occur
in the first quadrant of the Argand diagram.


► Show that the four zeroes of $h(z) = z^4 + z + 1$ occur one in each quadrant of the Argand
diagram and that all four lie between the circles $|z| = 2/3$ and $|z| = 3/2$.

Putting $z = x$ and $z = iy$ shows that no zeroes occur on the real or imaginary axes. They
must therefore occur in conjugate pairs (as is shown by taking the complex conjugate of
$h(z) = 0$).

Now take $C$ as the contour $OXYO$ shown in figure 20.18 and consider the changes
$\Delta[\arg h]$ in the argument of $h(z)$ as $z$ traverses $C$.

(i) $OX$: $\arg h$ is everywhere zero, since $h$ is real, and thus $\Delta_{OX}[\arg h] = 0$.

(ii) $XY$: $z = R\exp i\theta$ and so $\arg h$ changes by an amount

$$\Delta_{XY}[\arg h] = \Delta_{XY}[\arg z^4] + \Delta_{XY}[\arg(1 + z^{-3} + z^{-4})]
= \Delta_{XY}[\arg R^4 e^{4i\theta}] + \Delta_{XY}\{\arg[1 + O(R^{-3})]\}
= 2\pi + O(R^{-3}). \qquad (20.86)$$

(iii) $YO$: $z = iy$ and so $\arg h = \tan^{-1}[y/(y^4 + 1)]$, which starts at $O(R^{-3})$ and finishes at 0 as $y$
goes from large $R$ to 0. It never reaches $\pi/2$ because $y^4 + 1 = 0$ has no real positive
root. Thus $\Delta_{YO}[\arg h] = 0$.

Hence for the complete contour $\Delta_C[\arg h] = 0 + 2\pi + 0 + O(R^{-3})$ and, if $R$ is allowed
to tend to infinity, we deduce from (20.80) that $h(z)$ has one zero in the first quadrant.
Furthermore, since the roots occur in conjugate pairs, a second root must lie in the fourth
quadrant and the other pair in the second and third quadrants.

To show that the zeroes lie within a given annulus in the $z$-plane we must apply Rouché's
theorem, as follows.

(i) With $C$ as $|z| = 3/2$, $f = z^4$, $g = z + 1$. Now $|f| = 81/16$ on $C$ and
$|g| \le 1 + |z| = 5/2 < 81/16$. Thus since $z^4 = 0$ has four roots inside $|z| = 3/2$, so also does
$z^4 + z + 1 = 0$.

(ii) With $C$ as $|z| = 2/3$, $f = 1$, $g = z^4 + z$. Now $f = 1$ on $C$ and
$|g| \le |z^4| + |z| = 16/81 + 2/3 = 70/81 < 1$. Thus since $f = 0$ has no roots inside $|z| = 2/3$, neither
does $1 + z + z^4 = 0$.

Hence the four zeroes of $h(z) = z^4 + z + 1$ occur one in each quadrant and all lie between
the circles $|z| = 2/3$ and $|z| = 3/2$. ◄
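The counting arguments of this example can be mimicked numerically with the principle of the argument (20.80): the net change in $\arg h$ round a closed path, divided by $2\pi$, is the number of enclosed zeroes. The sketch below is one possible implementation; the helper names, the discretisation, and the use of a finite quarter disc of radius $3/2$ (rather than letting $R \to \infty$) are assumptions made for illustration.

```python
import cmath
import math

def h(z):
    return z ** 4 + z + 1

def winding_count(func, path_points):
    """Net change of arg(func) round a closed polyline, in units of 2*pi.
    Assumes the points are close enough that arg changes by less than pi per step."""
    values = [func(z) for z in path_points]
    total = 0.0
    for current, nxt in zip(values, values[1:] + values[:1]):
        total += cmath.phase(nxt / current)
    return round(total / (2 * math.pi))

def circle(radius, n=4000):
    return [radius * cmath.exp(2j * math.pi * k / n) for k in range(n)]

def quarter_disc(radius, n=4000):
    """Boundary O -> X -> Y -> O of the first-quadrant quarter disc."""
    along_x = [radius * k / n for k in range(n)]
    arc = [radius * cmath.exp(0.5j * math.pi * k / n) for k in range(n)]
    down_y = [1j * radius * (1 - k / n) for k in range(n)]
    return along_x + arc + down_y

print(winding_count(h, circle(1.5)))        # zeroes inside |z| = 3/2
print(winding_count(h, circle(2.0 / 3.0)))  # zeroes inside |z| = 2/3
print(winding_count(h, quarter_disc(1.5)))  # zeroes in the first quadrant with |z| < 3/2
```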




A further technique useful in locating function zeroes is explained in exercise 20.16.


20.16 Integrals of sinusoidal functions 

The remainder of this chapter is devoted to methods of applying contour integration
and the residue theorem to various types of definite integral. In each case
not much preamble is given since, for this material, the simplest explanation is
felt to be via a series of worked examples that can be used as models.

Suppose that an integral of the form

$$\int_0^{2\pi} F(\cos\theta, \sin\theta)\,d\theta \qquad (20.87)$$

is to be evaluated. It can be made into a contour integral around the unit circle
$C$ by writing $z = \exp i\theta$ and hence

$$\cos\theta = \tfrac{1}{2}(z + z^{-1}), \qquad \sin\theta = -\tfrac{1}{2}i(z - z^{-1}), \qquad d\theta = -iz^{-1}\,dz. \qquad (20.88)$$

This contour integral can then be evaluated using the residue theorem, provided
the transformed integrand has only a finite number of poles inside the unit circle
and none on it.



► Evaluate

$$I = \int_0^{2\pi} \frac{\cos 2\theta}{a^2 + b^2 - 2ab\cos\theta}\,d\theta, \qquad b > a > 0. \qquad (20.89)$$

By de Moivre's theorem (section 3.4),

$$\cos n\theta = \tfrac{1}{2}(z^n + z^{-n}). \qquad (20.90)$$

Using $n = 2$ in (20.90) and straightforward substitution for the other functions of $\theta$ in
(20.89) gives

$$I = \frac{i}{2ab}\oint_C \frac{z^4 + 1}{z^2(z - a/b)(z - b/a)}\,dz.$$

Thus there are two poles inside $C$, a double pole at $z = 0$ and a simple pole at $z = a/b$
(recall that $b > a$).

We could find the residue of the integrand at $z = 0$ by expanding the integrand as a
Laurent series in $z$ and identifying the coefficient of $z^{-1}$. Alternatively, we may use the
formula (20.69) with $m = 2$. Denoting the integrand by $f(z)$ we have

$$\frac{d}{dz}\left[z^2 f(z)\right] = \frac{d}{dz}\left[\frac{z^4 + 1}{(z - a/b)(z - b/a)}\right]
= \frac{(z - a/b)(z - b/a)4z^3 - (z^4 + 1)[(z - a/b) + (z - b/a)]}{(z - a/b)^2(z - b/a)^2}.$$

Setting $z = 0$ and applying (20.69), we find

$$R(0) = \frac{a}{b} + \frac{b}{a}.$$

For the simple pole at $z = a/b$, equation (20.70) gives the residue as

$$R(a/b) = \lim_{z\to a/b}\left[(z - a/b) f(z)\right] = \frac{(a/b)^4 + 1}{(a/b)^2(a/b - b/a)}
= -\frac{a^4 + b^4}{ab(b^2 - a^2)}.$$

Therefore by the residue theorem

$$I = 2\pi i \times \frac{i}{2ab}\left[\frac{a^2 + b^2}{ab} - \frac{a^4 + b^4}{ab(b^2 - a^2)}\right]
= \frac{2\pi a^2}{b^2(b^2 - a^2)}.$$ ◄

Figure 20.19 A semicircular contour in the upper half-plane.
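The closed form obtained in the example above is easy to check by direct quadrature of the original real integral (20.89). The following sketch uses the trapezoidal rule over one period; the function names and the sample values of $a$ and $b$ are illustrative assumptions.

```python
import math

def integral_numeric(a, b, n=2000):
    """Trapezoidal-rule estimate of the integral of cos(2t)/(a^2 + b^2 - 2ab cos t)
    over 0 <= t <= 2*pi (the integrand is periodic, so end points coincide)."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = k * h
        total += math.cos(2 * t) / (a * a + b * b - 2 * a * b * math.cos(t))
    return total * h

def integral_by_residues(a, b):
    """Result of the contour-integration example, valid for b > a > 0."""
    return 2 * math.pi * a * a / (b * b * (b * b - a * a))

print(integral_numeric(1.0, 2.0), integral_by_residues(1.0, 2.0))
```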


20.17 Some infinite integrals 


Suppose we wish to evaluate an integral of the form

$$\int_{-\infty}^{\infty} f(x)\,dx,$$

where $f(z)$ has the following properties.

(i) $f(z)$ is analytic in the upper half-plane, $\operatorname{Im} z \ge 0$, except for a finite number
of poles, none of which are on the real axis.

(ii) on a semicircle $\Gamma$ of radius $R$ (figure 20.19), $R$ times the maximum of $|f|$
on $\Gamma$ tends to zero as $R \to \infty$ (a sufficient condition is that $zf(z) \to 0$ as
$|z| \to \infty$).

(iii) $\int_{-\infty}^{0} f(x)\,dx$ and $\int_0^{\infty} f(x)\,dx$ both exist.

The required integral is then given by

$$\int_{-\infty}^{\infty} f(x)\,dx = 2\pi i \times (\text{sum of the residues at poles with } \operatorname{Im} z > 0). \qquad (20.91)$$

Since

$$\left|\int_\Gamma f(z)\,dz\right| \le 2\pi R \times (\text{maximum of } |f| \text{ on } \Gamma),$$

condition (ii) ensures that the integral along $\Gamma$ tends to zero as $R \to \infty$, after
which (20.91) is obvious from the residue theorem.

► Evaluate

$$I = \int_0^{\infty} \frac{dx}{(x^2 + a^2)^4}, \qquad \text{where } a \text{ is real.}$$

The complex function $(z^2 + a^2)^{-4}$ has poles of order 4 at $z = \pm ai$, of which only $z = ai$
is in the upper half-plane. Conditions (ii) and (iii) are clearly satisfied. For higher-order
poles, formula (20.69) for evaluating residues can be tiresome to apply. So, instead, we put
$z = ai + \xi$ and expand for small $\xi$ to obtain†

$$\frac{1}{(z^2 + a^2)^4} = \frac{1}{(2ai\xi + \xi^2)^4} = \frac{1}{(2ai\xi)^4}\left(1 - \frac{i\xi}{2a}\right)^{-4}.$$

The coefficient of $\xi^{-1}$ is

$$\frac{1}{(2a)^4}\,\frac{(-4)(-5)(-6)}{3!}\left(\frac{-i}{2a}\right)^3 = \frac{-5i}{32a^7},$$

and hence by the residue theorem

$$\int_{-\infty}^{\infty} \frac{dx}{(x^2 + a^2)^4} = \frac{10\pi}{32a^7},$$

and so $I = 5\pi/(32a^7)$. ◄
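A direct numerical check of $I = 5\pi/(32a^7)$ is sketched below. The substitution $x = a\tan t$, used here only to map the infinite range onto $0 \le t \le \pi/2$, and the midpoint rule are choices made for this illustration and are not part of the contour method.

```python
import math

def integral_0_to_inf(a, n=20000):
    """Midpoint-rule estimate of the integral of (x^2 + a^2)^(-4) over [0, inf),
    after the substitution x = a*tan(t), 0 <= t <= pi/2."""
    h = (math.pi / 2) / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        x = a * math.tan(t)
        dx_dt = a / math.cos(t) ** 2
        total += dx_dt / (x * x + a * a) ** 4
    return total * h

a = 1.5
print(integral_0_to_inf(a), 5 * math.pi / (32 * a ** 7))
```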


Condition (i) of the previous method required there to be no poles of the
integrand on the real axis, but in fact simple poles on the real axis can be
accommodated by indenting the contour as shown in figure 20.20. The indentation
at the pole $z = z_0$ is in the form of a semicircle $\gamma$ of radius $\rho$ in the upper
half-plane, thus excluding the pole from the interior of the contour.

What is then obtained from a contour integration, apart from the contributions
from $\Gamma$ and $\gamma$, is called the principal value of the integral, defined as $\rho \to 0$ by:

$$P\int_{-R}^{R} f(x)\,dx = \int_{-R}^{z_0 - \rho} f(x)\,dx + \int_{z_0 + \rho}^{R} f(x)\,dx.$$

The remainder of the calculation goes through as before, but the contribution
from the semicircle $\gamma$ must be included. Result (20.76) of section 20.14 shows that
since only a simple pole is involved its contribution is

$$-i\pi a_{-1}, \qquad (20.92)$$

where $a_{-1}$ is the residue at the pole and the minus sign arises because $\gamma$ is
traversed in the clockwise (negative) sense.


† This illustrates another useful technique for determining residues.

Figure 20.20 An indented contour used when the integrand has a simple
pole on the real axis.


We defer giving an example of an indented contour until we have established
Jordan's lemma; we will then work through an example illustrating both. Jordan's
lemma enables infinite integrals involving sinusoidal functions to be evaluated.

Jordan's lemma. For a function $f(z)$ of a complex variable $z$, if

(i) $f(z)$ is analytic in the upper half-plane except for a finite number of poles
in $\operatorname{Im} z > 0$,

(ii) the maximum of $|f(z)| \to 0$ as $|z| \to \infty$ in the upper half-plane,

(iii) $m > 0$,

then

$$I_\Gamma = \int_\Gamma e^{imz} f(z)\,dz \to 0 \quad \text{as } R \to \infty, \qquad (20.93)$$

where $\Gamma$ is the same semicircular contour as in figure 20.19.

Notice that this condition (ii) is less stringent than the earlier condition (ii)
(see the start of this section), since we now only require $M(R) \to 0$ and not
$RM(R) \to 0$, where $M$ is the maximum† of $|f(z)|$ on $|z| = R$.

The proof of the lemma is straightforward, once it has been observed that, for
$0 \le \theta \le \pi/2$,

$$1 \ge \frac{\sin\theta}{\theta} \ge \frac{2}{\pi}. \qquad (20.94)$$

Then, since on $\Gamma$ we have $|\exp(imz)| = |\exp(-mR\sin\theta)|$,

$$|I_\Gamma| \le \int_\Gamma |e^{imz} f(z)|\,|dz| \le MR\int_0^{\pi} e^{-mR\sin\theta}\,d\theta
= 2MR\int_0^{\pi/2} e^{-mR\sin\theta}\,d\theta.$$

Thus, using (20.94),

$$|I_\Gamma| \le 2MR\int_0^{\pi/2} e^{-mR(2\theta/\pi)}\,d\theta = \frac{\pi M}{m}\left(1 - e^{-mR}\right) < \frac{\pi M}{m};$$

hence, as $R \to \infty$, $I_\Gamma$ tends to zero since $M$ does.

† More strictly the least upper bound.


► Find the principal value of

$$\int_{-\infty}^{\infty} \frac{\cos mx}{x - a}\,dx, \qquad \text{for } a \text{ real}, \; m > 0.$$

Consider the function $(z - a)^{-1}\exp(imz)$; although it has no poles in the upper half-plane
it does have a simple pole at $z = a$, and further $|(z - a)^{-1}| \to 0$ as $|z| \to \infty$. We will use a
contour like that shown in figure 20.20 and apply the residue theorem. Symbolically,

$$\int_{-R}^{a - \rho} + \int_\gamma + \int_{a + \rho}^{R} + \int_\Gamma = 0. \qquad (20.95)$$

Now as $R \to \infty$ and $\rho \to 0$ we have $\int_\Gamma \to 0$, by Jordan's lemma, and from (20.91) and
(20.92) we obtain

$$P\int_{-\infty}^{\infty} \frac{e^{imx}}{x - a}\,dx - i\pi a_{-1} = 0, \qquad (20.96)$$

where $a_{-1}$ is the residue of $(z - a)^{-1}\exp(imz)$ at $z = a$, which is $\exp(ima)$. Then taking the
real and imaginary parts of (20.96) gives

$$P\int_{-\infty}^{\infty} \frac{\cos mx}{x - a}\,dx = -\pi\sin ma, \qquad \text{as required,}$$

$$P\int_{-\infty}^{\infty} \frac{\sin mx}{x - a}\,dx = \pi\cos ma, \qquad \text{as a bonus.}$$ ◄


20.18 Integrals of multivalued functions 

We have discussed briefly some of the properties and difficulties associated with
certain multivalued functions such as $z^{1/2}$ or $\operatorname{Ln} z$. It was mentioned that one
method of managing such functions is by means of a 'cut plane'. A similar
technique can be used with advantage to evaluate some kinds of infinite integral
involving real functions for which the corresponding complex functions are multivalued.
A typical contour employed for functions with a single branch point
located at the origin is shown in figure 20.21. Here $\Gamma$ is a large circle of radius $R$
and $\gamma$ a small one of radius $\rho$, both centred on the origin. Eventually we will let
$R \to \infty$ and $\rho \to 0$.

The success of the method is due to the fact that because the integrand is
multivalued, its values along the two lines $AB$ and $CD$ joining $z = \rho$ to $z = R$
are not equal and opposite although both are related to the corresponding real
integral. Again an example gives the best explanation.

Figure 20.21 A typical cut-plane contour for use with multivalued functions
that have a single branch point located at the origin.


► Evaluate

$$I = \int_0^{\infty} \frac{dx}{(x + a)^3 x^{1/2}}, \qquad a > 0.$$

We consider the integrand $f(z) = (z + a)^{-3} z^{-1/2}$ and note that $|zf(z)| \to 0$ on the two
circles as $\rho \to 0$ and $R \to \infty$. Thus the two circles make no contribution to the contour
integral.

The only pole of the integrand inside the contour is at $z = -a$ (and is of order 3).
To determine its residue we put $z = -a + \xi$ and expand (noting that $(-a)^{1/2}$ equals
$a^{1/2}\exp(i\pi/2) = ia^{1/2}$):

$$\frac{1}{(z + a)^3 z^{1/2}} = \frac{1}{\xi^3\, i a^{1/2}(1 - \xi/a)^{1/2}}
= \frac{1}{i\xi^3 a^{1/2}}\left(1 + \frac{1}{2}\frac{\xi}{a} + \frac{3}{8}\frac{\xi^2}{a^2} + \cdots\right).$$

The residue is thus $-3i/(8a^{5/2})$.

The residue theorem (20.74) now gives

$$\int_{AB} + \int_\Gamma + \int_{DC} + \int_\gamma = 2\pi i\left(\frac{-3i}{8a^{5/2}}\right).$$

We have seen that $\int_\Gamma$ and $\int_\gamma$ vanish and if we denote $z$ by $x$ along the line $AB$ then it has
the value $z = x\exp 2\pi i$ along the line $DC$ (note that $\exp 2\pi i$ must not be set equal to 1
until after the substitution for $z$ has been made in $\int_{DC}$). Substituting these expressions,

$$\int_0^{\infty} \frac{dx}{(x + a)^3 x^{1/2}} + \int_{\infty}^{0} \frac{dx}{[x\exp 2\pi i + a]^3 x^{1/2}\exp(\tfrac{1}{2}2\pi i)} = \frac{3\pi}{4a^{5/2}}.$$

Thus

$$\left(1 - \frac{1}{\exp \pi i}\right)\int_0^{\infty} \frac{dx}{(x + a)^3 x^{1/2}} = \frac{3\pi}{4a^{5/2}},$$

and

$$I = \frac{1}{2} \times \frac{3\pi}{4a^{5/2}}.$$ ◄
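The value $I = 3\pi/(8a^{5/2})$ found in the example above can be checked by ordinary quadrature. In the sketch below the substitution $x = a\tan^2 t$ is an assumption introduced only to remove the integrable singularity at $x = 0$ and to map the infinite range onto $0 < t < \pi/2$; the midpoint rule avoids evaluating the integrand at either end point.

```python
import math

def cut_plane_integral(a, n=20000):
    """Midpoint-rule estimate of the integral of 1/((x + a)^3 * sqrt(x)) over (0, inf),
    using the substitution x = a*tan(t)^2 with 0 < t < pi/2."""
    h = (math.pi / 2) / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        tan_t = math.tan(t)
        x = a * tan_t ** 2
        dx_dt = 2 * a * tan_t / math.cos(t) ** 2
        total += dx_dt / ((x + a) ** 3 * math.sqrt(x))
    return total * h

a = 2.0
print(cut_plane_integral(a), 3 * math.pi / (8 * a ** 2.5))
```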


20.19 Summation of series 

Sometimes a real infinite series may be summed if a suitable complex function 
can be found that has poles on the real axis at positions corresponding to the 
values of the dummy variable in the summation, and whose residues at these 
poles are equal to the values of the terms of the series there. 


► By considering

$$\oint_C \frac{\pi\cot\pi z}{(a + z)^2}\,dz,$$

where $a$ is not an integer and $C$ is a circle of large radius, evaluate

$$\sum_{n=-\infty}^{\infty} \frac{1}{(a + n)^2}.$$

The integrand has (i) simple poles at $z = $ integer $n$, for $-\infty < n < \infty$, (ii) a double pole at
$z = -a$.

(i) To find the residue of $\cot\pi z$, put $z = n + \xi$ for small $\xi$:

$$\cot\pi z = \frac{\cos(n\pi + \xi\pi)}{\sin(n\pi + \xi\pi)} \approx \frac{\cos n\pi}{(\cos n\pi)\xi\pi} = \frac{1}{\xi\pi}.$$

The residue of the integrand at $z = n$ is thus $\pi(a + n)^{-2}\pi^{-1}$.

(ii) Putting $z = -a + \xi$ for small $\xi$ and determining the coefficient of $\xi^{-1}$,†

$$\frac{\pi\cot\pi z}{(a + z)^2} = \frac{\pi}{\xi^2}\cot(-a\pi + \xi\pi)
= \frac{\pi}{\xi^2}\left\{\cot(-a\pi) + \xi\left[\frac{d}{dz}(\cot\pi z)\right]_{z=-a} + \cdots\right\},$$

so that the residue at the double pole $z = -a$ is

$$\pi\left[-\pi\,\mathrm{cosec}^2\,\pi z\right]_{z=-a} = -\pi^2\,\mathrm{cosec}^2\,\pi a.$$

Collecting together these results to express the residue theorem gives

$$I = \oint_C \frac{\pi\cot\pi z}{(a + z)^2}\,dz = 2\pi i\left[\sum_{n=-N}^{N} \frac{1}{(a + n)^2} - \pi^2\,\mathrm{cosec}^2\,\pi a\right], \qquad (20.97)$$

where $N$ equals the integer part of $R$. But as the radius $R$ of $C$ tends to $\infty$, $\cot\pi z \to \mp i$
(depending on whether $\operatorname{Im} z$ is greater or less than zero respectively). Thus

$$|I| < k\oint_C \frac{|dz|}{|a + z|^2},$$

which tends to 0 as $R \to \infty$. Thus $I \to 0$ as $R$ (and hence $N$) $\to \infty$ and (20.97) establishes
the result

$$\sum_{n=-\infty}^{\infty} \frac{1}{(a + n)^2} = \frac{\pi^2}{\sin^2\pi a}.$$ ◄

† This again illustrates one of the techniques for determining residues.

Series with alternating signs in the terms, i.e. $(-1)^n$, can also be attempted
in this way but using $\mathrm{cosec}\,\pi z$ instead of $\cot\pi z$, since the former has residue
$(-1)^n\pi^{-1}$ at $z = n$ (see exercise 20.30).
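The summation result of the example above can be checked by brute force. The sketch below compares a symmetric partial sum with the closed form $\pi^2/\sin^2\pi a$; the value of $a$ and the truncation point are illustrative, and the neglected tail of the series is of order $2/N$.

```python
import math

def partial_sum(a, n_max):
    """Symmetric partial sum of 1/(a + n)^2 for n = -n_max, ..., n_max."""
    return sum(1.0 / (a + n) ** 2 for n in range(-n_max, n_max + 1))

def closed_form(a):
    """Value pi^2 / sin^2(pi a) obtained from the contour integral."""
    return (math.pi / math.sin(math.pi * a)) ** 2

a = 0.3
n_max = 200000
print(partial_sum(a, n_max), closed_form(a))
```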


20.20 Inverse Laplace transform 


As a final example of contour integration we mention a method whereby the
process of Laplace transformation, discussed in chapter 13, can be inverted.

It will be recalled that the Laplace transform $\bar{f}(s)$ of a function $f(x)$, $x \ge 0$, is
given by

$$\bar{f}(s) = \int_0^{\infty} e^{-sx} f(x)\,dx, \qquad \operatorname{Re} s > s_0. \qquad (20.98)$$

In chapter 13, functions $f(x)$ were deduced from the transforms by means of a
prepared dictionary. However, an explicit formula for an unknown inverse may
be written in the form of an integral. It is known as the Bromwich integral and is
given by

$$f(x) = \frac{1}{2\pi i}\int_{\lambda - i\infty}^{\lambda + i\infty} e^{sx}\bar{f}(s)\,ds, \qquad \lambda > 0, \qquad (20.99)$$

where $s$ is treated as a complex variable and the integration is along the line $L$
indicated in figure 20.22. The position of the line is dictated by the requirements
that $\lambda$ is positive and that all singularities of $\bar{f}(s)$ lie to the left of the line.

That (20.99) really is the unique inverse of (20.98) is difficult to show for general
functions and transforms, but the following verification should at least make it
plausible:

$$\frac{1}{2\pi i}\int_{\lambda - i\infty}^{\lambda + i\infty} ds\,e^{sx}\bar{f}(s)
= \frac{1}{2\pi i}\int_{\lambda - i\infty}^{\lambda + i\infty} ds\,e^{sx}\int_0^{\infty} e^{-su} f(u)\,du, \qquad \operatorname{Re}(s) > 0, \text{ i.e. } \lambda > 0,$$

$$= \frac{1}{2\pi i}\int_0^{\infty} du\,f(u)\int_{\lambda - i\infty}^{\lambda + i\infty} e^{s(x - u)}\,ds$$

$$= \frac{1}{2\pi i}\int_0^{\infty} du\,f(u)\int_{-\infty}^{\infty} e^{\lambda(x - u)} e^{ip(x - u)}\,i\,dp, \qquad \text{putting } s = \lambda + ip,$$

$$= \frac{1}{2\pi}\int_0^{\infty} f(u)\,e^{\lambda(x - u)}\,2\pi\delta(x - u)\,du$$

$$= \begin{cases} f(x) & x \ge 0, \\ 0 & x < 0. \end{cases}$$

Figure 20.22 The integration path of the inverse Laplace transform is along
the infinite line $L$ (axes: $\operatorname{Re} s$, $\operatorname{Im} s$). The quantity $\lambda$ must be positive and large enough for all
poles of the integrand to lie to the left of $L$.


Our main interest here is in the use of contour integration. To employ it to
evaluate the line integral in (20.99), the path $L$ must be made into a closed
contour in such a way that the contribution from the completion either vanishes
or is simply calculable.

A typical completion is shown in figure 20.23(a) and would be appropriate if
$\bar{f}(s)$ had a finite number of poles. For more complicated cases, in which $\bar{f}(s)$ has
an infinite sequence of poles but all to the left of $L$ as in figure 20.23(b), a sequence
of circular-arc completions that pass between the poles must be used and $f(x)$ is
obtained as a series. If $\bar{f}(s)$ is a multivalued function then a cut plane is needed
and a contour such as that shown in figure 20.23(c) might be appropriate.

We consider here only the simple case in which the contour in figure 20.23(a)
is used; we refer the reader to the exercises at the end of the chapter for others.
Ideally, we would like the contribution to the integral from the circular arc $\Gamma$ to
tend to zero as its radius $R \to \infty$. Using a modified version of Jordan's lemma,
it may be shown that this is indeed the case if there exist constants $M > 0$ and
$\alpha > 0$ such that on $\Gamma$

$$|\bar{f}(s)| \le \frac{M}{R^{\alpha}}.$$

Moreover, this condition always holds when $\bar{f}(s)$ has the form

$$\bar{f}(s) = \frac{P(s)}{Q(s)},$$

where $P(s)$ and $Q(s)$ are polynomials and the degree of $Q(s)$ is greater than that
of $P(s)$.

Figure 20.23 Some contour completions, (a), (b) and (c), for the integration path $L$ of the
inverse Laplace transform. For details of when each is appropriate see the
main text.

When the contribution from the part-circle $\Gamma$ tends to zero as $R \to \infty$, we
have from the residue theorem that the inverse Laplace transform (20.99) is given
simply by

$$f(x) = \sum \left(\text{residues of } \bar{f}(s)e^{sx} \text{ at all poles}\right). \qquad (20.101)$$


► Find the function $f(x)$ whose Laplace transform is

$$\bar{f}(s) = \frac{s}{s^2 - k^2},$$

where $k$ is a constant.

It is clear that $\bar{f}(s)$ is of the form required for the integral over the circular arc $\Gamma$ to tend
to zero as $R \to \infty$, and so we may use the result (20.101). Now

$$\bar{f}(s)e^{sx} = \frac{se^{sx}}{(s - k)(s + k)},$$

and thus has simple poles at $s = k$ and $s = -k$. Using (20.70) the residues at each pole can
be easily calculated as

$$R(k) = \frac{ke^{kx}}{2k} \qquad \text{and} \qquad R(-k) = \frac{ke^{-kx}}{2k}.$$

Thus the inverse Laplace transform is given by

$$f(x) = \tfrac{1}{2}\left(e^{kx} + e^{-kx}\right) = \cosh kx.$$

This result may be checked by computing the forward transform of $\cosh kx$. ◄
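Result (20.101) can also be checked numerically for this example. The sketch below evaluates $(2\pi i)^{-1}$ times the integral of $\bar{f}(s)e^{sx}$ round a circle large enough to enclose both poles, which by the residue theorem equals the sum of the residues, and compares it with $\cosh kx$. Using a single closed circle in place of the line $L$ plus its completion is a simplification adopted here for illustration; the values of $k$, $x$ and the radius are arbitrary.

```python
import cmath
import math

def sum_of_residues(transform, x, radius, n=4000):
    """(1/(2*pi*i)) times the integral of transform(s)*exp(s*x) round |s| = radius,
    i.e. the sum of the residues at all poles inside that circle."""
    step = 2 * math.pi / n
    total = 0j
    for j in range(n):
        s = radius * cmath.exp(1j * j * step)
        total += transform(s) * cmath.exp(s * x) * 1j * s * step
    return total / (2j * math.pi)

k = 1.0

def f_bar(s):
    # Laplace transform of the example: s / (s^2 - k^2), poles at s = +k and s = -k
    return s / (s * s - k * k)

x = 0.7
value = sum_of_residues(f_bar, x, radius=3.0)
print(value, math.cosh(k * x))
```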


Sometimes a little more care is required when deciding in which half-plane to
close the contour $C$.

► Find the function $f(x)$ whose Laplace transform is

$$\bar{f}(s) = \frac{1}{s}\left(e^{-as} - e^{-bs}\right),$$

where $a$ and $b$ are fixed and positive, with $b > a$.

From (20.99) we have the integral

$$f(x) = \frac{1}{2\pi i}\int_{\lambda - i\infty}^{\lambda + i\infty} \frac{e^{(x - a)s} - e^{(x - b)s}}{s}\,ds. \qquad (20.102)$$

Now, despite appearances to the contrary, the integrand has no poles, as may be confirmed
by expanding the exponentials as Taylor series about $s = 0$. Depending on the value of $x$,
several cases arise.

(i) For $x < a$ both exponentials in the integrand will tend to zero as $\operatorname{Re} s \to \infty$. Thus
we may close $L$ with a circular arc $\Gamma$ in the right half-plane ($\lambda$ can be as small as desired),
and we observe that $s \times$ integrand tends to zero everywhere on $\Gamma$ as $R \to \infty$. With no
poles enclosed and no contribution from $\Gamma$, the integral along $L$ must also be zero. Thus

$$f(x) = 0 \qquad \text{for } x < a. \qquad (20.103)$$

(ii) For $x > b$ the exponentials in the integrand will tend to zero as $\operatorname{Re} s \to -\infty$, and so
we may close $L$ in the left half-plane, as in figure 20.23(a). Again the integral around $\Gamma$
vanishes for infinite $R$ and so, by the residue theorem,

$$f(x) = 0 \qquad \text{for } x > b. \qquad (20.104)$$

(iii) For $a < x < b$ the two parts of the integrand behave in different ways and have to
be treated separately:

$$I_1 - I_2 \equiv \frac{1}{2\pi i}\int_L \frac{e^{(x - a)s}}{s}\,ds - \frac{1}{2\pi i}\int_L \frac{e^{(x - b)s}}{s}\,ds.$$

The integrand of $I_1$ then vanishes in the far left-hand half-plane, but does now have a
(simple) pole at $s = 0$. Closing $L$ in the left half-plane, and using the residue theorem, we
obtain

$$I_1 = \text{residue at } s = 0 \text{ of } s^{-1}e^{(x - a)s} = 1. \qquad (20.105)$$

The integrand of $I_2$, however, vanishes in the far right-hand half-plane (and also has
a simple pole at $s = 0$) and is evaluated by a circular-arc completion in that half-plane.
Such a contour encloses no poles and leads to $I_2 = 0$.

Thus, collecting together results (20.103) to (20.105) we obtain

$$f(x) = \begin{cases} 0 & \text{for } x < a, \\ 1 & \text{for } a < x < b, \\ 0 & \text{for } x > b, \end{cases}$$

as shown in figure 20.24. ◄


Figure 20.24 The result of the Laplace inversion of $\bar{f}(s) = s^{-1}(e^{-as} - e^{-bs})$
with $b > a$.

20.21 Exercises

20.1 Find an analytic function of $z = x + iy$ whose imaginary part is

$$(y\cos y + x\sin y)\exp x.$$

20.2 Find a function $f(z)$, analytic in a suitable part of the Argand diagram, for which

$$\operatorname{Re} f = \frac{\sin 2x}{\cosh 2y - \cos 2x}.$$

Where are the singularities of $f(z)$?

20.3 Find the radii of convergence of the following Taylor series:

(a) $\sum_{n=2}^{\infty} \dfrac{z^n}{\ln n}$, (b) $\sum_{n=1}^{\infty} \dfrac{n!\,z^n}{n^n}$,

(c) $\sum_{n=1}^{\infty} z^n n^{\ln n}$, (d) $\sum_{n=1}^{\infty} \left(\dfrac{n + p}{n}\right)^{n^2} z^n$, with $p$ real.

20.4 Find the Taylor series expansion about the origin of the function $f(z)$ defined by

$$f(z) = \sum_{r=1}^{\infty} (-1)^{r+1}\sin\left(\frac{pz}{r}\right),$$

where $p$ is a constant. Hence verify that $f(z)$ is a convergent series for all $z$.

20.5 Determine the types of singularities (if any) possessed by the following functions
at $z = 0$ and $z = \infty$:

(a) $(z - 2)^{-1}$, (b) $(1 + z^3)/z^2$, (c) $\sinh(1/z)$,

(d) $e^z/z^3$, (e) $z^{1/2}/(1 + z^2)^{1/2}$.

20.6 Identify the zeroes, poles and essential singularities of the following functions:

(a) $\tan z$, (b) $[(z - 2)/z^2]\sin[1/(1 - z)]$, (c) $\exp(1/z)$,

(d) $\tan(1/z)$, (e) $z^{2/3}$.

20.7 Find the real and imaginary parts of the functions (i) $z^2$, (ii) $e^z$, and (iii) $\cosh\pi z$.
By considering the values taken by these parts on the boundaries of the region
$0 \le x, y \le 1$, determine the solution of Laplace's equation in that region that
satisfies the boundary conditions

$$\phi(x, 0) = 0, \qquad \phi(0, y) = 0,$$

$$\phi(x, 1) = x, \qquad \phi(1, y) = y + \sin\pi y.$$

20.8 For the function

$$f(z) = \ln\left(\frac{z + c}{z - c}\right),$$

where $c$ is real, show that the real part $u$ of $f$ is constant on a circle of radius
$c\,\mathrm{cosech}\,u$ centred on the point $z = c\coth u$. Use this result to show that the
electrical capacitance per unit length of two parallel cylinders of radii $a$, placed
with their axes $2d$ apart, is proportional to $[\cosh^{-1}(d/a)]^{-1}$.

20.9 Find a complex potential in the $z$-plane appropriate to a physical situation in
which the half-plane $x > 0$, $y = 0$ has zero potential and the half-plane $x < 0$,
$y = 0$ has potential $V$.

By making the transformation $w = a(z + z^{-1})/2$, with $a$ real and positive, find
the electrostatic potential associated with the half-plane $r > a$, $s = 0$ and the
half-plane $r < -a$, $s = 0$ at potentials 0 and $V$ respectively.

20.10 By considering in turn the transformations 

z = ^c(w + w -1 ), w = exp£, 

where z = x + iy, w = r exp id, C = C + it] and c is a real positive constant, show 
that z = c cosh f maps the strip £ > 0, 0 < r] < 2n, onto the whole z-plane. Which 
curves in the z-plane correspond to the lines £ = constant and p = constant? 
Identify those corresponding respectively to £ = 0, p = 0 and p = 2n. 

The electric potential <j> of a charged conducting strip —c < x < c, y = 0, 
satisfies 

<j) ~ —k ln(x 2 + y 2 ) 1/2 for large {x 2 + y 2 ) 1/2 , 

with 4> constant on the strip. Show that <j> = Re[— k cosh -1 (z/c)] and that the 
magnitude of the electric field near the strip is k(c 2 — x 2 )~ 1/2 . 

20.11 Show that the transformation

\[ w = \int_0^z \frac{1}{(\zeta^3 - \zeta)^{1/2}}\,d\zeta \]

transforms the upper half-plane into the interior of a square that has one corner
at the origin of the $w$-plane and sides of length $L$, where

\[ L = \int_0^{\pi/2} \mathrm{cosec}^{1/2}\theta \,d\theta. \]

20.12 The fundamental theorem of algebra states that a complex polynomial $p_n(z)$ of
degree $n$ has precisely $n$ complex roots. By applying Liouville's theorem (see the
end of section 20.12) to $f(z) = 1/p_n(z)$ prove that $p_n(z)$ has at least one complex
root. Factor out that root to obtain $p_{n-1}(z)$ and, by repeating the process, prove
the above theorem.

20.13 Show that, if $a$ is a positive real constant, the function $\exp(iaz^2)$ is analytic and
$\to 0$ as $|z| \to \infty$ for $0 < \arg z \le \pi/4$. By applying Cauchy's theorem to a suitable
contour prove that

\[ \int_0^{\infty} \cos(ax^2)\,dx = \sqrt{\frac{\pi}{8a}}. \]

20.14 For the equation $8z^3 + z + 1 = 0$:

(a) show that all three roots lie between the circles $|z| = 3/8$ and $|z| = 5/8$;

(b) find the approximate location of the real root, and hence deduce that the
complex ones lie in the first and fourth quadrants and have moduli greater
than 0.5.
20.15 (a) Prove that $z^8 + 3z^3 + 7z + 5$ has two zeroes in the first quadrant.

(b) Find in which quadrants the zeroes of $2z^3 + 7z^2 + 10z + 6$ lie. Try to locate
them.

20.16 The following is a method of determining the number of zeroes of an $n$th-degree
polynomial $f(z)$ inside the contour $C$ given by $|z| = R$:

(a) put $z = R(1 + it)/(1 - it)$, with $t = \tan(\theta/2)$, in $-\infty \le t \le \infty$;

(b) obtain $f(z)$ as

\[ \frac{A(t) + iB(t)}{(1 - it)^n}\,\frac{(1 + it)^n}{(1 + it)^n}; \]

(c) show that $\arg f(z) = \tan^{-1}(B/A) + n \tan^{-1} t$;

(d) show that $\Delta_C[\arg f(z)] = \Delta_C[\tan^{-1}(B/A)] + n\pi$;

(e) using inspection or a sketch graph, determine $\Delta_C[\tan^{-1}(B/A)]$ by finding the
discontinuities in $B/A$ and evaluating $\tan^{-1}(B/A)$ at $t = \pm\infty$.

Use this method, together with the results of the worked example in section 20.15,
to show that the zeroes of $z^4 + z + 1$ in the second and third quadrants have
$|z| < 1$.

20.17 By considering the real part of

\[ \int \frac{-iz^{n-1}\,dz}{1 - a(z + z^{-1}) + a^2}, \]

where $z = \exp i\theta$ and $n$ is a non-negative integer, evaluate

\[ \int_0^{\pi} \frac{\cos n\theta}{1 - 2a\cos\theta + a^2}\,d\theta \]

for $a$ real and $> 1$.

20.18 Prove that if $f(z)$ has a simple zero at $z_0$ then $1/f(z)$ has residue $1/f'(z_0)$ there.
Hence evaluate

\[ \int_{-\pi}^{\pi} \frac{\sin\theta}{a - \sin\theta}\,d\theta, \]

where $a$ is real and $> 1$.

20.19 The equation of an ellipse in plane polar coordinates $r, \theta$, with one of its foci at
the origin, is

\[ \frac{l}{r} = 1 - e\cos\theta, \]

where $l$ is a length (that of the latus rectum) and $e$ ($0 < e < 1$) is the eccentricity
of the ellipse. Express the area of the ellipse as an integral around the unit circle
in the complex plane, and show that the only singularity of the integrand inside
the circle is a double pole at $z_0 = e^{-1} - (e^{-2} - 1)^{1/2}$.

By setting $z = z_0 + \xi$ and expanding the integrand in powers of $\xi$, find the
residue at $z_0$ and hence show that the area is equal to $\pi l^2(1 - e^2)^{-3/2}$. (In terms
of the semi-axes $a$ and $b$ of the ellipse, $l = b^2/a$ and $e^2 = (a^2 - b^2)/a^2$.)

20.20 Prove that, for $\alpha > 0$, the integral

\[ \int_0^{\infty} \frac{t \sin \alpha t}{1 + t^2}\,dt \]

has the value $(\pi/2)\exp(-\alpha)$.

20.21 Prove that

\[ \int_0^{\infty} \frac{\cos mx}{4x^4 + 5x^2 + 1}\,dx = \frac{\pi}{6}\left(4e^{-m/2} - e^{-m}\right) \]

for $m > 0$.
20.22 Show that the principal value of the integral

\[ \int_{-\infty}^{\infty} \frac{\cos(x/a)}{x^2 - a^2}\,dx \]

is $-(\pi/a)\sin 1$.

20.23 (a) Prove that the integral of $[\exp(i\pi z^2)]\,\mathrm{cosec}\,\pi z$ around the parallelogram with
corners $\pm 1/2 \pm R\exp(i\pi/4)$ has the value $2i$.

(b) Show that the parts of the contour parallel to the real axis give no contribution
when $R \to \infty$.

(c) Evaluate the integrals along the other two sides by putting $z' = r\exp(i\pi/4)$
and working in terms of $z' + \tfrac{1}{2}$ and $z' - \tfrac{1}{2}$. Hence by letting $R \to \infty$ show that

\[ \int_{-\infty}^{\infty} e^{-\pi r^2}\,dr = 1. \]

20.24 By applying the residue theorem around a wedge-shaped contour of angle $2\pi/n$,
with one side along the real axis, prove that the integral

\[ \int_0^{\infty} \frac{dx}{1 + x^n}, \]

where $n$ is real and $\ge 2$, has the value $(\pi/n)\,\mathrm{cosec}(\pi/n)$.

20.25 Using a suitable cut plane, prove that if $\alpha$ is real and $0 < \alpha < 1$ then

\[ \int_0^{\infty} \frac{x^{-\alpha}}{1 + x}\,dx \]

has the value $\pi\,\mathrm{cosec}\,\pi\alpha$.

20.26 Show that

\[ \int_0^{\infty} \frac{\ln x}{x^{3/4}(1 + x)}\,dx = -\sqrt{2}\,\pi^2. \]

20.27 By integrating a suitable function around a large semicircle in the upper half
plane and a small semicircle centred on the origin, determine the value of

\[ I = \int_0^{\infty} \frac{(\ln x)^2}{1 + x^2}\,dx \]

and deduce, as a by-product of your calculation, that

\[ \int_0^{\infty} \frac{\ln x}{1 + x^2}\,dx = 0. \]

20.28 Prove that

\[ \sum_{n=-\infty}^{\infty} \frac{1}{n^2 + \tfrac{3}{4}n + \tfrac{1}{8}} = 4\pi. \]

Carry out the summation numerically, say between $-4$ and 4, and note how
much of the sum comes from values near the poles of the contour integration.

20.29 (a) Determine the residues at all the poles of the function

\[ f(z) = \frac{\pi \cot \pi z}{a^2 + z^2}, \]

where $a$ is a positive real constant.

(b) By evaluating, in two different ways, the integral $I$ of $f(z)$ along the straight
line joining $-\infty - ia/2$ and $+\infty - ia/2$, show that

\[ \sum_{n=1}^{\infty} \frac{1}{a^2 + n^2} = \frac{\pi \coth \pi a}{2a} - \frac{1}{2a^2}. \]

(c) Deduce the value of $\sum_{n=1}^{\infty} n^{-2}$.
20.30 By considering the integral of

\[ \left(\frac{\sin \alpha z}{\alpha z}\right)^2 \frac{\pi}{\sin \pi z}, \qquad \alpha < \frac{\pi}{2}, \]

around a circle of large radius, prove that

\[ \sum_{m=1}^{\infty} (-1)^{m-1} \frac{\sin^2 m\alpha}{(m\alpha)^2} = \frac{1}{2}. \]

20.31 Use the Bromwich inversion, and contours such as those shown in figure 20.23(a),
to find the functions of which the following are the Laplace transforms:

(a) $s(s^2 + b^2)^{-1}$;

(b) $n!(s - a)^{-(n+1)}$, with $n$ a positive integer and $s > a$;

(c) $a(s^2 - a^2)^{-1}$, with $s > |a|$. (Change variable to $t = s - |a|$.)

Compare your answers with those given in table 13.1.

20.32 Find the function $f(t)$ whose Laplace transform is

\[ \bar{f}(s) = \frac{e^{-s} - 1 + s}{s^2}. \]

20.33 A function $f(t)$ has the Laplace transform

\[ F(s) = \frac{1}{2i} \ln\left(\frac{s + i}{s - i}\right), \]

the complex logarithm being defined by a finite branch cut running along the
imaginary axis from $-i$ to $i$.

(a) Convince yourself that, for $t > 0$, $f(t)$ can be expressed as a closed contour
integral that encloses only the branch cut.

(b) Calculate $F(s)$ on either side of the branch cut, evaluate the integral and
hence determine $f(t)$.

(c) Confirm that the derivative with respect to $s$ of the Laplace transform
integral of your answer is the same as that given by $dF/ds$.

20.34 Use the contour in figure 20.23(c) to show that the function with Laplace
transform $s^{-1/2}$ is $(\pi x)^{-1/2}$. (For an integrand of the form $r^{-1/2}\exp(-rx)$ change
variable to $t = r^{1/2}$.)


20.22 Hints and answers 

20.1 $\partial u/\partial y = -(\exp x)(y\cos y + x\sin y + \sin y)$; $z\exp z$.

20.2 $f = (\sin 2x - i\sinh 2y)/(\cosh 2y - \cos 2x)$; the special case of $z$ real shows that
$f(z) = \cot z$; poles at $z = n\pi$.

20.3 (a) 1; (b) 1; (c) 1; (d) $e^{-p}$.

20.4 The series is given by

\[ a_{2n+1} = \frac{(-1)^{n+1} p^{2n+1}}{(2n+1)!} \sum_{r=1}^{\infty} \frac{(-1)^r}{r^{2n+1}}, \]

for integer $n \ge 0$, $a_{2n} = 0$; $R^{-1} = \lim\,[p \times 1 \times (2n+1)^{-1}] = 0$, and so the series
is convergent by the root test.

20.5 (a) analytic, analytic; (b) double pole, single pole; (c) essential singularity, analytic;
(d) triple pole, essential singularity; (e) branch point, branch point.
20.6 (a) Zeroes at $z = n\pi$, simple poles at $z = n\pi + \pi/2$, essential singularity at $z = \infty$;

(b) zeroes at $z = \infty$, 2 and $1 - (n\pi)^{-1}$, double pole at $z = 0$, essential singularity
at $z = 1$;

(c) zero at $\infty$, essential singularity at $z = 0$;

(d) zeroes at $z = \infty$ and $(n\pi)^{-1}$, simple poles at $z = (n\pi + \pi/2)^{-1}$, essential
singularity at $z = 0$;

(e) zero and branch point at the origin, essential singularity at $z = \infty$.

20.7 (i) $x^2 - y^2$, $2xy$; (ii) $e^x\cos y$, $e^x\sin y$; (iii) $\cosh \pi x \cos \pi y$, $\sinh \pi x \sin \pi y$;

$\phi(x, y) = xy + (\sinh \pi x \sin \pi y)/\sinh \pi$.

20.8 Set $c\coth u_1 = -d$, $c\coth u_2 = +d$, $|c\,\mathrm{cosech}\,u| = a$ and note that the capacitance
is proportional to $(u_2 - u_1)^{-1}$.

20.9 $f(z) = -i(V/\pi)\ln z$; $-i(V/\pi)\ln\{(z/a) + [(z/a)^2 - 1]^{1/2}\}$.

20.10 $\xi$ = constant, ellipses $x^2(a+1)^{-2} + y^2(a-1)^{-2} = c^2/(4a^2)$; $\eta$ = constant, hyperbolae
$x^2(\cos\alpha)^{-2} - y^2(\sin\alpha)^{-2} = c^2$. The curves are the cuts $-c \le x \le c$, $y = 0$; and
$|x| \ge c$, $y = 0$. The curves for $\eta = 2\pi$ are the same as those for $\eta = 0$.

20.13 Use a contour bounding the sector $0 \le \arg z \le \pi/4$ to establish the relationship
between the required integral and that with $\exp(-au^2)$ as the integrand.

20.14 (a) $|z| = 3/8$, $|8z^3 + z| \le 51/64 < 1$; $|z| = 5/8$, $|8z^3| = 125/64 > 104/64 \ge |z + 1|$;
(b) write as $8(z - \gamma)(z - \alpha - i\beta)(z - \alpha + i\beta) = 0$, $\gamma < 0$, and then the zero coefficient
of $z^2$ shows $\alpha > 0$. Show $-3/8 > \gamma > -1/2$ and use $-8\gamma(\alpha^2 + \beta^2) = 1$.

20.15 (a) For a quarter-circular contour enclosing the first quadrant, the change in the
argument of the function is $0 + 8(\pi/2) + 0$ (since $y^8 + 5 = 0$ has no real roots);
(b) one negative real zero; a conjugate pair in the second and third quadrants,
$-3/2$, $-1 \pm i$.

20.16 $A = 3 - 12t^2 + t^4$, $B = -2t - 2t^3$, $\Delta_C[\tan^{-1}(B/A)] = 0$, $\Delta_C[\arg f(z)] = 4\pi$; hence
there are two zeroes inside $|z| = 1$.

20.17 Pole at $z = 1/a$; $\pi a^{-n}(a^2 - 1)^{-1}$.

20.18 The only pole inside the unit circle is at $z = ia - i(a^2 - 1)^{1/2}$; the residue is given
by $-(i/2)(a^2 - 1)^{-1/2}$; the integral has value $2\pi[a(a^2 - 1)^{-1/2} - 1]$.

20.19 The integrand is $2l^2z(2z - ez^2 - e)^{-2}$; residue $= (4e)^{-1}(e^{-2} - 1)^{-3/2}$.

20.20 Follow the first example in section 20.17 and use Jordan’s lemma, pole at z = i. 

20.21 Factorise the denominator, showing that the relevant simple poles are at i/2 and i. 

20.22 Use Jordan's lemma and a semicircular contour indented at $z = \pm a$.

20.23 (a) The only pole is at the origin with residue $\pi^{-1}$; (b) each is
$O[\exp(-\pi R^2 - \pi R/\sqrt{2})]$; (c) the sum of the integrals is $2i\int_{-R}^{R}\exp(-\pi r^2)\,dr$.

20.24 The residue at the only pole inside the contour, $z = \exp(i\pi/n)$, is $-n^{-1}\exp(i\pi/n)$.
The values of the integrals along the two radii differ by a factor $-\exp(2\pi i/n)$.

20.25 Use a contour like that shown in figure 20.21. 

20.26 See the previous example. 

20.27 Note that $\rho \ln^n \rho \to 0$ as $\rho \to 0$ for all $n$. When $z$ is on the negative real axis,
$(\ln z)^2$ contains three terms; one of the corresponding integrals is a standard
form. The residue at $z = i$ is $i\pi^2/8$; $I = \pi^3/8$.

20.28 Evaluate

\[ \int \frac{\pi \cot \pi z}{(\tfrac{1}{2} + z)(\tfrac{1}{4} + z)}\,dz \]

around a large circle centred on the origin; residue at $z = -1/2$ is 0; residue at
$z = -1/4$ is $4\pi\cot(-\pi/4)$.

20.29 (a) $(a^2 + n^2)^{-1}$ at $z = n$ (integer); $-\pi\coth(\pi a)/(2a)$ at $z = \pm ia$.

(b) Complete the contour separately in the upper half-plane (including all the
poles on the real axis and the one at $z = ia$), and in the lower half-plane
(including only the pole at $z = -ia$). Equate the two expressions for $I$.

(c) Take the limit as $a \to 0$, using l'Hôpital's rule to give $\pi^2/6$.
20.30 The behaviour of the integrand for large $|z|$ is $|z|^{-2}\exp[(2\alpha - \pi)|z|]$. The residue
at $z = \pm m$, for each integer $m$, is $\sin^2(m\alpha)(-1)^m/(m\alpha)^2$. The contour contributes
nothing. Required summation = [total sum $-$ ($m = 0$ term)]/2.

20.31 Poles at (a) $\pm ib$; (b) $t = s - a = 0$, of order $n + 1$; (c) $t = 0$ and $t = -2|a|$. See
table 13.1.

20.32 Note that $\bar{f}(s)$ has no pole at $s = 0$. For $t < 0$ close the Bromwich contour in the
right half-plane, and for $t > 1$ in the left half-plane. For $0 < t < 1$ the integrand
has to be split into separate terms containing $e^{-s}$ and $s - 1$ and the completions
made in the right and left half-planes respectively. The last of these completed
contours now contains a second-order pole at $s = 0$. $f(t) = 1 - t$ for $0 < t < 1$
but is 0 otherwise.

20.33 (a) Note that $F(s)$ has no singularities in $\mathrm{Re}\,s < 0$ and apply Cauchy's theorem
to reshape the Bromwich contour.

(b) The real parts of $F(s)$ differ by $\pi$ on either side of the branch cut;
$f(t) = \sin t/t$.

(c) Both are $-1/(1 + s^2)$.

20.34 $\int_\Gamma$ and $\int_\gamma$ tend to 0 as $R \to \infty$ and $\rho \to 0$. Put $s = r\exp i\pi$ and $s = r\exp(-i\pi)$ on
the two sides of the cut and use $\int_0^{\infty}\exp(-t^2x)\,dt = \tfrac{1}{2}(\pi/x)^{1/2}$. There are no poles
inside the contour.
Tensors


It may seem obvious that the quantitative description of physical processes cannot 
depend on the coordinate system in which they are represented. However, we may 
turn this argument around: since physical results must indeed be independent of
the choice of coordinate system, what does this imply about the nature of the 
quantities involved in the description of physical processes? The study of these 
implications and of the classification of physical quantities by means of them 
forms the content of the present chapter. 

Although the concepts presented here may be applied, with little modifi- 
cation, to more abstract spaces (most notably the four-dimensional space-time of 
special or general relativity), we shall restrict our attention to our familiar three- 
dimensional Euclidean space. This removes the need to discuss the properties of 
differentiable manifolds and their tangent and dual spaces. The reader who is 
interested in these more technical aspects of tensor calculus in general spaces, 
and in particular their application to general relativity, should consult one of the 
many excellent textbooks on the subject.†

Before the presentation of the main development of the subject, we begin by 
introducing the summation convention, which will prove very useful in writing 
tensor equations in a more compact form. We then review the effects of a change 
of basis in a vector space; such spaces were discussed in chapter 8. This is 
followed by an investigation of the rotation of Cartesian coordinate systems, and 
finally we broaden our discussion to include more general coordinate systems and 
transformations. 


† For example, D'Inverno, Introducing Einstein's Relativity (Oxford, 1992); Foster and Nightingale,
A Short Course in General Relativity (Springer-Verlag, 1994); Schutz, A First Course in General 
Relativity (Cambridge, 1990). 
21.1 Some notation


Before proceeding further, we introduce the summation convention for subscripts, 
since its use looms large in the work of this chapter. The convention is that 
any lower-case alphabetic subscript that appears exactly twice in any term of an 
expression is understood to be summed over all the values that a subscript in 
that position can take (unless the contrary is specifically stated). The subscripted 
quantities may appear in the numerator and/or the denominator of a term in an 
expression. This naturally implies that any such pair of repeated subscripts must 
occur only in subscript positions that have the same range of values. Sometimes 
the ranges of values have to be specified but usually they are apparent from the 
context. 

The following simple examples illustrate what is meant (in the three-dimensional 
case): 


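(i) $a_ix_i$ stands for $a_1x_1 + a_2x_2 + a_3x_3$;

(ii) $a_{ij}b_{jk}$ stands for $a_{i1}b_{1k} + a_{i2}b_{2k} + a_{i3}b_{3k}$;

(iii) $a_{ij}b_{jk}c_k$ stands for $\sum_{j=1}^{3}\sum_{k=1}^{3} a_{ij}b_{jk}c_k$;

(iv) $\dfrac{\partial v_i}{\partial x_i}$ stands for $\dfrac{\partial v_1}{\partial x_1} + \dfrac{\partial v_2}{\partial x_2} + \dfrac{\partial v_3}{\partial x_3}$;

(v) $\dfrac{\partial^2\phi}{\partial x_i\,\partial x_i}$ stands for $\dfrac{\partial^2\phi}{\partial x_1^2} + \dfrac{\partial^2\phi}{\partial x_2^2} + \dfrac{\partial^2\phi}{\partial x_3^2}$.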
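As a minimal sketch of how examples (i) to (iii) read when the implied sums are written out, the following Python fragment spells each repeated subscript as an explicit loop. The arrays and their numerical values are arbitrary illustrations, not taken from the text.

```python
# Explicit-sum reading of the summation convention in three dimensions.
# All numerical values are arbitrary illustrations (assumptions).
a = [1.0, 2.0, 3.0]
x = [4.0, 5.0, 6.0]
A = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [2.0, 0.0, 1.0]]
B = [[0.0, 1.0, 1.0], [2.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
c = [1.0, -1.0, 2.0]
R = range(3)  # Python indices 0, 1, 2 play the role of subscripts 1, 2, 3

# (i) a_i x_i: the repeated subscript i is summed, leaving a single number.
a_dot_x = sum(a[i] * x[i] for i in R)

# (ii) a_ij b_jk: j is a dummy subscript; i and k stay free,
# so the result still carries two subscripts.
AB = [[sum(A[i][j] * B[j][k] for j in R) for k in R] for i in R]

# (iii) a_ij b_jk c_k: both j and k are summed; only i remains free.
ABc = [sum(A[i][j] * B[j][k] * c[k] for j in R for k in R) for i in R]
```

The number of free subscripts on the left of each statement matches the nesting depth of the resulting list, which is a quick way to check that a subscripted expression has been read correctly.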


Subscripts that are summed over are called dummy subscripts and the others
free subscripts. It is worth remarking that when introducing a dummy subscript
into an expression, care should be taken not to use one that is already present,
either as a free or as a dummy subscript. For example, $a_{ij}b_{jk}c_{kl}$ cannot, and must
not, be replaced by $a_{ij}b_{jj}c_{jl}$ or by $a_{il}b_{lk}c_{kl}$, but could be replaced by $a_{im}b_{mk}c_{kl}$
or by $a_{im}b_{mn}c_{nl}$. Naturally, free subscripts must not be changed at all unless the
working calls for it.

Furthermore, as we have done throughout this book, we will make frequent
use of the Kronecker delta $\delta_{ij}$, which is defined by

\[ \delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases} \]

When the summation convention has been adopted, the main use of $\delta_{ij}$ is to
replace one subscript by another in certain expressions. Examples might include

\[ b_j\delta_{ij} = b_i \]

and

\[ a_{ij}\delta_{jk} = a_{ik}. \qquad (21.1) \]
In the second of these the dummy index shared by both terms on the left-hand
side (namely $j$) has been replaced by the free index carried by the Kronecker delta
(namely $k$), and the delta symbol has disappeared. In matrix language, (21.1) can
be written as $\mathsf{A}\mathsf{I} = \mathsf{A}$, where $\mathsf{A}$ is the matrix with elements $a_{ij}$ and $\mathsf{I}$ is the unit
matrix having the same dimensions as $\mathsf{A}$.

In some expressions we may use the Kronecker delta to replace indices in a
number of different ways, e.g.

\[ a_{ij}b_{jk}\delta_{ki} = a_{ij}b_{ji} \quad \text{or} \quad a_{kj}b_{jk}, \]

where the two expressions on the RHS are totally equivalent to one another.
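The index-replacement property of the Kronecker delta can be checked by brute-force summation. The sketch below uses small integer matrices chosen arbitrarily for illustration; it confirms (21.1) and the equivalence of the contracted forms.

```python
# Brute-force check of the Kronecker-delta substitution rules.
# The matrices a and b are arbitrary illustrative values (assumptions).
R = range(3)
delta = [[1 if i == j else 0 for j in R] for i in R]

a = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
b = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]

# (21.1): a_ij delta_jk = a_ik, i.e. the delta replaces j by k.
a_delta = [[sum(a[i][j] * delta[j][k] for j in R) for k in R] for i in R]

# a_ij b_jk delta_ki with all three subscripts summed ...
with_delta = sum(a[i][j] * b[j][k] * delta[k][i]
                 for i in R for j in R for k in R)

# ... equals a_ij b_ji (delta used to replace k by i) ...
replaced_k = sum(a[i][j] * b[j][i] for i in R for j in R)

# ... and also a_kj b_jk (delta used to replace i by k).
replaced_i = sum(a[k][j] * b[j][k] for k in R for j in R)
```

The last two sums differ only in the name of a dummy subscript, which is why they are totally equivalent.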


21.2 Change of basis 

In chapter 8 some attention was given to the subject of changing the basis set (or 
coordinate system) in a vector space and it was shown that, under such a change, 
different types of quantity behave in different ways. These results are given in 
section 8.15, but are summarised below for convenience, using the summation 
convention. Although throughout this section we will remind the reader that we 
are using this convention, it will simply be assumed in the remainder of the 
chapter. 

If we introduce a set of basis vectors $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ into our familiar three-dimensional
(vector) space, then we can describe any vector $\mathbf{x}$ in terms of its components
$x_1, x_2, x_3$ with respect to this basis:

\[ \mathbf{x} = x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + x_3\mathbf{e}_3 = x_i\mathbf{e}_i, \]

where we have used the summation convention to write the sum in a more
compact form. If we now introduce a new basis $\mathbf{e}'_1, \mathbf{e}'_2, \mathbf{e}'_3$ related to the old one by

\[ \mathbf{e}'_j = S_{ij}\mathbf{e}_i \quad \text{(sum over } i\text{)}, \qquad (21.2) \]

where the coefficient $S_{ij}$ is the $i$th component of the vector $\mathbf{e}'_j$ with respect to the
unprimed basis, then we may write $\mathbf{x}$ with respect to the new basis as

\[ \mathbf{x} = x'_1\mathbf{e}'_1 + x'_2\mathbf{e}'_2 + x'_3\mathbf{e}'_3 = x'_i\mathbf{e}'_i \quad \text{(sum over } i\text{)}. \]

If we denote the matrix with elements $S_{ij}$ by $\mathsf{S}$, then the components $x'_i$ and $x_i$
in the two bases are related by

\[ x'_i = (\mathsf{S}^{-1})_{ij}x_j \quad \text{(sum over } j\text{)}, \]

where, using the summation convention, there is an implicit sum over $j$ from
$j = 1$ to $j = 3$. In the special case where the transformation is a rotation of the
coordinate axes, the transformation matrix $\mathsf{S}$ is orthogonal and we have

\[ x'_i = (\mathsf{S}^{\mathrm{T}})_{ij}x_j = S_{ji}x_j \quad \text{(sum over } j\text{)}. \qquad (21.3) \]
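A small numerical sketch may help to fix the roles of $\mathsf{S}$ and $\mathsf{S}^{-1}$. The basis-change matrix below is a deliberately non-orthogonal example chosen for illustration, and the helper `inverse3` is a hypothetical utility written for this sketch; neither comes from the text.

```python
def inverse3(S):
    """Inverse of a 3x3 matrix via the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = S
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[adj[r][s] / det for s in range(3)] for r in range(3)]

R = range(3)

# Column j of S holds the old-basis components of the new basis vector e'_j,
# as in (21.2). This S is an arbitrary invertible, non-orthogonal example.
S = [[1, 1, 0],
     [0, 1, 1],
     [0, 0, 1]]
S_inv = inverse3(S)

x_old = [2, 3, 5]  # components x_j in the unprimed basis (arbitrary)

# New components: x'_i = (S^-1)_ij x_j.
x_new = [sum(S_inv[i][j] * x_old[j] for j in R) for i in R]

# Rebuild the vector from the new components: x = x'_j e'_j, whose i-th
# old-basis component is S_ij x'_j. It must reproduce x_old.
rebuilt = [sum(S[i][j] * x_new[j] for j in R) for i in R]
```

The components change with $\mathsf{S}^{-1}$ precisely so that the vector $\mathbf{x}$ itself, rebuilt from the new components and new basis vectors, is unchanged.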
Scalars behave differently under transformations, however, since they remain
unchanged. For example, the value of the scalar product of two vectors $\mathbf{x} \cdot \mathbf{y}$
(which is just a number) is unaffected by the transformation from the unprimed
to the primed basis. Different again is the behaviour of linear operators. If a
linear operator $\mathcal{A}$ is represented by some matrix $\mathsf{A}$ in a given coordinate system
then in the new (primed) coordinate system it is represented by a new matrix
$\mathsf{A}' = \mathsf{S}^{-1}\mathsf{A}\mathsf{S}$.

In this chapter we develop a general formulation to describe and classify these 
different types of behaviour under a change of basis (or coordinate transfor- 
mation). In the development, the generic name tensor is introduced, and certain 
scalars, vectors and linear operators are described respectively as tensors of ze- 
roth, first and second order (the order - or rank - corresponds to the number of 
subscripts needed to specify a particular element of the tensor). Tensors of third 
and fourth order will also occupy some of our attention. 


21.3 Cartesian tensors 

We begin our discussion of tensors by considering a particular class of coordinate 
transformation - namely rotations - and we shall confine our attention strictly 
to the rotation of Cartesian coordinate systems. Our object is to study the prop- 
erties of various types of mathematical quantities, and their associated physical 
interpretations, when they are described in terms of Cartesian coordinates and 
the axes of the coordinate system are rigidly rotated from a basis $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ (lying
along the $Ox_1$, $Ox_2$ and $Ox_3$ axes) to a new one $\mathbf{e}'_1, \mathbf{e}'_2, \mathbf{e}'_3$ (lying along the $Ox'_1$,
$Ox'_2$ and $Ox'_3$ axes).

Since we shall be more interested in how the components of a vector or linear
operator are changed by a rotation of the axes than in the relationship between
the two sets of basis vectors $\mathbf{e}_i$ and $\mathbf{e}'_i$, let us define the transformation matrix $\mathsf{L}$
as the inverse of the matrix $\mathsf{S}$ in (21.2). Thus, from (21.2), the components of a
position vector $\mathbf{x}$, in the old and new bases respectively, are related by

\[ x'_i = L_{ij}x_j. \qquad (21.4) \]

Because we are considering only rigid rotations of the coordinate axes, the
transformation matrix $\mathsf{L}$ will be orthogonal, i.e. such that $\mathsf{L}^{-1} = \mathsf{L}^{\mathrm{T}}$. Therefore
the inverse transformation is given by

\[ x_i = L_{ji}x'_j. \qquad (21.5) \]

The orthogonality of $\mathsf{L}$ also implies relations among the elements of $\mathsf{L}$ that
express the fact that $\mathsf{L}\mathsf{L}^{\mathrm{T}} = \mathsf{L}^{\mathrm{T}}\mathsf{L} = \mathsf{I}$. In subscript notation they are given by

\[ L_{ik}L_{jk} = \delta_{ij} \quad \text{and} \quad L_{ki}L_{kj} = \delta_{ij}. \qquad (21.6) \]

Furthermore, in terms of the basis vectors of the primed and unprimed Cartesian
coordinate systems, the transformation matrix is given by

\[ L_{ij} = \mathbf{e}'_i \cdot \mathbf{e}_j. \]

Figure 21.1 Rotation of Cartesian axes by an angle $\theta$ about the $x_3$-axis. The
three angles marked $\theta$ and the parallels (broken lines) to the primed axes
show how the first two equations of (21.7) are constructed.

We note that the product of two rotations is also a rotation. For example,
suppose that $x'_i = L_{ij}x_j$ and $x''_i = M_{ij}x'_j$; then the composite rotation is described
by

\[ x''_i = M_{ij}x'_j = M_{ij}L_{jk}x_k = (\mathsf{M}\mathsf{L})_{ik}x_k, \]

corresponding to the matrix $\mathsf{M}\mathsf{L}$.
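The composition rule can be illustrated numerically. The sketch below assumes, purely for illustration, that both rotations are about the $x_3$-axis (with arbitrarily chosen angles), so that the composite should be a rotation through the sum of the two angles; the helper names are hypothetical.

```python
import math

def rot3(theta):
    """Matrix L for a rotation of the axes through theta about the x3-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(M, L):
    """(ML)_ik = M_ij L_jk, with the sum over j written out."""
    return [[sum(M[i][j] * L[j][k] for j in range(3)) for k in range(3)]
            for i in range(3)]

def apply(L, x):
    """x'_i = L_ij x_j."""
    return [sum(L[i][j] * x[j] for j in range(3)) for i in range(3)]

L = rot3(0.3)  # first rotation (angle chosen arbitrarily)
M = rot3(0.4)  # second rotation
ML = matmul(M, L)

x = [1.0, 2.0, 3.0]
two_steps = apply(M, apply(L, x))  # x'' = M (L x)
one_step = apply(ML, x)            # x'' = (ML) x

# For two rotations about the same axis the angles simply add.
combined = rot3(0.7)
max_diff = max(abs(ML[i][k] - combined[i][k])
               for i in range(3) for k in range(3))
```

Note the order of the factors: the rotation applied first stands on the right in the product $\mathsf{M}\mathsf{L}$.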


► Find the transformation matrix $\mathsf{L}$ corresponding to a rotation of the coordinate axes
through an angle $\theta$ about the $\mathbf{e}_3$-axis (or $x_3$-axis), as shown in figure 21.1.

Taking $\mathbf{x}$ as a position vector - the most obvious choice - we see from the figure that
the components of $\mathbf{x}$ with respect to the new (primed) basis are given in terms of the
components in the old (unprimed) basis by

\[ x'_1 = x_1\cos\theta + x_2\sin\theta, \]

\[ x'_2 = -x_1\sin\theta + x_2\cos\theta, \qquad (21.7) \]

\[ x'_3 = x_3. \]

The (orthogonal) transformation matrix is thus

\[ \mathsf{L} = \begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]

The inverse equations are

\[ x_1 = x'_1\cos\theta - x'_2\sin\theta, \]

\[ x_2 = x'_1\sin\theta + x'_2\cos\theta, \qquad (21.8) \]

\[ x_3 = x'_3, \]

in line with (21.5). ◄
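As a quick numerical check of this example, the sketch below builds $\mathsf{L}$ for an arbitrarily chosen angle, verifies the first orthogonality relation in (21.6), and confirms that applying (21.5) after (21.4) returns the original components. The angle and the test vector are illustrative assumptions.

```python
import math

theta = 0.5  # arbitrary rotation angle in radians
c, s = math.cos(theta), math.sin(theta)

# Transformation matrix for a rotation of the axes about the x3-axis.
L = [[c, s, 0.0],
     [-s, c, 0.0],
     [0.0, 0.0, 1.0]]
R = range(3)

# First relation in (21.6): L_ik L_jk should equal delta_ij.
LLt = [[sum(L[i][k] * L[j][k] for k in R) for j in R] for i in R]

x = [1.0, -2.0, 0.5]  # arbitrary position vector

# (21.4): x'_i = L_ij x_j gives the components in the rotated system.
x_new = [sum(L[i][j] * x[j] for j in R) for i in R]

# (21.5): x_i = L_ji x'_j should undo the rotation.
x_back = [sum(L[j][i] * x_new[j] for j in R) for i in R]
```

The inverse step uses the same array `L` with its subscripts interchanged, which is all that $\mathsf{L}^{-1} = \mathsf{L}^{\mathrm{T}}$ amounts to in component form.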
21.4 First- and zero-order Cartesian tensors

Using the above example as a guide, we may consider any set of three quantities
which are directly or indirectly functions of the coordinates $x_i$ and possibly
involve some constants, and ask how their values are changed by any rotation of
the Cartesian axes. The specific question to be answered is whether the
forms $v'_i$ in the new variables can be obtained from the old ones $v_i$ using (21.4),

\[ v'_i = L_{ij}v_j. \qquad (21.9) \]

If so, the $v_i$ are said to form the components of a vector or first-order Cartesian
tensor $\mathbf{v}$. The first-order tensor $\mathbf{v}$ does not change under the rotation of the
coordinate axes; nevertheless, since the basis set does change, from $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ to
$\mathbf{e}'_1, \mathbf{e}'_2, \mathbf{e}'_3$, the components of $\mathbf{v}$ must also change. The changes must be such that
$\mathbf{v}$ given by

\[ \mathbf{v} = v_i\mathbf{e}_i = v'_i\mathbf{e}'_i \qquad (21.10) \]

is unchanged. By definition, the position coordinates are themselves the components
of such a tensor.

Since the transformation (21.9) is orthogonal, the components of any such
first-order Cartesian tensor obey a relation that is the inverse of (21.9),

\[ v_i = L_{ji}v'_j. \qquad (21.11) \]

We now consider explicit examples. In order to keep the equations to reasonable
proportions, the examples will be restricted to the $x_1x_2$-plane, i.e. there are
no components in the $x_3$-direction. Three-dimensional cases are no different in
principle - but much longer to write out.


► Which of the following pairs $(v_1, v_2)$ form the components of a first-order Cartesian tensor
in two dimensions?

(i) $(x_2, -x_1)$, (ii) $(x_2, x_1)$, (iii) $(x_1^2, x_2^2)$.

We shall consider the rotation discussed in the previous example, and to save space we
denote $\cos\theta$ by $c$ and $\sin\theta$ by $s$.

(i) Here $v_1 = x_2$ and $v_2 = -x_1$, referred to the old axes. In terms of the new coordinates
they will be $v'_1 = x'_2$ and $v'_2 = -x'_1$, i.e.

\[ v'_1 = x'_2 = -sx_1 + cx_2, \]

\[ v'_2 = -x'_1 = -cx_1 - sx_2. \qquad (21.12) \]

Now if we start again and evaluate $v'_1$ and $v'_2$ as given by (21.9) we find that

\[ v'_1 = L_{11}v_1 + L_{12}v_2 = cx_2 + s(-x_1), \]

\[ v'_2 = L_{21}v_1 + L_{22}v_2 = -s(x_2) + c(-x_1). \qquad (21.13) \]

The expressions for $v'_1$ and $v'_2$ in (21.12) and (21.13) are the same whatever the values
of $\theta$ (i.e. for all rotations) and thus by definition (21.9) the pair $(x_2, -x_1)$ is a first-order
Cartesian tensor.

(ii) Here $v_1 = x_2$ and $v_2 = x_1$. Following the same procedure,

\[ v'_1 = x'_2 = -sx_1 + cx_2, \]

\[ v'_2 = x'_1 = cx_1 + sx_2. \]

But, by (21.9), for a Cartesian tensor we must have

\[ v'_1 = cv_1 + sv_2 = cx_2 + sx_1, \]

\[ v'_2 = (-s)v_1 + cv_2 = -sx_2 + cx_1. \]

These two sets of expressions do not agree and thus the pair $(x_2, x_1)$ is not a first-order
Cartesian tensor.

(iii) $v_1 = x_1^2$ and $v_2 = x_2^2$. As in (ii) above, considering the first component alone is
sufficient to show that this pair is not a first-order tensor. Evaluating $v'_1$ directly gives

\[ v'_1 = x_1'^2 = c^2x_1^2 + 2csx_1x_2 + s^2x_2^2, \]

whilst (21.9) requires that

\[ v'_1 = cv_1 + sv_2 = cx_1^2 + sx_2^2, \]

which is quite different. ◄
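The test applied in this example lends itself to a numerical sketch: evaluate each candidate pair directly in the rotated coordinates, and compare with what (21.9) demands. The sample angles and points below are arbitrary assumptions; agreement on a finite sample is only evidence of tensor character, whereas a single disagreement is enough to rule a pair out.

```python
import math

def rotate(v, theta):
    """Apply (21.9) in the x1x2-plane: v'_i = L_ij v_j."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] + s * v[1], -s * v[0] + c * v[1])

# The three candidate pairs from the worked example, as functions of (x1, x2).
candidates = {
    "(x2, -x1)": lambda x: (x[1], -x[0]),
    "(x2, x1)": lambda x: (x[1], x[0]),
    "(x1^2, x2^2)": lambda x: (x[0] ** 2, x[1] ** 2),
}

# Arbitrary sample rotations and points (assumptions for illustration).
angles = [0.3, 1.1, 2.5]
points = [(1.0, 2.0), (-0.7, 0.4), (3.0, -1.5)]

def passes_vector_test(form, tol=1e-12):
    """Compare the direct evaluation with the transformation law (21.9)."""
    for theta in angles:
        for x in points:
            direct = form(rotate(x, theta))   # same recipe, new coordinates
            by_rule = rotate(form(x), theta)  # what (21.9) requires
            if any(abs(p - q) > tol for p, q in zip(direct, by_rule)):
                return False
    return True

results = {name: passes_vector_test(form) for name, form in candidates.items()}
```

Only the first pair survives, in agreement with the algebra above.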

There are many physical examples of first-order tensors (i.e. vectors) that will be
familiar to the reader. As a straightforward one, we may take the set of Cartesian
components of the momentum of a particle of mass $m$, $(m\dot{x}_1, m\dot{x}_2, m\dot{x}_3)$. This set
transforms in all essentials as $(x_1, x_2, x_3)$, since the other operations involved,
multiplication by a number and differentiation with respect to time, are quite
unaffected by any orthogonal transformation of the axes. Similarly, acceleration
and force are represented by the components of first-order tensors.

Other more complicated vectors involving the position coordinates more than
once, such as the angular momentum of a particle of mass $m$, namely $\mathbf{J} =
\mathbf{x} \times \mathbf{p} = m(\mathbf{x} \times \dot{\mathbf{x}})$, are also first-order tensors. That this is so is less obvious in
component form than for the earlier examples, but may be verified by writing
out the components of $\mathbf{J}$ explicitly or by appealing to the quotient law to be
discussed in section 21.7 and using the Cartesian tensor $\epsilon_{ijk}$ from section 21.8.

Having considered the effects of rotations on vector-like sets of quantities we
may consider quantities that are unchanged by a rotation of axes. In our previous
nomenclature these have been called scalars but we may also describe them as
tensors of zero order. They contain only one element (formally, the number of
subscripts needed to identify a particular element is zero); the most obvious
non-trivial example associated with a rotation of axes is the square of the distance of
a point from the origin, $r^2 = x_1^2 + x_2^2 + x_3^2$. In the new coordinate system it will
have the form $r'^2 = x_1'^2 + x_2'^2 + x_3'^2$, which for any rotation has the same value as
$x_1^2 + x_2^2 + x_3^2$.

In fact any scalar product of two first-order tensors (vectors) is a zero-order
tensor (scalar), as might be expected since it can be written in a coordinate-free
way as $\mathbf{u} \cdot \mathbf{v}$.


► By considering the components of the vectors $\mathbf{u}$ and $\mathbf{v}$ with respect to two Cartesian
coordinate systems (related by a rotation), show that the scalar product $\mathbf{u} \cdot \mathbf{v}$ is invariant
under rotation.

In the original (unprimed) system the scalar product is given in terms of components by
$u_iv_i$ (summed over $i$), and in the rotated (primed) system by

\[ u'_iv'_i = L_{ij}u_jL_{ik}v_k = L_{ij}L_{ik}u_jv_k = \delta_{jk}u_jv_k = u_jv_j, \]

where we have used the orthogonality relation (21.6). Since the resulting expression in the
rotated system is the same as that in the original system, the scalar product is indeed
invariant under rotations. ◄
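The invariance just proved can be seen numerically. In the sketch below the rotation is built, as an illustrative assumption, from a rotation about the $x_3$-axis followed by one about the $x_1$-axis, so that it is not confined to a coordinate plane; the vectors and angles are arbitrary.

```python
import math

R = range(3)

def rot3(theta):
    """Rotation of the axes through theta about the x3-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def rot1(theta):
    """Rotation of the axes through theta about the x1-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]]

def matmul(M, L):
    """(ML)_ik = M_ij L_jk."""
    return [[sum(M[i][j] * L[j][k] for j in R) for k in R] for i in R]

def apply(L, x):
    """x'_i = L_ij x_j."""
    return [sum(L[i][j] * x[j] for j in R) for i in R]

# Composite rotation; the angles are arbitrary.
L = matmul(rot1(0.7), rot3(0.3))

u = [1.0, -2.0, 0.5]   # arbitrary vectors
v = [3.0, 0.25, -1.0]

dot_old = sum(u[i] * v[i] for i in R)          # u_i v_i
u_new, v_new = apply(L, u), apply(L, v)
dot_new = sum(u_new[i] * v_new[i] for i in R)  # u'_i v'_i
```

The individual components of `u_new` and `v_new` differ from those of `u` and `v`, yet the two contracted sums agree to rounding error.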

The above result leads directly to the identification of many physically important
quantities as zero-order tensors. Perhaps the most immediate of these is
energy, either as potential energy or as an energy density (e.g. $\mathbf{F} \cdot d\mathbf{r}$, $e\mathbf{E} \cdot d\mathbf{r}$, $\mathbf{D} \cdot \mathbf{E}$,
$\mathbf{B} \cdot \mathbf{H}$, $\boldsymbol{\mu} \cdot \mathbf{B}$), but others, such as the angle between two directed quantities, are
important.

As mentioned in the first paragraph of this chapter, in most analyses of physical 
situations it is a scalar quantity (such as energy) that is to be determined. Such 
quantities are invariant under a rotation of axes and so it is possible to work with 
the most convenient set of axes and still have confidence in the results. 

Complementing the way in which a zero-order tensor was obtained from two
first-order tensors, so a first-order tensor can be obtained from a zero-order
tensor. We show this by taking a specific example, that of the electric field
$\mathbf{E} = -\nabla\phi$; this is derived from a scalar, the electrostatic potential $\phi$, and has
components

\[ E_i = -\frac{\partial\phi}{\partial x_i}. \qquad (21.14) \]

Clearly, $\mathbf{E}$ is a first-order tensor, but we may prove this more formally by
considering the behaviour of its components (21.14) under a rotation of the
coordinate axes, since the components of the electric field $E'_i$ are then given
by

\[ E'_i = \left(-\frac{\partial\phi}{\partial x_i}\right)' = -\frac{\partial\phi'}{\partial x'_i} = -\frac{\partial x_j}{\partial x'_i}\frac{\partial\phi}{\partial x_j} = L_{ij}E_j, \qquad (21.15) \]

where (21.5) has been used to evaluate $\partial x_j/\partial x'_i$. Now (21.15) is in the form
(21.9), thus confirming that the components of the electric field do behave as the
components of a first-order tensor.
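The chain-rule argument in (21.15) can be checked by finite differences. The sketch below assumes an arbitrary smooth potential `phi` and a rotation about the $x_3$-axis, neither of which comes from the text; it differentiates the potential numerically with respect to the primed coordinates and compares the result with $L_{ij}E_j$.

```python
import math

R = range(3)
theta = 0.6  # arbitrary rotation angle
c, s = math.cos(theta), math.sin(theta)
L = [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def phi(x):
    """An arbitrary smooth scalar potential (illustrative assumption)."""
    return x[0] ** 2 * x[1] + math.sin(x[2])

def E(x):
    """E_i = -d(phi)/dx_i in the unprimed system, differentiated by hand."""
    return [-2.0 * x[0] * x[1], -x[0] ** 2, -math.cos(x[2])]

def to_new(x):
    """(21.4): x'_i = L_ij x_j."""
    return [sum(L[i][j] * x[j] for j in R) for i in R]

def to_old(xp):
    """(21.5): x_i = L_ji x'_j."""
    return [sum(L[j][i] * xp[j] for j in R) for i in R]

def E_new_numeric(xp, h=1e-6):
    """E'_i = -d(phi)/dx'_i by central differences in the primed coordinates."""
    out = []
    for i in R:
        plus, minus = list(xp), list(xp)
        plus[i] += h
        minus[i] -= h
        # phi is a scalar, so its value at a point is found from the old
        # coordinates of that point.
        out.append(-(phi(to_old(plus)) - phi(to_old(minus))) / (2.0 * h))
    return out

x = [0.3, -1.2, 0.8]  # arbitrary field point
E_old = E(x)
E_by_rule = [sum(L[i][j] * E_old[j] for j in R) for i in R]  # L_ij E_j
E_direct = E_new_numeric(to_new(x))
```

The two lists agree to within the accuracy of the finite-difference step, as (21.15) predicts.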
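► If $v_i$ are the components of a first-order tensor, show that $\nabla \cdot \mathbf{v} = \partial v_i/\partial x_i$ is a zero-order
tensor.

In the rotated coordinate system $\nabla \cdot \mathbf{v}$ is given by

\[ \left(\frac{\partial v_i}{\partial x_i}\right)' = \frac{\partial v'_i}{\partial x'_i} = \frac{\partial x_j}{\partial x'_i}\frac{\partial}{\partial x_j}(L_{ik}v_k) = L_{ij}L_{ik}\frac{\partial v_k}{\partial x_j}, \]

since the elements $L_{ij}$ are not functions of position. Using the orthogonality relation (21.6)
we then find

\[ \frac{\partial v'_i}{\partial x'_i} = L_{ij}L_{ik}\frac{\partial v_k}{\partial x_j} = \delta_{jk}\frac{\partial v_k}{\partial x_j} = \frac{\partial v_j}{\partial x_j}. \]

Hence $\partial v_i/\partial x_i$ is invariant under rotation of the axes and is thus a zero-order tensor; this
was to be expected since it can be written in a coordinate-free way as $\nabla \cdot \mathbf{v}$. ◄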


21.5 Second- and higher-order Cartesian tensors 


Following on from scalars with no subscripts and vectors with one subscript,
we turn to sets of quantities that require two subscripts to identify a particular
element of the set. Let these quantities be denoted by $T_{ij}$.

Taking (21.9) as a guide we define a second-order Cartesian tensor as follows:
the $T_{ij}$ form the components of such a tensor if, under the same conditions as
for (21.9),

\[ T'_{ij} = L_{ik}L_{jl}T_{kl} \qquad (21.16) \]

and

\[ T_{ij} = L_{ki}L_{lj}T'_{kl}. \qquad (21.17) \]

At the same time we may define a Cartesian tensor of general order as follows.
The set of expressions $T_{ij\cdots k}$ form the components of a Cartesian tensor if, for all
rotations of the axes of coordinates given by (21.4) and (21.5), subject to (21.6),
the expressions using the new coordinates, $T'_{ij\cdots k}$, are given by

\[ T'_{ij\cdots k} = L_{ip}L_{jq}\cdots L_{kr}T_{pq\cdots r} \qquad (21.18) \]

and

\[ T_{ij\cdots k} = L_{pi}L_{qj}\cdots L_{rk}T'_{pq\cdots r}. \qquad (21.19) \]

It is apparent that in three dimensions, an $N$th-order Cartesian tensor has $3^N$
components.

Since a second-order tensor has two subscripts, it is natural to display its
components in matrix form. The notation $[T_{ij}]$ is used, as well as $T$, to denote
the matrix having $T_{ij}$ as the element in the $i$th row and $j$th column.†

† We can also denote the column matrix containing the elements $v_i$ of a vector by $[v_i]$.

We may think of a second-order tensor $\mathbf{T}$ as a geometrical entity in a similar
way to that in which we viewed linear operators (which transform one vector into
another, without reference to any coordinate system) and consider the matrix
containing its components as a representation of the tensor with respect to a
particular coordinate system. Moreover, the matrix $T = [T_{ij}]$, containing the
components of a second-order tensor, behaves in the same way under orthogonal
transformations $T' = L T L^{\mathrm{T}}$ as a linear operator.
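
The equivalence of the index form (21.16) and the matrix form $T' = L T L^{\mathrm{T}}$ can be checked numerically. The following is a minimal sketch in Python, which is not part of the original text; the helper names `rotation_about_x3`, `transform_rank2`, `matmul` and `transpose`, the rotation angle and the sample components are all illustrative assumptions.

```python
import math

def rotation_about_x3(theta):
    """Transformation matrix L for a rotation of the axes by theta about the x3-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def transform_rank2(L, T):
    """Index form of the transformation law: T'_ij = L_ik L_jl T_kl."""
    n = range(3)
    return [[sum(L[i][k] * L[j][l] * T[k][l] for k in n for l in n)
             for j in n] for i in n]

def matmul(A, B):
    """Ordinary 3 x 3 matrix product."""
    n = range(3)
    return [[sum(A[i][k] * B[k][j] for k in n) for j in n] for i in n]

def transpose(A):
    """Matrix transpose."""
    return [list(row) for row in zip(*A)]

L = rotation_about_x3(0.3)                      # arbitrary illustrative angle
T = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 10.0]]                          # arbitrary illustrative components

T_index = transform_rank2(L, T)                 # L_ik L_jl T_kl
T_matrix = matmul(matmul(L, T), transpose(L))   # L T L^T

# Largest difference between the two forms; it should vanish to rounding error.
max_diff = max(abs(T_index[i][j] - T_matrix[i][j])
               for i in range(3) for j in range(3))
```

The two results agree because $(L T L^{\mathrm{T}})_{ij} = L_{ik} T_{kl} (L^{\mathrm{T}})_{lj} = L_{ik} L_{jl} T_{kl}$.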

However, not all linear operators are second-order tensors. More specifically, we 
require the two subscripts in a second-order tensor to refer to the same coordinate 
system, so that only linear operators that transform a vector to another vector in 
the same vector space could be second-order tensors. Thus, although the elements 
$L_{ij}$ of the transformation matrix are written with two subscripts, they cannot
be the components of a tensor since the two subscripts each refer to a different 
coordinate system. 

As examples of sets of quantities that are readily shown to be second-order 
tensors we consider the following. 

(i) The outer product of two vectors. Let $u_i$ and $v_i$, $i = 1, 2, 3$, be the components
of two vectors $\mathbf{u}$ and $\mathbf{v}$, and consider the set of quantities $T_{ij}$ defined by

\[ T_{ij} = u_i v_j. \tag{21.20} \]

The set $T_{ij}$ are called the components of the outer product of $\mathbf{u}$ and $\mathbf{v}$. Under
rotations the components $T_{ij}$ become

\[ T'_{ij} = u'_i v'_j = L_{ik} u_k L_{jl} v_l = L_{ik} L_{jl} u_k v_l = L_{ik} L_{jl} T_{kl}, \tag{21.21} \]

which shows that they do transform as the components of a second-order tensor.
Use has been made in (21.21) of the fact that $u_i$ and $v_i$ are the components of
first-order tensors.

The outer product of two vectors is often denoted, without reference to any
coordinate system, as

\[ \mathbf{T} = \mathbf{u} \otimes \mathbf{v}. \tag{21.22} \]

(This is not to be confused with the vector product of two vectors, which is itself
a vector and is discussed in chapter 7.) The expression (21.22) gives the basis to
which the components $T_{ij}$ of the second-order tensor refer. Since $\mathbf{u} = u_i \mathbf{e}_i$ and
$\mathbf{v} = v_i \mathbf{e}_i$, we may write the tensor $\mathbf{T}$ as

\[ \mathbf{T} = u_i \mathbf{e}_i \otimes v_j \mathbf{e}_j = u_i v_j \, \mathbf{e}_i \otimes \mathbf{e}_j = T_{ij} \, \mathbf{e}_i \otimes \mathbf{e}_j. \tag{21.23} \]

Moreover, as for the case of first-order tensors (see equation (21.10)) we note
that the quantities $T'_{ij}$ are the components of the same tensor $\mathbf{T}$, but referred to
a different coordinate system, i.e.

\[ \mathbf{T} = T_{ij} \, \mathbf{e}_i \otimes \mathbf{e}_j = T'_{ij} \, \mathbf{e}'_i \otimes \mathbf{e}'_j. \]

These concepts can be extended to higher-order tensors. 

(ii) The gradient of a vector. Suppose $v_i$ represents the components of a vector;
let us consider the quantities generated by forming the derivatives of each $v_i$,
$i = 1, 2, 3$, with respect to each $x_j$, $j = 1, 2, 3$, i.e.

\[ T_{ij} = \frac{\partial v_i}{\partial x_j}. \]

These nine quantities form the components of a second-order tensor, as can be
seen from the fact that

\[ T'_{ij} = \frac{\partial v'_i}{\partial x'_j} = \frac{\partial (L_{ik} v_k)}{\partial x_l} \frac{\partial x_l}{\partial x'_j} = L_{ik} \frac{\partial v_k}{\partial x_l} L_{jl} = L_{ik} L_{jl} T_{kl}. \]

In coordinate-free language the tensor $\mathbf{T}$ may be written as $\mathbf{T} = \nabla \mathbf{v}$ and hence
gives meaning to the concept of the gradient of a vector, a quantity that was not
discussed in the chapter on vector calculus (chapter 10).

A test of whether any given set of quantities forms the components of a second-order
tensor can always be made by direct substitution of the $x'_i$ in terms of the
$x_i$, followed by comparison with the right-hand side of (21.16). This procedure is
extremely laborious, however, and it is almost always better to try to recognise 
the set as being expressible in one of the forms just considered, or to make 
alternative tests based on the quotient law of section 21.7 below. 


►Show that the $T_{ij}$ given by

\[ T = [T_{ij}] = \begin{pmatrix} x_2^2 & -x_1 x_2 \\ -x_1 x_2 & x_1^2 \end{pmatrix} \tag{21.24} \]

are the components of a second-order tensor.

Again we consider a rotation $\theta$ about the $\mathbf{e}_3$-axis, writing $s = \sin\theta$ and $c = \cos\theta$.
Carrying out the direct evaluation first we obtain, using (21.7),

\begin{align*}
T'_{11} &= x_2'^{\,2} = s^2 x_1^2 - 2sc\, x_1 x_2 + c^2 x_2^2, \\
T'_{12} &= -x'_1 x'_2 = sc\, x_1^2 + (s^2 - c^2) x_1 x_2 - sc\, x_2^2, \\
T'_{21} &= -x'_1 x'_2 = sc\, x_1^2 + (s^2 - c^2) x_1 x_2 - sc\, x_2^2, \\
T'_{22} &= x_1'^{\,2} = c^2 x_1^2 + 2sc\, x_1 x_2 + s^2 x_2^2.
\end{align*}

Now, evaluating the right-hand side of (21.16),

\begin{align*}
T'_{11} &= cc\, x_2^2 + cs(-x_1 x_2) + sc(-x_1 x_2) + ss\, x_1^2, \\
T'_{12} &= c(-s) x_2^2 + cc(-x_1 x_2) + s(-s)(-x_1 x_2) + sc\, x_1^2, \\
T'_{21} &= (-s)c\, x_2^2 + (-s)s(-x_1 x_2) + cc(-x_1 x_2) + cs\, x_1^2, \\
T'_{22} &= (-s)(-s) x_2^2 + (-s)c(-x_1 x_2) + c(-s)(-x_1 x_2) + cc\, x_1^2.
\end{align*}

After reorganisation, the corresponding expressions are seen to be the same, showing, as
required, that the $T_{ij}$ are the components of a second-order tensor.

The same result could be inferred much more easily, however, by noting that the $T_{ij}$
are in fact the components of the outer product of the vector $(x_2, -x_1)$ with itself. That
$(x_2, -x_1)$ is indeed a vector was established by (21.12) and (21.13). ◄
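
The comparison made in this example can also be carried out numerically for a particular angle and point. The sketch below (Python, not part of the original text) assumes the form of (21.7) implied by the expressions above, namely $x'_1 = c x_1 + s x_2$ and $x'_2 = -s x_1 + c x_2$; the function name `T_of`, the angle and the point are illustrative choices.

```python
import math

def T_of(x1, x2):
    """Components (21.24) evaluated from the coordinates of a point."""
    return [[x2 * x2, -x1 * x2],
            [-x1 * x2, x1 * x1]]

theta = 0.7                          # arbitrary illustrative rotation angle
c, s = math.cos(theta), math.sin(theta)
L = [[c, s], [-s, c]]                # rotation about the e3-axis, 1-2 block only

x1, x2 = 1.3, -0.4                   # arbitrary illustrative point
x1p = c * x1 + s * x2                # rotated coordinates, assumed form of (21.7)
x2p = -s * x1 + c * x2

# Direct evaluation: the same formula applied to the new coordinates.
direct = T_of(x1p, x2p)

# Right-hand side of (21.16): L_ik L_jl T_kl with the old components.
T = T_of(x1, x2)
via_law = [[sum(L[i][k] * L[j][l] * T[k][l] for k in range(2) for l in range(2))
            for j in range(2)] for i in range(2)]

max_diff = max(abs(direct[i][j] - via_law[i][j])
               for i in range(2) for j in range(2))
```

A vanishing `max_diff` for one angle and one point is of course only a spot check, not a proof; the algebraic argument above covers all cases.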

Physical examples involving second-order tensors will be discussed in the later
sections of this chapter, but we might note here that, for example, the magnetic
susceptibility and electrical conductivity of materials are described by
second-order tensors.


21.6 The algebra of tensors 

Because of the similarity of first- and second-order tensors to column vectors and 
matrices, it would be expected that similar types of algebraic operation can be 
carried out with them and so provide ways of constructing new tensors from old 
ones. In the remainder of this chapter, instead of referring to the $T_{ij}$ (say) as the
components of a second-order tensor $\mathbf{T}$, we may sometimes simply refer to $T_{ij}$
as the tensor. It should always be remembered, however, that the $T_{ij}$ are in fact
just the components of $\mathbf{T}$ in a given coordinate system and that $T'_{ij}$ refers to the
components of the same tensor $\mathbf{T}$ in a different coordinate system.

The addition and subtraction of tensors follows an obvious definition; namely
that if $V_{ij\cdots k}$ and $W_{ij\cdots k}$ are (the components of) tensors of the same order, then
their sum and difference, $S_{ij\cdots k}$ and $D_{ij\cdots k}$ respectively, are given by

\begin{align*}
S_{ij\cdots k} &= V_{ij\cdots k} + W_{ij\cdots k}, \\
D_{ij\cdots k} &= V_{ij\cdots k} - W_{ij\cdots k},
\end{align*}

for each set of values $i, j, \ldots, k$. That $S_{ij\cdots k}$ and $D_{ij\cdots k}$ are the components of
tensors follows immediately from the linearity of a rotation of coordinates.

It is equally straightforward to show that if the $T_{ij\cdots k}$ are the components of
a tensor, then so is the set of quantities formed by interchanging the order of (a
pair of) indices, e.g. $T_{ji\cdots k}$.

If $T_{ji\cdots k}$ is found to be identical with $T_{ij\cdots k}$ then $T_{ij\cdots k}$ is said to be symmetric
with respect to its first two subscripts (or simply ‘symmetric’, for second-order
tensors). If, however, $T_{ji\cdots k} = -T_{ij\cdots k}$ for every element then it is an antisymmetric
tensor. An arbitrary tensor is neither symmetric nor antisymmetric but can always
be written as the sum of a symmetric tensor $S_{ij\cdots k}$ and an antisymmetric tensor
$A_{ij\cdots k}$:

\begin{align*}
T_{ij\cdots k} &= \tfrac{1}{2}(T_{ij\cdots k} + T_{ji\cdots k}) + \tfrac{1}{2}(T_{ij\cdots k} - T_{ji\cdots k}) \\
&= S_{ij\cdots k} + A_{ij\cdots k}.
\end{align*}

Of course these properties are valid for any pair of subscripts. 
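
For a second-order tensor the decomposition amounts to splitting the matrix of components into its symmetric and antisymmetric parts. A minimal sketch in Python (not part of the original text; the sample components are arbitrary and chosen so that the arithmetic is exact) is:

```python
n = range(3)

# Arbitrary illustrative components T_ij of a second-order tensor.
T = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 10.0]]

# S_ij = (T_ij + T_ji)/2 and A_ij = (T_ij - T_ji)/2.
S = [[0.5 * (T[i][j] + T[j][i]) for j in n] for i in n]
A = [[0.5 * (T[i][j] - T[j][i]) for j in n] for i in n]

is_symmetric = all(S[i][j] == S[j][i] for i in n for j in n)
is_antisymmetric = all(A[i][j] == -A[j][i] for i in n for j in n)
recombines = all(S[i][j] + A[i][j] == T[i][j] for i in n for j in n)
```

In three dimensions the symmetric part carries six independent components and the antisymmetric part three, accounting for all nine components of $T_{ij}$.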

In (21.20) in the previous section we had an example of a kind of ‘multiplication’ 
of two tensors, thereby producing a tensor of higher order - in that case two 
first-order tensors were multiplied to give a second-order tensor. Inspection of 
(21.21) shows that there is nothing particular about the orders of the tensors 
involved and it follows as a general result that the outer product of an $N$th-order
tensor with an $M$th-order tensor will produce an $(M + N)$th-order tensor.

An operation that produces the opposite effect - namely, generates a tensor 
of smaller rather than larger order - is known as contraction and consists of
making two of the subscripts equal and summing over all values of the equalised 
subscripts. 


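►Show that the process of contraction of a tensor produces another tensor, but with an
order reduced by 2.

Let $T_{ij\cdots l\cdots m\cdots k}$ be the components of an $N$th-order tensor, then

\[ T'_{ij\cdots l\cdots m\cdots k} = \underbrace{L_{ip} L_{jq} \cdots L_{lr} \cdots L_{ms} \cdots L_{kn}}_{N \text{ factors}} \, T_{pq\cdots r\cdots s\cdots n}. \]

Thus if, for example, we make the two subscripts $l$ and $m$ equal and sum over all values
of these subscripts, we obtain

\begin{align*}
T'_{ij\cdots l\cdots l\cdots k} &= L_{ip} L_{jq} \cdots L_{lr} \cdots L_{ls} \cdots L_{kn} \, T_{pq\cdots r\cdots s\cdots n} \\
&= L_{ip} L_{jq} \cdots \delta_{rs} \cdots L_{kn} \, T_{pq\cdots r\cdots s\cdots n} \\
&= \underbrace{L_{ip} L_{jq} \cdots L_{kn}}_{(N-2) \text{ factors}} \, T_{pq\cdots r\cdots r\cdots n},
\end{align*}

showing that $T_{ij\cdots l\cdots l\cdots k}$ are the components of a (different) Cartesian tensor of order
$N - 2$. ◄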


For a second-rank tensor, the process of contraction is the same as taking the
trace of the corresponding matrix. The trace $T_{ii}$ itself is thus a zero-order tensor
(or scalar) and hence invariant under rotations, as was noted in chapter 8.

The process of taking the scalar product of two vectors can be recast into tensor
language as forming the outer product $T_{ij} = u_i v_j$ of two first-order tensors $\mathbf{u}$ and
$\mathbf{v}$ and then contracting the second-order tensor $\mathbf{T}$ so formed, to give $T_{ii} = u_i v_i$, a
scalar (invariant under a rotation of axes).

As yet another example of a familiar operation that is a particular case of a
contraction, we may note that the multiplication of a column vector $[u_i]$ by a
matrix $[B_{ij}]$ to produce another column vector $[v_i]$,

\[ B_{ij} u_j = v_i, \]

can be looked upon as the contraction $T_{ijj}$ of the third-order tensor $T_{ijk}$ formed
from the outer product of $B_{ij}$ and $u_k$.
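
Both statements, that the trace is unchanged by a rotation and that a matrix-vector product is a contracted outer product, are easily illustrated numerically. The following Python sketch is not part of the original text; the components of `B` and `u` and the rotation angle are arbitrary illustrative values.

```python
import math

n = range(3)
c, s = math.cos(0.4), math.sin(0.4)           # arbitrary illustrative angle
L = [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

B = [[2.0, -1.0, 0.5],
     [0.0, 3.0, 1.0],
     [4.0, 1.0, -2.0]]                        # illustrative second-order tensor
u = [1.0, -2.0, 0.5]                          # illustrative vector

# Outer product T_ijk = B_ij u_k (third order), then the contraction T_ijj.
T3 = [[[B[i][j] * u[k] for k in n] for j in n] for i in n]
v_contracted = [sum(T3[i][j][j] for j in n) for i in n]

# The familiar matrix-vector product B_ij u_j for comparison.
v_matrix = [sum(B[i][j] * u[j] for j in n) for i in n]

# Contraction of a second-order tensor: the trace before and after rotation.
B_rotated = [[sum(L[i][k] * L[j][l] * B[k][l] for k in n for l in n)
              for j in n] for i in n]
trace = sum(B[i][i] for i in n)
trace_rotated = sum(B_rotated[i][i] for i in n)
```

Here `v_contracted` and `v_matrix` coincide, and `trace_rotated` equals `trace` to rounding error.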


21.7 The quotient law 

The previous paragraph appears to give a heavy-handed way of describing a 
familiar operation, but it leads us to ask whether it has a converse. To put the 
question in more general terms: if we know that $\mathbf{B}$ and $\mathbf{C}$ are tensors and also
that

\[ A_{pq\cdots k\cdots m} B_{ij\cdots k\cdots n} = C_{pq\cdots m\, ij\cdots n}, \tag{21.25} \]

does this imply that the $A_{pq\cdots k\cdots m}$ also form the components of a tensor $\mathbf{A}$? Here
$\mathbf{A}$, $\mathbf{B}$ and $\mathbf{C}$ are respectively of $M$th, $N$th and $(M + N - 2)$th order and it should be
noted that the subscript $k$ that has been contracted may be any of the subscripts
in $\mathbf{A}$ and $\mathbf{B}$ independently.

The quotient law for tensors states that if (21.25) holds in all rotated coordinate 
frames then the $A_{pq\cdots k\cdots m}$ do indeed form the components of a tensor $\mathbf{A}$. To prove
it for general M and N is no more difficult regarding the ideas involved than to 
show it for specific M and N, but this does involve the introduction of a large 
number of subscript symbols. We will therefore take the case M = N = 2, but 
it will be readily apparent that the principle of the proof holds for general M 
and N. 

We thus start with (say)

\[ A_{pk} B_{ik} = C_{pi}, \tag{21.26} \]

where $B_{ik}$ and $C_{pi}$ are arbitrary second-order tensors. Under a rotation of
coordinates the set $A_{pk}$ (tensor or not) transforms into a new set of quantities that
we will denote by $A'_{pk}$. We thus obtain in succession the following steps, using
(21.16), (21.17) and (21.6):

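\begin{align*}
A'_{pk} B'_{ik} &= C'_{pi} && \text{(transforming (21.26))}, \\
&= L_{pq} L_{ij} C_{qj} && \text{(since $\mathbf{C}$ is a tensor)}, \\
&= L_{pq} L_{ij} A_{ql} B_{jl} && \text{(from (21.26))}, \\
&= L_{pq} L_{ij} A_{ql} L_{mj} L_{nl} B'_{mn} && \text{(since $\mathbf{B}$ is a tensor)}, \\
&= L_{pq} L_{nl} A_{ql} B'_{in} && \text{(since $L_{ij} L_{mj} = \delta_{im}$)}.
\end{align*}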

Now $k$ on the left and $n$ on the right are dummy subscripts and thus we may
write

\[ (A'_{pk} - L_{pq} L_{kl} A_{ql}) B'_{ik} = 0. \tag{21.27} \]

Since $B_{ik}$, and hence $B'_{ik}$, is an arbitrary tensor, we must have

\[ A'_{pk} = L_{pq} L_{kl} A_{ql}, \]

showing that the $A'_{pk}$ are given by the general formula (21.18) and hence that
the $A_{pk}$ are the components of a second-order tensor. By following an analogous
argument, the same result (21.27) and deduction could be obtained if (21.26) were
replaced by

\[ A_{pk} B_{ki} = C_{pi}, \]

i.e. the contraction being now with respect to a different pair of indices.

Use of the quotient law to test whether a given set of quantities is a tensor is
generally much more convenient than making a direct substitution. A particular
way in which it is applied is by contracting the given set of quantities, having
$N$ subscripts, with an arbitrary $N$th-order tensor (i.e. one having independently
variable components) and determining whether the result is a scalar.


►Use the quotient law to show that the elements of $T$, equation (21.24), are the components
of a second-order tensor.

The outer product $x_i x_j$ is a second-order tensor. Contracting this with the $T_{ij}$ given in
(21.24) we obtain

\[ T_{ij} x_i x_j = x_2^2 x_1^2 - x_1 x_2 x_1 x_2 - x_1 x_2 x_2 x_1 + x_1^2 x_2^2 = 0, \]

which is clearly invariant (a zeroth-order tensor). Hence by the quotient theorem $T_{ij}$ must
also be a tensor. ◄


21.8 The tensors $\delta_{ij}$ and $\epsilon_{ijk}$


In many places throughout this book we have encountered and used the
two-subscript quantity $\delta_{ij}$ defined by

\[ \delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases} \]


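Let us now also introduce the three-subscript Levi-Civita symbol $\epsilon_{ijk}$, the value
of which is given by

\[ \epsilon_{ijk} = \begin{cases} +1 & \text{if } i, j, k \text{ is an even permutation of } 1, 2, 3, \\ -1 & \text{if } i, j, k \text{ is an odd permutation of } 1, 2, 3, \\ 0 & \text{otherwise.} \end{cases} \]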

We will now show that $\delta_{ij}$ and $\epsilon_{ijk}$ are respectively the components of a second-
and a third-order Cartesian tensor. Notice that the coordinates $x_i$ do not appear
explicitly in the components of these tensors, their components consisting entirely
of 0 and $\pm 1$.

In passing, we also note that $\epsilon_{ijk}$ is totally antisymmetric, i.e. it changes sign
under the interchange of any pair of subscripts. In fact $\epsilon_{ijk}$, or any scalar multiple
of it, is the only three-subscript quantity with this property.

Treating $\delta_{ij}$ first, the proof that it is a second-order tensor is straightforward
since, if, from (21.16), we consider the equation

\[ \delta'_{kl} = L_{ki} L_{lj} \delta_{ij} = L_{ki} L_{li} = \delta_{kl}, \]

we see that the transformation generates the same expression (a pattern of
0’s and 1’s) as does the definition of $\delta'_{ij}$ in the transformed coordinates. Thus
$\delta_{ij}$ transforms according to the appropriate tensor transformation law and is
therefore a second-order tensor.

Turning now to $\epsilon_{ijk}$, we have to consider the quantity

\[ \epsilon'_{lmn} = L_{li} L_{mj} L_{nk} \epsilon_{ijk}. \]

Let us begin, however, by noting that we may use the Levi-Civita symbol to
write an expression for the determinant of a 3 × 3 matrix $A$,

\[ |A| \epsilon_{lmn} = A_{li} A_{mj} A_{nk} \epsilon_{ijk}, \tag{21.28} \]

which may be shown to be equivalent to the Laplace expansion (see chapter 8).†
Indeed many of the properties of determinants discussed in chapter 8 can be
proved very efficiently using this expression (see exercise 21.9).


►Evaluate the determinant of the matrix

\[ A = \begin{pmatrix} 2 & 1 & -3 \\ 3 & 4 & 0 \\ 1 & -2 & 1 \end{pmatrix}. \]

Setting $l = 1$, $m = 2$ and $n = 3$ in (21.28) we find

\begin{align*}
|A| &= \epsilon_{ijk} A_{1i} A_{2j} A_{3k} \\
&= (2)(4)(1) - (2)(0)(-2) - (1)(3)(1) + (-3)(3)(-2) \\
&\quad + (1)(0)(1) - (-3)(4)(1) = 35,
\end{align*}

which may be verified using the Laplace expansion method. ◄
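
The sum over $i$, $j$, $k$ in this example is mechanical enough to be handed to a short program. The sketch below is in Python and is not part of the original text; indices run over 0, 1, 2 rather than 1, 2, 3, and the closed-form expression used in the hypothetical helper `levi_civita` is one convenient way (among several) of generating the values $0, \pm 1$.

```python
def levi_civita(i, j, k):
    """epsilon_ijk for indices in {0, 1, 2}: +1, -1 or 0.

    The product (i-j)(j-k)(k-i) is 0 for a repeated index and +2 or -2 for
    even or odd permutations of (0, 1, 2), so halving it gives the symbol.
    """
    return (i - j) * (j - k) * (k - i) // 2

def det3(A):
    """|A| = epsilon_ijk A_1i A_2j A_3k, i.e. (21.28) with l, m, n = 1, 2, 3."""
    n = range(3)
    return sum(levi_civita(i, j, k) * A[0][i] * A[1][j] * A[2][k]
               for i in n for j in n for k in n)

A = [[2, 1, -3],
     [3, 4, 0],
     [1, -2, 1]]

det_A = det3(A)   # 35, in agreement with the worked example
```

Only six of the 27 terms in the sum are non-zero, and these are exactly the six products written out above.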

We can now show that the $\epsilon_{ijk}$ are in fact the components of a third-order tensor.
Using (21.28) with the general matrix $A$ replaced by the specific transformation
matrix $L$, we can rewrite the RHS of (21.18) in terms of $|L|$ as

\[ \epsilon'_{lmn} = L_{li} L_{mj} L_{nk} \epsilon_{ijk} = |L| \epsilon_{lmn}. \]

Since $L$ is orthogonal its determinant has the value unity, and so $\epsilon'_{lmn} = \epsilon_{lmn}$.
Thus we see that $\epsilon'_{lmn}$ has exactly the properties of $\epsilon_{ijk}$ but with $i, j, k$ replaced by
$l, m, n$, i.e. it is the same as the expression $\epsilon_{ijk}$ written using the new coordinates.
This shows that $\epsilon_{ijk}$ is a third-order Cartesian tensor.

In addition to providing a convenient notation for the determinant of a matrix,
$\delta_{ij}$ and $\epsilon_{ijk}$ can be used to write many of the familiar expressions of vector
algebra and calculus as contracted tensors. For example, provided we are using
right-handed Cartesian coordinates, the vector product $\mathbf{a} = \mathbf{b} \times \mathbf{c}$ has as its
$i$th component $a_i = \epsilon_{ijk} b_j c_k$; this should be contrasted with the outer product
$\mathbf{T} = \mathbf{b} \otimes \mathbf{c}$, which is a second-order tensor having the components $T_{ij} = b_i c_j$.
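
The contrast between the vector product and the outer product is easily seen in a small computation. The following Python sketch is not part of the original text; the vectors are arbitrary illustrative values and `levi_civita` is a hypothetical helper with indices running over 0, 1, 2.

```python
def levi_civita(i, j, k):
    """epsilon_ijk for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

n = range(3)
b = [1, 2, 3]   # illustrative components of b
c = [4, 5, 6]   # illustrative components of c

# Vector product: a_i = epsilon_ijk b_j c_k (three components).
a = [sum(levi_civita(i, j, k) * b[j] * c[k] for j in n for k in n) for i in n]

# Outer product: T_ij = b_i c_j (nine components).
T = [[b[i] * c[j] for j in n] for i in n]
```

For these values `a` is `[-3, 6, -3]`. The two objects are nevertheless related: each component of the vector product is a difference of two off-diagonal elements of the outer product, for instance $a_1 = T_{23} - T_{32}$.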



† This may be readily extended to an $N \times N$ matrix $A$, i.e.

\[ |A| \epsilon_{i_1 i_2 \cdots i_N} = A_{i_1 j_1} A_{i_2 j_2} \cdots A_{i_N j_N} \epsilon_{j_1 j_2 \cdots j_N}, \]

where $\epsilon_{i_1 i_2 \cdots i_N}$ equals 1 if $i_1 i_2 \cdots i_N$ is an even permutation of $1, 2, \ldots, N$ and equals $-1$ if it is an
odd permutation; otherwise it equals zero.
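►Write the following as contracted Cartesian tensors: $\mathbf{a} \cdot \mathbf{b}$; $\nabla^2 \phi$; $\nabla \times \mathbf{v}$; $\nabla(\nabla \cdot \mathbf{v})$; $\nabla \times (\nabla \times \mathbf{v})$;
$(\mathbf{a} \times \mathbf{b}) \cdot \mathbf{c}$.

The corresponding (contracted) tensor expressions are readily seen to be as follows:

\begin{align*}
\mathbf{a} \cdot \mathbf{b} &= a_i b_i = \delta_{ij} a_i b_j, \\
\nabla^2 \phi &= \frac{\partial^2 \phi}{\partial x_i \partial x_i} = \delta_{ij} \frac{\partial^2 \phi}{\partial x_i \partial x_j}, \\
(\nabla \times \mathbf{v})_i &= \epsilon_{ijk} \frac{\partial v_k}{\partial x_j}, \\
[\nabla(\nabla \cdot \mathbf{v})]_i &= \frac{\partial}{\partial x_i}\left(\frac{\partial v_j}{\partial x_j}\right) = \delta_{jk} \frac{\partial^2 v_j}{\partial x_i \partial x_k}, \\
[\nabla \times (\nabla \times \mathbf{v})]_i &= \epsilon_{ijk} \frac{\partial}{\partial x_j}\left(\epsilon_{klm} \frac{\partial v_m}{\partial x_l}\right) = \epsilon_{ijk} \epsilon_{klm} \frac{\partial^2 v_m}{\partial x_j \partial x_l}, \\
(\mathbf{a} \times \mathbf{b}) \cdot \mathbf{c} &= \delta_{ij} c_i \epsilon_{jkl} a_k b_l = \epsilon_{ikl} c_i a_k b_l. \; ◄
\end{align*}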


An important relationship between the $\epsilon$- and $\delta$-tensors is expressed by the
identity

\[ \epsilon_{ijk} \epsilon_{klm} = \delta_{il} \delta_{jm} - \delta_{im} \delta_{jl}. \tag{21.29} \]

To establish the validity of this identity between two fourth-order tensors (the
LHS is a once-contracted sixth-order tensor) we consider the various possible
cases.

The RHS of (21.29) has the values

\begin{align*}
+1 &\quad \text{if } i = l \text{ and } j = m \neq i, \tag{21.30} \\
-1 &\quad \text{if } i = m \text{ and } j = l \neq i, \tag{21.31} \\
0 &\quad \text{for any other set of subscript values } i, j, l, m. \tag{21.32}
\end{align*}

In each product on the LHS $k$ has the same value in both factors and for a
non-zero contribution none of $i, l, j, m$ can have the same value as $k$. Since there
are only three values, 1, 2 and 3, that any of the subscripts may take, the only
non-zero possibilities are $i = l$ and $j = m$ or vice versa but not all four subscripts
equal (since then each $\epsilon$ factor is zero, as it would be if $i = j$ or $l = m$). This
reproduces (21.32) for the LHS of (21.29) and also the conditions (21.30) and
(21.31). The values in (21.30) and (21.31) are also reproduced in the LHS of
(21.29) since

(i) if $i = l$ and $j = m$, $\epsilon_{ijk} = \epsilon_{lmk} = \epsilon_{klm}$ and, whether $\epsilon_{ijk}$ is $+1$ or $-1$, the
product of the two factors is $+1$; and

(ii) if $i = m$ and $j = l$, $\epsilon_{ijk} = \epsilon_{mlk} = -\epsilon_{klm}$ and thus the product $\epsilon_{ijk} \epsilon_{klm}$ (no
summation) has the value $-1$.

This concludes the establishment of identity (21.29). 
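
Since each of the free subscripts $i, j, l, m$ takes only three values, the identity (21.29) can also be confirmed by exhaustive enumeration of all $3^4 = 81$ cases. The Python sketch below is not part of the original text; `delta` and `levi_civita` are hypothetical helpers with indices running over 0, 1, 2.

```python
def delta(i, j):
    """Kronecker delta."""
    return 1 if i == j else 0

def levi_civita(i, j, k):
    """epsilon_ijk for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

n = range(3)
identity_holds = True
counts = {1: 0, -1: 0, 0: 0}   # how often the RHS takes each of its values

for i in n:
    for j in n:
        for l in n:
            for m in n:
                # LHS of (21.29): summed over the repeated subscript k.
                lhs = sum(levi_civita(i, j, k) * levi_civita(k, l, m) for k in n)
                # RHS of (21.29).
                rhs = delta(i, l) * delta(j, m) - delta(i, m) * delta(j, l)
                identity_holds = identity_holds and (lhs == rhs)
                counts[rhs] += 1
```

The tallies in `counts` correspond to the three cases (21.30)-(21.32): six subscript sets give $+1$, six give $-1$ and the remaining 69 give zero.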
A useful application of (21.29) is in obtaining alternative expressions for vector
quantities that arise from the vector product of a vector product. 


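►Obtain an alternative expression for $\nabla \times (\nabla \times \mathbf{v})$.

As shown in the previous example, $\nabla \times (\nabla \times \mathbf{v})$ can be expressed in tensor form as

\begin{align*}
[\nabla \times (\nabla \times \mathbf{v})]_i &= \epsilon_{ijk} \epsilon_{klm} \frac{\partial^2 v_m}{\partial x_j \partial x_l} \\
&= (\delta_{il} \delta_{jm} - \delta_{im} \delta_{jl}) \frac{\partial^2 v_m}{\partial x_j \partial x_l} \\
&= \frac{\partial}{\partial x_i}\left(\frac{\partial v_j}{\partial x_j}\right) - \frac{\partial^2 v_i}{\partial x_j \partial x_j} \\
&= [\nabla(\nabla \cdot \mathbf{v})]_i - \nabla^2 v_i,
\end{align*}

where in the second line we have used the identity (21.29). This result has already
been mentioned in chapter 10 and the reader is referred there for a discussion of its
applicability. ◄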

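By examining the various possibilities, it is straightforward to verify that, more
generally,

\[ \epsilon_{ijk} \epsilon_{pqr} = \begin{vmatrix} \delta_{ip} & \delta_{iq} & \delta_{ir} \\ \delta_{jp} & \delta_{jq} & \delta_{jr} \\ \delta_{kp} & \delta_{kq} & \delta_{kr} \end{vmatrix} \tag{21.33} \]

and it is easily seen that (21.29) is a special case of this result. From (21.33) we
can derive alternative forms of (21.29), for example,

\[ \epsilon_{ijk} \epsilon_{ilm} = \delta_{jl} \delta_{km} - \delta_{jm} \delta_{kl}. \tag{21.34} \]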

The pattern of subscripts in these identities is most easily remembered by noting
that the subscripts on the first $\delta$ on the RHS are those that immediately follow
(cyclically, if necessary) the common subscript, here $i$, in each $\epsilon$-term on the LHS;
the remaining combinations of $j, k, l, m$ as subscripts in the other $\delta$-terms on the
RHS can then be filled in automatically.

Contracting (21.34) by setting $j = l$ (say) we obtain, since $\delta_{kk} = 3$ when using
the summation convention,

\[ \epsilon_{ijk} \epsilon_{ijm} = 3\delta_{km} - \delta_{km} = 2\delta_{km}, \]

and by contracting once more, setting $k = m$, we further find that

\[ \epsilon_{ijk} \epsilon_{ijk} = 6. \tag{21.35} \]
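
The determinant form (21.33), the alternative form (21.34) and the two contracted results can all be confirmed by enumeration. The following Python sketch is not part of the original text; `delta`, `levi_civita` and `det3` are hypothetical helpers, with indices running over 0, 1, 2.

```python
def delta(i, j):
    """Kronecker delta."""
    return 1 if i == j else 0

def levi_civita(i, j, k):
    """epsilon_ijk for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def det3(M):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

n = range(3)

# (21.33): no summation, all 3^6 = 729 subscript sets.
holds_21_33 = all(
    levi_civita(i, j, k) * levi_civita(p, q, r)
    == det3([[delta(i, p), delta(i, q), delta(i, r)],
             [delta(j, p), delta(j, q), delta(j, r)],
             [delta(k, p), delta(k, q), delta(k, r)]])
    for i in n for j in n for k in n for p in n for q in n for r in n)

# (21.34): summed over the common first subscript i.
holds_21_34 = all(
    sum(levi_civita(i, j, k) * levi_civita(i, l, m) for i in n)
    == delta(j, l) * delta(k, m) - delta(j, m) * delta(k, l)
    for j in n for k in n for l in n for m in n)

# epsilon_ijk epsilon_ijm, arranged as a matrix in (k, m); expected to be 2 delta_km.
once_contracted = [[sum(levi_civita(i, j, k) * levi_civita(i, j, m)
                        for i in n for j in n) for m in n] for k in n]

# epsilon_ijk epsilon_ijk, i.e. (21.35).
fully_contracted = sum(levi_civita(i, j, k) ** 2 for i in n for j in n for k in n)
```

The final value simply counts the six permutations of 1, 2, 3, each contributing $(\pm 1)^2 = 1$.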


21.9 Isotropic tensors 

It will have been noticed that, unlike most of the tensors discussed (except for
scalars), $\delta_{ij}$ and $\epsilon_{ijk}$ have the property that all their components have values
that are the same whatever rotation of axes is made, i.e. the component values
are independent of the transformation $L_{ij}$. Specifically, $\delta_{11}$ has the value 1 in
all coordinate frames, whereas for a general second-order tensor $\mathbf{T}$ all we know
is that if $T_{11} = f_{11}(x_1, x_2, x_3)$ then $T'_{11} = f_{11}(x'_1, x'_2, x'_3)$. Tensors with the former
property are called isotropic (or invariant) tensors.

It is important to know the most general form that an isotropic tensor can take, 
since the description of the physical properties, e.g. the conductivity, magnetic 
susceptibility or tensile strength, of an isotropic medium (i.e. a medium having 
the same properties whichever way it is orientated) involves an isotropic tensor. 
In the previous section it was shown that $\delta_{ij}$ and $\epsilon_{ijk}$ are second- and third-order
isotropic tensors; we will now show that, to within a scalar multiple, they are the 
only such isotropic tensors. 

Let us begin with isotropic second-order tensors. Suppose $T_{ij}$ is an isotropic
tensor; then, by definition, for any rotation of the axes we must have that

\[ T_{ij} = T'_{ij} = L_{ik} L_{jl} T_{kl} \tag{21.36} \]

for each of the nine components.

First consider a rotation of the axes by $2\pi/3$ about the (1, 1, 1) direction; this
takes $Ox_1$, $Ox_2$, $Ox_3$ into $Ox'_2$, $Ox'_3$, $Ox'_1$ respectively. For this rotation $L_{13} = 1$,
$L_{21} = 1$, $L_{32} = 1$ and all other $L_{ij} = 0$. This requires that $T_{11} = T'_{11} = T_{33}$.
Similarly $T_{12} = T'_{12} = T_{31}$. Continuing in this way, we find:

(a) $T_{11} = T_{22} = T_{33}$;

(b) $T_{12} = T_{23} = T_{31}$;

(c) $T_{21} = T_{32} = T_{13}$.

Next, consider a rotation of the axes (from their original position) by $\pi/2$
about the $Ox_3$-axis. In this case $L_{12} = -1$, $L_{21} = 1$, $L_{33} = 1$ and all other $L_{ij} = 0$.
Amongst other relationships, we must have from (21.36) that:

\begin{align*}
T_{13} &= (-1) \times 1 \times T_{23}; \\
T_{23} &= 1 \times 1 \times T_{13}.
\end{align*}

Hence $T_{13} = T_{23} = 0$ and therefore, by parts (b) and (c) above, each element $T_{ij} = 0$
except for $T_{11}$, $T_{22}$ and $T_{33}$, which are all the same. This shows that $T_{ij} = \lambda \delta_{ij}$.

►Show that $\lambda \epsilon_{ijk}$ is the only isotropic third-order Cartesian tensor.

The general line of attack is as above and so only a minimum of explanation will be given.

\[ T_{ijk} = T'_{ijk} = L_{il} L_{jm} L_{kn} T_{lmn} \quad \text{(in all, there are 27 elements).} \]

Rotate about the (1, 1, 1) direction: this is equivalent to making subscript permutations
$1 \to 2 \to 3 \to 1$. We find

(a) $T_{111} = T_{222} = T_{333}$,

(b) $T_{112} = T_{223} = T_{331}$ (and two similar sets),

(c) $T_{123} = T_{231} = T_{312}$ (and a set involving odd permutations of 1, 2, 3).

Rotate by $\pi/2$ about the $Ox_3$-axis: $L_{12} = -1$, $L_{21} = 1$, $L_{33} = 1$, the other $L_{ij} = 0$.

(d) $T_{111} = (-1) \times (-1) \times (-1) \times T_{222} = -T_{222}$,

(e) $T_{112} = (-1) \times (-1) \times 1 \times T_{221}$,

(f) $T_{221} = 1 \times 1 \times (-1) \times T_{112}$,

(g) $T_{123} = (-1) \times 1 \times 1 \times T_{213}$.

Relations (a) and (d) show that elements with all subscripts the same are zero. Relations
(e), (f) and (b) show that all elements with repeated subscripts are zero. Relations (g) and
(c) show that $T_{123} = T_{231} = T_{312} = -T_{213} = -T_{321} = -T_{132}$.

In total, $T_{ijk}$ differs from $\epsilon_{ijk}$ by at most a scalar factor, but since $\epsilon_{ijk}$ (and hence $\lambda \epsilon_{ijk}$)
has already been shown to be an isotropic tensor, $T_{ijk}$ must be the most general third-order
isotropic Cartesian tensor. ◄

Using exactly the same procedures as those employed for $\delta_{ij}$ and $\epsilon_{ijk}$, it may be
shown that the only isotropic first-order tensor is the trivial one with all elements 
zero. 


21.10 Improper rotations and pseudotensors 

So far we have considered rigid rotations of the coordinate axes described by 
an orthogonal matrix L with |L| = +1, (21.4). Strictly speaking such transfor- 
mations are called proper rotations. We now broaden our discussion to include 
transformations that are still described by an orthogonal matrix L but for which 
$|L| = -1$; these are called improper rotations.

This kind of transformation can always be considered as an inversion of the 
coordinate axes through the origin represented by the equation 

\[ x'_i = -x_i, \tag{21.37} \]

combined with a proper rotation. The transformation may be looked upon 
alternatively as one that changes an initially right-handed coordinate system into 
a left-handed one; any prior or subsequent proper rotation will not change this 
state of affairs. The most obvious example of a transformation with $|L| = -1$ is
the matrix corresponding to (21.37) itself; in this case $L_{ij} = -\delta_{ij}$.

As we have emphasised in earlier chapters, any real physical vector v may be 
considered as a geometrical object (i.e. an arrow in space), which can be referred 
to independently of any coordinate system and whose direction and magnitude 
cannot be altered merely by describing it in terms of a different coordinate system. 
Thus the components of $\mathbf{v}$ transform as $v'_i = L_{ij} v_j$ under all rotations (proper and
improper).

We can define another type of object, however, whose components may also
be labelled by a single subscript but which transforms as $v'_i = L_{ij} v_j$ under proper
rotations and as $v'_i = -L_{ij} v_j$ (note the minus sign) under improper rotations. In
this case, the $v_i$ are not strictly the components of a true first-order Cartesian
tensor but instead are said to form the components of a first-order Cartesian
pseudotensor or pseudovector.
Figure 21.2 The behaviour of a vector $\mathbf{v}$ and a pseudovector $\mathbf{p}$ under a
reflection through the origin of the coordinate system $x_1, x_2, x_3$ giving the new
system $x'_1, x'_2, x'_3$.


It is important to realise that a pseudovector (as its name suggests) is not a 
geometrical object in the usual sense. In particular, it should not be considered 
as a real physical arrow in space, since its direction is reversed by an improper 
transformation of the coordinate axes (such as an inversion through the origin). 
This is illustrated in figure 21.2, in which the pseudovector p is shown as a broken 
line to indicate that it is not a real physical vector. 

Corresponding to vectors and pseudovectors, zeroth-order objects may be 
divided into scalars and pseudoscalars - the latter being invariant under rotation 
but changing sign on reflection. 

We may also extend the notion of scalars and pseudoscalars, vectors and
pseudovectors, to objects with two or more subscripts. For two subscripts, as defined
previously, any quantity with components that transform as $T'_{ij} = L_{ik} L_{jl} T_{kl}$
under all rotations (proper and improper) is called a second-order Cartesian tensor.
If, however, $T'_{ij} = L_{ik} L_{jl} T_{kl}$ under proper rotations but $T'_{ij} = -L_{ik} L_{jl} T_{kl}$ under
improper ones (which include reflections), then the $T_{ij}$ are the components of
a second-order Cartesian pseudotensor. In general the components of Cartesian
pseudotensors of arbitrary order transform as

\[ T'_{ij\cdots k} = |L| L_{il} L_{jm} \cdots L_{kn} T_{lm\cdots n}, \tag{21.38} \]

where $|L|$ is the determinant of the transformation matrix.

For example, from (21.28) we have that

\[ |L| \epsilon_{ijk} = L_{il} L_{jm} L_{kn} \epsilon_{lmn}, \]

but since $|L| = \pm 1$ we may rewrite this as

\[ \epsilon_{ijk} = |L| L_{il} L_{jm} L_{kn} \epsilon_{lmn}. \]

From this expression, we see that although $\epsilon_{ijk}$ behaves as a tensor under proper
rotations, as discussed in section 21.8, it should properly be regarded as a
third-order Cartesian pseudotensor.


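►If $b_j$ and $c_k$ are the components of vectors, show that the quantities $a_i = \epsilon_{ijk} b_j c_k$ form
the components of a pseudovector.

In a new coordinate system we have

\begin{align*}
a'_i &= \epsilon'_{ijk} b'_j c'_k \\
&= |L| L_{il} L_{jm} L_{kn} \epsilon_{lmn} L_{jp} b_p L_{kq} c_q \\
&= |L| L_{il} \epsilon_{lmn} \delta_{mp} \delta_{nq} b_p c_q \\
&= |L| L_{il} \epsilon_{lmn} b_m c_n = |L| L_{il} a_l,
\end{align*}

from which we see immediately that the quantities $a_i$ form the components of a
pseudovector. ◄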
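
The extra factor of $|L|$ can be seen at work in a numerical example with an improper rotation. In the Python sketch below (not part of the original text) the transformation is taken, purely for illustration, to be an inversion combined with a rotation about the $x_3$-axis; the helper names and the vectors are assumptions, and indices run over 0, 1, 2.

```python
import math

def levi_civita(i, j, k):
    """epsilon_ijk for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

n = range(3)

def cross(b, c):
    """a_i = epsilon_ijk b_j c_k, evaluated with the given components."""
    return [sum(levi_civita(i, j, k) * b[j] * c[k] for j in n for k in n) for i in n]

def apply(L, v):
    """v'_i = L_ij v_j, the transformation law for a true vector."""
    return [sum(L[i][j] * v[j] for j in n) for i in n]

def det3(M):
    """|M| = epsilon_ijk M_1i M_2j M_3k."""
    return sum(levi_civita(i, j, k) * M[0][i] * M[1][j] * M[2][k]
               for i in n for j in n for k in n)

ct, st = math.cos(0.5), math.sin(0.5)          # arbitrary illustrative angle
R = [[ct, st, 0.0], [-st, ct, 0.0], [0.0, 0.0, 1.0]]
L = [[-R[i][j] for j in n] for i in n]         # inversion plus rotation: improper
det_L = det3(L)                                # equals -1 up to rounding

b = [1.0, 2.0, 3.0]                            # illustrative true vectors
c = [-1.0, 0.5, 2.0]
a = cross(b, c)

# Components of a_i = epsilon_ijk b_j c_k formed in the new coordinate system.
a_new = cross(apply(L, b), apply(L, c))

vector_rule = apply(L, a)                      # what a true vector would give
pseudo_rule = [det_L * x for x in vector_rule] # |L| L_il a_l
```

Here `a_new` agrees with `pseudo_rule` and differs in sign from `vector_rule`, as expected for a pseudovector.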

The above example is worth some further comment. If we denote the vectors
with components $b_j$ and $c_k$ by $\mathbf{b}$ and $\mathbf{c}$ respectively then, as mentioned in
section 21.8, the quantities $a_i = \epsilon_{ijk} b_j c_k$ are the components of the real vector
$\mathbf{a} = \mathbf{b} \times \mathbf{c}$, provided that we are using a right-handed Cartesian coordinate system.
However, in a coordinate system that is left-handed the quantities $a'_i = \epsilon'_{ijk} b'_j c'_k$
are not the components of the physical vector $\mathbf{a} = \mathbf{b} \times \mathbf{c}$, which has, instead, the
components $-a'_i$. It is therefore important to note the handedness of a coordinate
system before attempting to write in component form the vector relation $\mathbf{a} = \mathbf{b} \times \mathbf{c}$
(which is true without reference to any coordinate system).

It is worth noting that, although pseudotensors can be useful mathematical 
objects, the description of the real physical world must usually be in terms of 
tensors (i.e. scalars, vectors, etc.).† For example, the temperature or density of a
gas must be a scalar quantity (rather than a pseudoscalar), since its value does 
not change when the coordinate system used to describe it is inverted through 
the origin. Similarly, velocity, magnetic field strength or angular momentum can 
only be described by a vector, and not by a pseudovector. 

At this point, it may be useful to make a brief comment on the distinction
between active and passive transformations of a physical system, as this difference
often causes confusion. In this chapter, we are concerned solely with passive
transformations, for which the physical system of interest is left unaltered, and only
the coordinate system used to describe it is changed. In an active transformation,
however, the system itself is altered.

† In fact the quantum-mechanical description of elementary particles, such as electrons, protons and
neutrons, requires the introduction of a new kind of mathematical object called a spinor, which is
not a scalar, vector, or more general tensor. The study of spinors, however, falls beyond the scope
of this book.

As an example, let us consider a particle of mass $m$ that is located at a position
$\mathbf{x}$ relative to the origin $O$ and hence has velocity $\dot{\mathbf{x}}$. The angular momentum of
the particle about $O$ is thus $\mathbf{J} = m(\mathbf{x} \times \dot{\mathbf{x}})$. If we merely invert the Cartesian
coordinates used to describe this system through $O$, neither the magnitude nor
direction of any of these vectors will be changed, since they may be considered
simply as arrows in space that are independent of the coordinates used to
describe them. If, however, we perform the analogous active transformation on
the system, by inverting the position vector of the particle through $O$, then it
is clear that the direction of the particle’s velocity will also be reversed, since it
is simply the time derivative of the position vector, but that the direction of
its angular momentum vector remains unaltered. This suggests that vectors can
be divided into two categories, as follows: polar vectors (such as position and
velocity), which reverse direction under an active inversion of the physical
system through the origin, and axial vectors (such as angular momentum), which
remain unchanged. It should be emphasised that at no point in this
discussion have we used the concept of a pseudovector to describe a real physical
quantity.†


21.11 Dual tensors 

Although pseudotensors are not themselves appropriate for the description of
physical phenomena, they are sometimes needed; for example, we may use the
pseudotensor $\epsilon_{ijk}$ to associate with every antisymmetric second-order tensor $A_{ij}$
(in three dimensions) a pseudovector $p_i$ given by

\[ p_i = \tfrac{1}{2} \epsilon_{ijk} A_{jk}; \tag{21.39} \]

$p_i$ is called the dual of $A_{ij}$. Thus if we denote the antisymmetric tensor $\mathbf{A}$ by the
matrix

\[ A = [A_{ij}] = \begin{pmatrix} 0 & A_{12} & -A_{31} \\ -A_{12} & 0 & A_{23} \\ A_{31} & -A_{23} & 0 \end{pmatrix} \]

then the components of its dual pseudovector are $(p_1, p_2, p_3) = (A_{23}, A_{31}, A_{12})$.


† The scalar product of a polar vector and an axial vector is a pseudoscalar. It was the experimental
detection of the dependence of the angular distribution of electrons of (polar vector) momentum
$\mathbf{p}_e$ emitted by polarised nuclei of (axial vector) spin $\mathbf{J}_N$ upon the pseudoscalar quantity $\mathbf{J}_N \cdot \mathbf{p}_e$ that
established the existence of the non-conservation of parity in $\beta$-decay.
By contracting both sides of (21.39) with $\epsilon_{ijk}$, we find 

\[ \epsilon_{ijk}p_k = \tfrac{1}{2}\epsilon_{ijk}\epsilon_{klm}A_{lm}. \]

Using the identity (21.29) then gives 

\[ \epsilon_{ijk}p_k = \tfrac{1}{2}(\delta_{il}\delta_{jm} - \delta_{im}\delta_{jl})A_{lm} \]
\[ = \tfrac{1}{2}(A_{ij} - A_{ji}) = \tfrac{1}{2}(A_{ij} + A_{ij}) = A_{ij}, \]

where in the last line we use the fact that $A_{ij} = -A_{ji}$. ◄ 
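
The relation between an antisymmetric tensor and its dual can be checked numerically. The following is a minimal sketch in plain Python; the function names and the sample values of $A_{12}$, $A_{23}$, $A_{31}$ are illustrative assumptions, not part of the text. It forms $p_i$ from (21.39) and then recovers $A_{ij} = \epsilon_{ijk}p_k$.

```python
def levi_civita(i, j, k):
    """Return epsilon_ijk for indices i, j, k in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2


def dual_pseudovector(A):
    """p_i = (1/2) epsilon_ijk A_jk, as in (21.39)."""
    return [0.5 * sum(levi_civita(i, j, k) * A[j][k]
                      for j in range(3) for k in range(3))
            for i in range(3)]


def tensor_from_dual(p):
    """A_ij = epsilon_ijk p_k, the inverse of the relation above."""
    return [[sum(levi_civita(i, j, k) * p[k] for k in range(3))
             for j in range(3)]
            for i in range(3)]


# Illustrative values (assumed, not taken from the text).
A12, A23, A31 = 2.0, -1.0, 3.0
A = [[0.0, A12, -A31],
     [-A12, 0.0, A23],
     [A31, -A23, 0.0]]

p = dual_pseudovector(A)      # expected to equal (A23, A31, A12)
A_back = tensor_from_dual(p)  # expected to reproduce A
```

Indices run from 0 to 2 in the code, so `p[0]` corresponds to $p_1$ in the text.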

By a simple extension, we may associate a dual pseudoscalar $s$ with every 
totally antisymmetric third-rank tensor $A_{ijk}$, i.e. one that is antisymmetric with 
respect to the interchange of every possible pair of subscripts; $s$ is given by 

\[ s = \frac{1}{3!}\epsilon_{ijk}A_{ijk}. \qquad (21.40) \]

Since $A_{ijk}$ is a totally antisymmetric three-subscript quantity, we expect it to 
equal some multiple of $\epsilon_{ijk}$ (since this is the only such quantity). In fact $A_{ijk} = s\epsilon_{ijk}$, 
as can be proved by substituting this expression into (21.40) and using (21.35). 

21.12 Physical applications of tensors 

In this section some physical applications of tensors will be given. First-order 
tensors are familiar as vectors and so we will concentrate on second-order tensors, 
starting with an example taken from mechanics. 

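Consider a collection of rigidly connected point particles of which the $\alpha$th, 
which has mass $m^{(\alpha)}$ and is positioned at $\mathbf{r}^{(\alpha)}$ with respect to an origin O, is 
typical. Suppose that the rigid assembly is rotating about an axis through O with 
angular velocity $\boldsymbol{\omega}$. 

The angular momentum $\mathbf{J}$ about O of the assembly is given by 

\[ \mathbf{J} = \sum_\alpha \left( \mathbf{r}^{(\alpha)} \times \mathbf{p}^{(\alpha)} \right). \]

But $\mathbf{p}^{(\alpha)} = m^{(\alpha)}\dot{\mathbf{r}}^{(\alpha)}$ and $\dot{\mathbf{r}}^{(\alpha)} = \boldsymbol{\omega} \times \mathbf{r}^{(\alpha)}$, for any $\alpha$, and so in subscript form the 
components of $\mathbf{J}$ are given by 

\[ J_i = \sum_\alpha m^{(\alpha)} \epsilon_{ijk} x_j^{(\alpha)} \dot{x}_k^{(\alpha)} \]
\[ = \sum_\alpha m^{(\alpha)} \epsilon_{ijk} x_j^{(\alpha)} \epsilon_{klm} \omega_l x_m^{(\alpha)} \]
\[ = \sum_\alpha m^{(\alpha)} (\delta_{il}\delta_{jm} - \delta_{im}\delta_{jl}) x_j^{(\alpha)} x_m^{(\alpha)} \omega_l \]
\[ = \sum_\alpha m^{(\alpha)} \left[ \left(r^{(\alpha)}\right)^2 \delta_{il} - x_i^{(\alpha)} x_l^{(\alpha)} \right] \omega_l \equiv I_{il}\omega_l, \qquad (21.41) \]

where $I_{il}$ is a symmetric second-order Cartesian tensor (by the quotient rule, see 
section 21.7, since $\mathbf{J}$ and $\boldsymbol{\omega}$ are vectors). The tensor is called the inertia tensor at O 
of the assembly and depends only on the distribution of masses in the assembly 
and not upon the direction or magnitude of $\boldsymbol{\omega}$. 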

A more realistic situation obtains if a continuous rigid body is considered. In 
this case, $m^{(\alpha)}$ must be replaced everywhere by $\rho(\mathbf{r})\,dx\,dy\,dz$ and all summations 
by integrations over the volume of the body. Written out in full in Cartesians, 
the inertia tensor for a continuous body would have the form 

\[ \mathsf{I} = [I_{ij}] = \begin{pmatrix} \int (y^2 + z^2)\rho\,dV & -\int xy\rho\,dV & -\int xz\rho\,dV \\ -\int xy\rho\,dV & \int (z^2 + x^2)\rho\,dV & -\int yz\rho\,dV \\ -\int xz\rho\,dV & -\int yz\rho\,dV & \int (x^2 + y^2)\rho\,dV \end{pmatrix}, \]

where $\rho = \rho(x,y,z)$ is the mass distribution and $dV$ stands for $dx\,dy\,dz$; the 
integrals are to be taken over the whole body. The diagonal elements of this 
tensor are called the moments of inertia and the off-diagonal elements without the 
minus signs are known as the products of inertia. 

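► Show that the kinetic energy of the rotating system is given by $T = \tfrac{1}{2}I_{jl}\omega_j\omega_l$. 

By an argument parallel to that already made for $\mathbf{J}$, the kinetic energy is given by 

\[ T = \tfrac{1}{2} \sum_\alpha m^{(\alpha)} \left( \dot{\mathbf{r}}^{(\alpha)} \cdot \dot{\mathbf{r}}^{(\alpha)} \right) \]
\[ = \tfrac{1}{2} \sum_\alpha m^{(\alpha)} \epsilon_{ijk} \omega_j x_k^{(\alpha)} \epsilon_{ilm} \omega_l x_m^{(\alpha)} \]
\[ = \tfrac{1}{2} \sum_\alpha m^{(\alpha)} (\delta_{jl}\delta_{km} - \delta_{jm}\delta_{kl}) x_k^{(\alpha)} x_m^{(\alpha)} \omega_j \omega_l \]
\[ = \tfrac{1}{2} \sum_\alpha m^{(\alpha)} \left[ \delta_{jl} \left(r^{(\alpha)}\right)^2 - x_j^{(\alpha)} x_l^{(\alpha)} \right] \omega_j \omega_l \]
\[ = \tfrac{1}{2} I_{jl}\omega_j\omega_l. \]

Alternatively, since $J_j = I_{jl}\omega_l$ we may write the kinetic energy of the rotating system as 
$T = \tfrac{1}{2}J_j\omega_j$. ◄ 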
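
The results (21.41) and $T = \tfrac{1}{2}I_{jl}\omega_j\omega_l$ can be compared with direct particle sums. The sketch below, in plain Python, assumes an arbitrary set of three point masses and an arbitrary angular velocity (all numerical values and function names are illustrative, not from the text). It builds $I_{il}$ and checks that $I_{il}\omega_l$ and $\tfrac{1}{2}\omega_j J_j$ agree with $\sum \mathbf{r} \times \mathbf{p}$ and $\sum \tfrac{1}{2}mv^2$.

```python
def cross(a, b):
    """Vector product a x b of two 3-component lists."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    """Scalar product a . b."""
    return sum(x * y for x, y in zip(a, b))


def inertia_tensor(masses, positions):
    """I_il = sum over particles of m [ r^2 delta_il - x_i x_l ], as in (21.41)."""
    I = [[0.0] * 3 for _ in range(3)]
    for m, r in zip(masses, positions):
        r2 = dot(r, r)
        for i in range(3):
            for l in range(3):
                I[i][l] += m * ((r2 if i == l else 0.0) - r[i] * r[l])
    return I


# Illustrative data (assumed): three point masses and an angular velocity.
masses = [1.0, 2.0, 0.5]
positions = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0], [-1.0, 2.0, 0.5]]
omega = [0.3, -0.2, 0.7]

I = inertia_tensor(masses, positions)
J_tensor = [dot(I[i], omega) for i in range(3)]   # J_i = I_il omega_l
T_tensor = 0.5 * dot(omega, J_tensor)             # T = (1/2) J_j omega_j

# Direct sums over the particles, using v = omega x r.
J_direct = [0.0, 0.0, 0.0]
T_direct = 0.0
for m, r in zip(masses, positions):
    v = cross(omega, r)
    J_direct = [a + b for a, b in zip(J_direct, cross(r, [m * c for c in v]))]
    T_direct += 0.5 * m * dot(v, v)
```

The two routes should agree to rounding error, and `I` should come out symmetric.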

The above example shows that the kinetic energy of the rotating body can be 
expressed as a scalar obtained by twice contracting $\boldsymbol{\omega}$ with the inertia tensor. It 
also shows that the moment of inertia of the body about a line given by the unit 
vector $\hat{\mathbf{n}}$ is $I_{jl}\hat{n}_j\hat{n}_l$ (or $\hat{\mathbf{n}}^{\mathrm{T}}\mathsf{I}\hat{\mathbf{n}}$ in matrix form). 

Since $\mathsf{I}$ ($\equiv I_{jl}$) is a real symmetric second-order tensor, it has associated with it 
three mutually perpendicular directions that are its principal axes and have the 
following properties (proved in chapter 8): 

(i) with each axis is associated a principal moment of inertia $\lambda_\mu$, $\mu = 1, 2, 3$; 

(ii) when the rotation of the body is about one of these axes, the angular 
velocity and the angular momentum are parallel and given by 

\[ \mathbf{J} = \mathsf{I}\boldsymbol{\omega} = \lambda_\mu\boldsymbol{\omega}, \]

i.e. $\boldsymbol{\omega}$ is an eigenvector of $\mathsf{I}$ with eigenvalue $\lambda_\mu$; 

(iii) referred to these axes as coordinate axes, the inertia tensor is diagonal 
with diagonal entries $\lambda_1, \lambda_2, \lambda_3$. 

Two further examples of physical quantities represented by second-order tensors 
are magnetic susceptibility and electrical conductivity. In the first case we have 
(in standard notation) 

\[ M_i = \chi_{ij}H_j, \qquad (21.42) \]

and in the second case 

\[ j_i = \sigma_{ij}E_j. \qquad (21.43) \]

Here $\mathbf{M}$ is the magnetic moment per unit volume and $\mathbf{j}$ the current density 
(current per unit perpendicular area). In both cases we have on the left-hand side 
a vector and on the right-hand side the contraction of a set of quantities with 
another vector. Each set of quantities must therefore form the components of a 
second-order tensor. 

For isotropic media $\mathbf{M} \propto \mathbf{H}$ and $\mathbf{j} \propto \mathbf{E}$, but for anisotropic materials such as 
crystals the susceptibility and conductivity may be different along different crystal 
axes, making $\chi_{ij}$ and $\sigma_{ij}$ general second-order tensors, although they are usually 
symmetric. 


► The electrical conductivity $\sigma$ in a crystal is measured by an observer to have components 
as shown: 

\[ [\sigma_{ij}] = \begin{pmatrix} 1 & \sqrt{2} & 0 \\ \sqrt{2} & 3 & 1 \\ 0 & 1 & 1 \end{pmatrix}. \qquad (21.44) \]

Show that there is one direction in the crystal along which no current can flow. Does the 
current flow equally easily in the two perpendicular directions? 


The current density in the crystal is given by $j_i = \sigma_{ij}E_j$, where $\sigma_{ij}$, relative to the 
observer's coordinate system, is given by (21.44). Since $[\sigma_{ij}]$ is a symmetric matrix, it 
possesses three mutually perpendicular eigenvectors (or principal axes) with respect to 
which the conductivity tensor is diagonal, with diagonal entries $\lambda_1, \lambda_2, \lambda_3$, the eigenvalues 
of $[\sigma_{ij}]$. 

As discussed in chapter 8, the eigenvalues of $[\sigma_{ij}]$ are given by $|\sigma - \lambda\mathsf{I}| = 0$. Thus we 
require 

\[ \begin{vmatrix} 1-\lambda & \sqrt{2} & 0 \\ \sqrt{2} & 3-\lambda & 1 \\ 0 & 1 & 1-\lambda \end{vmatrix} = 0, \]

from which we find 

\[ (1 - \lambda)[(3 - \lambda)(1 - \lambda) - 1] - 2(1 - \lambda) = 0. \]

This simplifies to give $\lambda = 0, 1, 4$ so that, with respect to its principal axes, the conductivity 
tensor has components $\sigma'_{ij}$ given by 

\[ [\sigma'_{ij}] = \begin{pmatrix} 4 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \]

Since $j'_i = \sigma'_{ij}E'_j$, we see immediately that along one of the principal axes there is no 
current flow and along the two perpendicular directions the current flows are not equal. 

◄ 
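
The eigenvalues quoted in this example can be verified without an eigenvalue solver. The sketch below, in plain Python, evaluates the characteristic determinant of (21.44) at $\lambda = 0, 1, 4$ and constructs the direction of zero conductivity as the vector product of two rows of $[\sigma_{ij}]$. The helper names and the test field `E` are illustrative assumptions.

```python
import math

SQRT2 = math.sqrt(2.0)
sigma = [[1.0, SQRT2, 0.0],
         [SQRT2, 3.0, 1.0],
         [0.0, 1.0, 1.0]]


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def characteristic(lam):
    """Value of |sigma - lam I|; it vanishes when lam is an eigenvalue."""
    shifted = [[sigma[i][j] - (lam if i == j else 0.0) for j in range(3)]
               for i in range(3)]
    return det3(shifted)


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]


# Because |sigma| = 0, a vector orthogonal to two independent rows of sigma
# is orthogonal to the third as well, and so is the eigenvector for lam = 0.
raw = cross(sigma[0], sigma[1])
norm = math.sqrt(sum(c * c for c in raw))
no_current_dir = [c / norm for c in raw]

# For any applied field E (values assumed), the current has no component
# along this direction, since sigma is symmetric.
E = [1.0, 2.0, -0.5]
j = matvec(sigma, E)
j_along = sum(a * b for a, b in zip(j, no_current_dir))
```

Up to sign and rounding, `no_current_dir` is proportional to $(\sqrt{2}, -1, 1)$.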
We can extend the idea of a second-order tensor that relates two vectors to a 
situation where two physical second-order tensors are related by a fourth-order 
tensor. The most common occurrence of such relationships is in the theory of 
elasticity. This is not the place to give a detailed account of elasticity theory, 
but suffice it to say that the local deformation of an elastic body at any interior 
point P can be described by a second-order symmetric tensor $e_{ij}$ called the strain 
tensor. It is given by 

\[ e_{ij} = \frac{1}{2}\left( \frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} \right), \]

where $\mathbf{u}$ is the displacement vector describing the strain of a small volume element 
whose unstrained position relative to the origin is $\mathbf{x}$. Similarly we can describe 
the stress in the body at P by the second-order symmetric stress tensor $p_{ij}$; the 
quantity $p_{ij}$ is the $x_j$-component of the stress vector acting across a plane through 
P whose normal lies in the $x_i$-direction. A generalisation of Hooke's law then 
relates the stress and strain tensors by 

\[ p_{ij} = c_{ijkl}e_{kl}, \qquad (21.45) \]

where $c_{ijkl}$ is a fourth-order Cartesian tensor. 


► Assuming that the most general fourth-order isotropic tensor is 

\[ c_{ijkl} = \lambda\delta_{ij}\delta_{kl} + \eta\delta_{ik}\delta_{jl} + \nu\delta_{il}\delta_{jk}, \qquad (21.46) \]

find the form of (21.45) for an isotropic medium having Young's modulus $E$ and Poisson's 
ratio $\sigma$. 


For an isotropic medium we must have an isotropic tensor for $c_{ijkl}$, and so we assume the 
form (21.46). Substituting this into (21.45) yields 

\[ p_{ij} = \lambda\delta_{ij}e_{kk} + \eta e_{ij} + \nu e_{ji}. \]

But $e_{ij}$ is symmetric, and if we write $\eta + \nu = 2\mu$, then this takes the form 

\[ p_{ij} = \lambda e_{kk}\delta_{ij} + 2\mu e_{ij}, \]

in which $\lambda$ and $\mu$ are known as Lamé constants. It will be noted that if $e_{ij} = 0$ for $i \neq j$ 
then the same is true of $p_{ij}$, i.e. the principal axes of the stress and strain tensors coincide. 

Now consider a simple tension in the $x_1$-direction, i.e. $p_{11} = S$ but all other $p_{ij} = 0$. 
Then denoting $e_{kk}$ (summed over $k$) by $\theta$ we have, in addition to $e_{ij} = 0$ for $i \neq j$, the three 
equations 

\[ S = \lambda\theta + 2\mu e_{11}, \]
\[ 0 = \lambda\theta + 2\mu e_{22}, \]
\[ 0 = \lambda\theta + 2\mu e_{33}. \]

Adding them gives 

\[ S = \theta(3\lambda + 2\mu). \]

Substituting for $\theta$ from this into the first of the three, and recalling that Young's modulus 
is defined by $S = Ee_{11}$, gives $E$ as 

\[ E = \frac{\mu(3\lambda + 2\mu)}{\lambda + \mu}. \qquad (21.47) \]
Further, Poisson's ratio is defined as $\sigma = -e_{22}/e_{11}$ (or $-e_{33}/e_{11}$) and is thus 

\[ \sigma = \left(\frac{1}{e_{11}}\right)\frac{\lambda\theta}{2\mu} = \left(\frac{1}{e_{11}}\right)\left(\frac{\lambda}{2\mu}\right)\frac{Ee_{11}}{3\lambda + 2\mu} = \frac{\lambda}{2(\lambda + \mu)}. \qquad (21.48) \]

Solving (21.47) and (21.48) for $\lambda$ and $\mu$ gives finally 

\[ p_{ij} = \frac{\sigma E}{(1 + \sigma)(1 - 2\sigma)}e_{kk}\delta_{ij} + \frac{E}{(1 + \sigma)}e_{ij}. \]

◄ 
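
The conversion between the Lamé constants and the pair $(E, \sigma)$ is easy to mis-transcribe, so a numerical round trip is a useful check. The sketch below, in plain Python, uses illustrative values $E = 200$ and $\sigma = 0.3$ in arbitrary units (assumed, not from the text), and confirms that the isotropic stress-strain law reproduces a simple tension when the lateral strains are $-\sigma e_{11}$.

```python
def lame_from_engineering(E, sigma):
    """Lame constants (lam, mu) from Young's modulus E and Poisson's ratio sigma."""
    lam = sigma * E / ((1.0 + sigma) * (1.0 - 2.0 * sigma))
    mu = E / (2.0 * (1.0 + sigma))
    return lam, mu


def engineering_from_lame(lam, mu):
    """Young's modulus (21.47) and Poisson's ratio (21.48) from lam and mu."""
    E = mu * (3.0 * lam + 2.0 * mu) / (lam + mu)
    sigma = lam / (2.0 * (lam + mu))
    return E, sigma


def stress(strain, lam, mu):
    """p_ij = lam e_kk delta_ij + 2 mu e_ij for a 3x3 strain tensor."""
    trace = sum(strain[k][k] for k in range(3))
    return [[(lam * trace if i == j else 0.0) + 2.0 * mu * strain[i][j]
             for j in range(3)]
            for i in range(3)]


# Illustrative material constants in arbitrary units (assumed values).
E, sigma = 200.0, 0.3
lam, mu = lame_from_engineering(E, sigma)
E_back, sigma_back = engineering_from_lame(lam, mu)

# Simple tension along x_1: lateral strains are -sigma times the axial strain.
e11 = 1.0e-3
strain = [[e11, 0.0, 0.0],
          [0.0, -sigma * e11, 0.0],
          [0.0, 0.0, -sigma * e11]]
p = stress(strain, lam, mu)   # expect p[0][0] = E * e11 and all other p_ij = 0
```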


21.13 Integral theorems for tensors 

In chapter 11, we discussed various integral theorems involving vector and scalar 
fields. Most notably, we considered the divergence theorem, which states that, for 
any vector field $\mathbf{a}$, 

\[ \int_V \nabla \cdot \mathbf{a}\,dV = \oint_S \mathbf{a} \cdot \hat{\mathbf{n}}\,dS, \qquad (21.49) \]

where $S$ is the surface enclosing the volume $V$ and $\hat{\mathbf{n}}$ is the outward-pointing unit 
normal to $S$ at each point. 

Writing (21.49) in subscript notation, we have 

\[ \int_V \frac{\partial a_k}{\partial x_k}\,dV = \oint_S a_k\hat{n}_k\,dS. \qquad (21.50) \]

Although we shall not prove it rigorously, (21.50) can be extended in an obvious 
manner to relate integrals of tensor fields, rather than just vector fields, over 
volumes and surfaces, with the result 

\[ \int_V \frac{\partial T_{ij\cdots k\cdots m}}{\partial x_k}\,dV = \oint_S T_{ij\cdots k\cdots m}\hat{n}_k\,dS. \]

This form of the divergence theorem for general tensors can be very useful in 
vector calculus manipulations. 


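► A vector field $\mathbf{a}$ satisfies $\nabla \cdot \mathbf{a} = 0$ inside some volume $V$ and $\mathbf{a} \cdot \hat{\mathbf{n}} = 0$ on the boundary 
surface $S$. By considering the divergence theorem applied to $T_{ij} = x_ia_j$, show that 
$\int_V \mathbf{a}\,dV = \mathbf{0}$. 

Applying the divergence theorem to $T_{ij} = x_ia_j$ we find 

\[ \int_V \frac{\partial T_{ij}}{\partial x_j}\,dV = \int_V \frac{\partial (x_ia_j)}{\partial x_j}\,dV = \oint_S x_ia_j\hat{n}_j\,dS = 0, \]

since $a_j\hat{n}_j = 0$. By expanding the volume integral we obtain 

\[ \int_V \frac{\partial (x_ia_j)}{\partial x_j}\,dV = \int_V \frac{\partial x_i}{\partial x_j}a_j\,dV + \int_V x_i\frac{\partial a_j}{\partial x_j}\,dV \]
\[ = \int_V \delta_{ij}a_j\,dV \]
\[ = \int_V a_i\,dV = 0, \]

where in going from the first to the second line we used the fact that $\partial x_i/\partial x_j = \delta_{ij}$ and 
$\partial a_j/\partial x_j = 0$. ◄ 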
The other integral theorems discussed in chapter 11 can be extended in a 
similar way. For example, written in tensor notation Stokes' theorem states that, 
for a vector field $a_i$, 

\[ \int_S \epsilon_{ijk}\frac{\partial a_k}{\partial x_j}\hat{n}_i\,dS = \oint_C a_k\,dx_k. \]

For a general tensor field this has the straightforward extension 

\[ \int_S \epsilon_{ijk}\frac{\partial T_{lm\cdots k\cdots n}}{\partial x_j}\hat{n}_i\,dS = \oint_C T_{lm\cdots k\cdots n}\,dx_k. \]


21.14 Non-Cartesian coordinates 

So far we have restricted our attention to the study of tensors when they are 
described in terms of Cartesian coordinates and the axes of coordinates are rigidly 
rotated, sometimes together with an inversion of axes through the origin. In the 
remainder of this chapter we shall extend the concepts discussed in the previous 
sections by considering arbitrary coordinate transformations from one general 
coordinate system to another. Although this generalisation brings with it several 
complications, we shall find that many of the properties of Cartesian tensors 
are still valid for more general tensors. Before considering general coordinate 
transformations, however, we begin by reminding ourselves of some properties of 
general curvilinear coordinates, as discussed in chapter 10. 

The position of an arbitrary point P in space may be expressed in terms of the 
three curvilinear coordinates $u_1, u_2, u_3$. We saw in chapter 10 that if $\mathbf{r}(u_1, u_2, u_3)$ is 
the position vector of the point P then at P there exist two sets of basis vectors 

\[ \mathbf{e}_i = \frac{\partial \mathbf{r}}{\partial u_i} \quad \text{and} \quad \boldsymbol{\epsilon}_i = \nabla u_i, \qquad (21.51) \]

where $i = 1, 2, 3$. In general, the vectors in each set neither are of unit length nor 
form an orthogonal basis. However, the sets $\mathbf{e}_i$ and $\boldsymbol{\epsilon}_i$ are reciprocal systems of 
vectors and so 

\[ \mathbf{e}_i \cdot \boldsymbol{\epsilon}_j = \delta_{ij}. \qquad (21.52) \]

In the context of general tensor analysis, it is more usual to denote the second 
set of vectors $\boldsymbol{\epsilon}_i$ in (21.51) by $\mathbf{e}^i$, the index being placed as a superscript to 
distinguish it from the (different) vector $\mathbf{e}_i$, which is a member of the first set in 
(21.51). Although this positioning of the index may seem odd (not least because 
of the possibility of confusion with powers) it forms part of a slight modification 
to the summation convention that we will adopt for the remainder of this chapter. 
This is as follows: any lower-case alphabetic index that appears exactly twice in 
any term of an expression, once as a subscript and once as a superscript, is to be 
summed over all the values that an index in that position can take (unless the 
contrary is specifically stated). All other aspects of the summation convention 
remain unchanged. 

With the introduction of superscripts, the reciprocity relation (21.52) should be 
rewritten so that both sides of (21.53) have one subscript and one superscript, i.e. 
as 

\[ \mathbf{e}^i \cdot \mathbf{e}_j = \delta^i_j. \qquad (21.53) \]

The alternative form of the Kronecker delta is defined in a similar way to 
previously, i.e. it equals unity if $i = j$ and is zero otherwise. 

For similar reasons it is usual to denote the curvilinear coordinates themselves 
by $u^1, u^2, u^3$, with the index raised, so that 

\[ \mathbf{e}_i = \frac{\partial \mathbf{r}}{\partial u^i} \quad \text{and} \quad \mathbf{e}^i = \nabla u^i. \qquad (21.54) \]

From the first equality we see that we may consider a superscript that appears in 
the denominator of a partial derivative as a subscript. 

Given the two bases $\mathbf{e}_i$ and $\mathbf{e}^i$, we may write a general vector $\mathbf{a}$ equally well in 
terms of either basis as follows: 

\[ \mathbf{a} = a^1\mathbf{e}_1 + a^2\mathbf{e}_2 + a^3\mathbf{e}_3 = a^i\mathbf{e}_i; \]
\[ \mathbf{a} = a_1\mathbf{e}^1 + a_2\mathbf{e}^2 + a_3\mathbf{e}^3 = a_i\mathbf{e}^i. \]

The $a^i$ are called the contravariant components of the vector $\mathbf{a}$ and the $a_i$ 
the covariant components, the position of the index (either as a subscript or 
superscript) serving to distinguish between them. Similarly, we may call the $\mathbf{e}_i$ the 
covariant basis vectors and the $\mathbf{e}^i$ the contravariant ones. 

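► Show that the contravariant and covariant components of a vector $\mathbf{a}$ are given by $a^i = \mathbf{a} \cdot \mathbf{e}^i$ 
and $a_i = \mathbf{a} \cdot \mathbf{e}_i$ respectively. 

For the contravariant components, we find 

\[ \mathbf{a} \cdot \mathbf{e}^i = a^j\mathbf{e}_j \cdot \mathbf{e}^i = a^j\delta^i_j = a^i, \]

where we have used the reciprocity relation (21.53). Similarly, for the covariant components, 

\[ \mathbf{a} \cdot \mathbf{e}_i = a_j\mathbf{e}^j \cdot \mathbf{e}_i = a_j\delta^j_i = a_i. \] ◄ 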
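
The distinction between the two kinds of component only shows up in a non-orthogonal basis. The following sketch, in plain Python, assumes an arbitrary skew basis and an arbitrary vector (all numbers are illustrative). It builds the reciprocal basis with the standard vector-product construction $\mathbf{e}^1 = \mathbf{e}_2 \times \mathbf{e}_3 / [\mathbf{e}_1 \cdot (\mathbf{e}_2 \times \mathbf{e}_3)]$ and its cyclic permutations, then checks that $a^i\mathbf{e}_i$ and $a_i\mathbf{e}^i$ both rebuild $\mathbf{a}$.

```python
def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reciprocal_basis(basis):
    """Return the vectors e^i satisfying e^i . e_j = delta^i_j."""
    e1, e2, e3 = basis
    volume = dot(e1, cross(e2, e3))
    return [[c / volume for c in cross(e2, e3)],
            [c / volume for c in cross(e3, e1)],
            [c / volume for c in cross(e1, e2)]]


def combine(components, vectors):
    """Sum of components[i] * vectors[i] over i."""
    return [sum(components[i] * vectors[i][n] for i in range(3))
            for n in range(3)]


# An illustrative non-orthogonal basis and vector (assumed values).
basis = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 2.0]]
a = [2.0, -1.0, 3.0]

recip = reciprocal_basis(basis)
a_contra = [dot(a, recip[i]) for i in range(3)]   # a^i = a . e^i
a_co = [dot(a, basis[i]) for i in range(3)]       # a_i = a . e_i

a_from_contra = combine(a_contra, basis)          # a^i e_i
a_from_co = combine(a_co, recip)                  # a_i e^i
```

For this basis `a_contra` and `a_co` differ, whereas for an orthonormal basis they would coincide.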

The reason that the notion of contravariant and covariant components of 
a vector (and the resulting superscript notation) was not introduced earlier is 
that for Cartesian coordinate systems the two sets of basis vectors $\mathbf{e}_i$ and $\mathbf{e}^i$ are 
identical and, hence, so are the components of a vector with respect to either 
basis. Thus, for Cartesian coordinates, we may speak simply of the components 
of the vector and there is no need to differentiate between contravariance and 
covariance, or to introduce superscripts to make a distinction between them. 

If we consider the components of higher-order tensors in non-Cartesian coordinates, 
there are even more possibilities. As an example, let us consider a 
second-order tensor $\mathsf{T}$. Using the outer product notation in (21.23), we may write 
$\mathsf{T}$ in three different ways: 

\[ \mathsf{T} = T^{ij}\mathbf{e}_i \otimes \mathbf{e}_j = T^{i}{}_{j}\mathbf{e}_i \otimes \mathbf{e}^j = T_{ij}\mathbf{e}^i \otimes \mathbf{e}^j, \]

where $T^{ij}$, $T^{i}{}_{j}$ and $T_{ij}$ are called the contravariant, mixed and covariant components 
of $\mathsf{T}$ respectively. It is important to remember that these three sets of 
quantities form the components of the same tensor $\mathsf{T}$ but refer to different (tensor) 
bases made up from the basis vectors of the coordinate system. Again, if we are 
using Cartesian coordinates then all three sets of components are identical. 

We may generalise the above equation to higher-order tensors; then components 
carrying only superscripts or only subscripts are referred to as the 
contravariant and covariant components respectively and all others are called 
mixed components. 


21.15 The metric tensor 

Any particular curvilinear coordinate system is completely characterised at each 
point in space by the nine quantities 

\[ g_{ij} = \mathbf{e}_i \cdot \mathbf{e}_j, \qquad (21.55) \]

which, as we will show, are the covariant components of a symmetric second-order 
tensor $\mathsf{g}$ called the metric tensor. 

Since an infinitesimal vector displacement can be written as $d\mathbf{r} = du^i\,\mathbf{e}_i$, we find 
that the square of the infinitesimal arc length $(ds)^2$ can be written in terms of the 
metric tensor as 

\[ (ds)^2 = d\mathbf{r} \cdot d\mathbf{r} = du^i\,\mathbf{e}_i \cdot du^j\,\mathbf{e}_j = g_{ij}\,du^i\,du^j. \qquad (21.56) \]

It may further be shown that the volume element $dV$ is given by 

\[ dV = \sqrt{g}\,du^1\,du^2\,du^3, \qquad (21.57) \]

where $g$ is the determinant of the matrix $[g_{ij}]$, which has the covariant components 
of the metric tensor as its elements. 

If we compare equations (21.56) and (21.57) with the analogous ones in section 
10.10 then we see that in the special case where the coordinate system is orthogonal 
(so that $\mathbf{e}_i \cdot \mathbf{e}_j = 0$ for $i \neq j$) the metric tensor can be written in terms of the 
coordinate-system scale factors $h_i$, $i = 1, 2, 3$, as 

\[ g_{ij} = \begin{cases} h_i^2 & i = j, \\ 0 & i \neq j. \end{cases} \]

Its determinant is then given by $g = h_1^2h_2^2h_3^2$. 
► Calculate the elements $g_{ij}$ of the metric tensor for cylindrical polar coordinates. Hence 
find the square of the infinitesimal arc length $(ds)^2$ and the volume $dV$ for this coordinate 
system. 


As discussed in section 10.9, in cylindrical polar coordinates $(u^1, u^2, u^3) = (\rho, \phi, z)$ and so 
the position vector $\mathbf{r}$ of any point P may be written 

\[ \mathbf{r} = \rho\cos\phi\,\mathbf{i} + \rho\sin\phi\,\mathbf{j} + z\,\mathbf{k}. \]

From this we obtain the (covariant) basis vectors: 

\[ \mathbf{e}_1 = \frac{\partial \mathbf{r}}{\partial \rho} = \cos\phi\,\mathbf{i} + \sin\phi\,\mathbf{j}; \]
\[ \mathbf{e}_2 = \frac{\partial \mathbf{r}}{\partial \phi} = -\rho\sin\phi\,\mathbf{i} + \rho\cos\phi\,\mathbf{j}; \]
\[ \mathbf{e}_3 = \frac{\partial \mathbf{r}}{\partial z} = \mathbf{k}. \qquad (21.58) \]

Thus the components of the metric tensor $[g_{ij}] = [\mathbf{e}_i \cdot \mathbf{e}_j]$ are found to be 

\[ \mathsf{G} = [g_{ij}] = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \rho^2 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad (21.59) \]

from which we see that, as expected for an orthogonal coordinate system, the metric tensor 
is diagonal, the diagonal elements being equal to the squares of the scale factors of the 
coordinate system. 

From (21.56), the square of the infinitesimal arc length in this coordinate system is given 
by 

\[ (ds)^2 = g_{ij}\,du^i\,du^j = (d\rho)^2 + \rho^2(d\phi)^2 + (dz)^2, \]

and, using (21.57), the volume element is found to be 

\[ dV = \sqrt{g}\,du^1\,du^2\,du^3 = \rho\,d\rho\,d\phi\,dz. \]

These expressions are identical to those derived in section 10.9. ◄ 

We may also express the scalar product of two vectors in terms of the metric 
tensor: 

\[ \mathbf{a} \cdot \mathbf{b} = a^i\mathbf{e}_i \cdot b^j\mathbf{e}_j = g_{ij}a^ib^j, \qquad (21.60) \]

where we have used the contravariant components of the two vectors. Similarly, 
using the covariant components, we can write the same scalar product as 

\[ \mathbf{a} \cdot \mathbf{b} = a_i\mathbf{e}^i \cdot b_j\mathbf{e}^j = g^{ij}a_ib_j, \qquad (21.61) \]

where we have defined the nine quantities $g^{ij} = \mathbf{e}^i \cdot \mathbf{e}^j$. As we shall show, they form 
the contravariant components of the metric tensor $\mathsf{g}$ and are, in general, different 
from the quantities $g_{ij}$. Finally, we could express the scalar product in terms of 
the contravariant components of one vector and the covariant components of the 
other, 

\[ \mathbf{a} \cdot \mathbf{b} = a_i\mathbf{e}^i \cdot b^j\mathbf{e}_j = a_ib^j\delta^i_j = a_ib^i, \qquad (21.62) \]

where we have used the reciprocity relation (21.53). Similarly, we could write 

\[ \mathbf{a} \cdot \mathbf{b} = a^i\mathbf{e}_i \cdot b_j\mathbf{e}^j = a^ib_j\delta^j_i = a^ib_i. \qquad (21.63) \]

By comparing the four alternative expressions (21.60)-(21.63) for the scalar 
product of two vectors we can deduce one of the most useful properties of 
the quantities $g_{ij}$ and $g^{ij}$. Since $g_{ij}a^ib^j = a^ib_i$ holds for any arbitrary vector 
components $a^i$, it follows that 

\[ g_{ij}b^j = b_i, \]

which illustrates the fact that the covariant components $g_{ij}$ of the metric tensor 
can be used to lower an index. In other words, it provides a means of obtaining 
the covariant components of a vector from its contravariant components. By a 
similar argument, we have 

\[ g^{ij}b_j = b^i, \]

so that the contravariant components $g^{ij}$ can be used to perform the reverse 
operation of raising an index. 

It is straightforward to show that the contravariant and covariant basis vectors, 
$\mathbf{e}^i$ and $\mathbf{e}_i$ respectively, are related in the same way as other vectors, i.e. by 

\[ \mathbf{e}^i = g^{ij}\mathbf{e}_j \quad \text{and} \quad \mathbf{e}_i = g_{ij}\mathbf{e}^j. \]

We also note that, since $\mathbf{e}_i$ and $\mathbf{e}^i$ are reciprocal systems of vectors in three-dimensional 
space (see chapter 7), we may write 

\[ \mathbf{e}^i = \frac{\mathbf{e}_j \times \mathbf{e}_k}{\mathbf{e}_i \cdot (\mathbf{e}_j \times \mathbf{e}_k)}, \]

for the combination of subscripts $i, j, k = 1, 2, 3$ and its cyclic permutations. A 
similar expression holds for $\mathbf{e}_i$ in terms of the $\mathbf{e}^i$-basis. Moreover, it may be shown 
that the triple scalar product $|\mathbf{e}_1 \cdot (\mathbf{e}_2 \times \mathbf{e}_3)| = \sqrt{g}$. 


► Show that the matrix $[g^{ij}]$ is the inverse of the matrix $[g_{ij}]$. Hence calculate the contravariant 
components $g^{ij}$ of the metric tensor in cylindrical polar coordinates. 


Using the index-lowering and index-raising properties of $g_{ij}$ and $g^{ij}$ on an arbitrary vector 
$\mathbf{a}$, we find 

\[ \delta^i_ka^k = a^i = g^{ij}a_j = g^{ij}g_{jk}a^k. \]

But, since $\mathbf{a}$ is arbitrary, we must have 

\[ g^{ij}g_{jk} = \delta^i_k. \qquad (21.64) \]

Denoting the matrix $[g_{ij}]$ by $\mathsf{G}$ and $[g^{ij}]$ by $\hat{\mathsf{G}}$, equation (21.64) can be written in matrix 
form as $\hat{\mathsf{G}}\mathsf{G} = \mathsf{I}$, where $\mathsf{I}$ is the unit matrix. Hence $\mathsf{G}$ and $\hat{\mathsf{G}}$ are inverse matrices of each 
other. 

Thus, by inverting the matrix $\mathsf{G}$ in (21.59), we find that the elements $g^{ij}$ are given in 
cylindrical polar coordinates by 

\[ \hat{\mathsf{G}} = [g^{ij}] = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1/\rho^2 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \] ◄ 

So far we have not considered the components of the metric tensor $g^i_j$ with one 
subscript and one superscript. By analogy with (21.55), these mixed components 
are given by 

\[ g^i_j = \mathbf{e}^i \cdot \mathbf{e}_j = \delta^i_j, \]

and so the components of $g^i_j$ are identical to those of $\delta^i_j$. We may therefore 
consider the $\delta^i_j$ to be the mixed components of the metric tensor $\mathsf{g}$. 
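
The metric of cylindrical polar coordinates can be recovered numerically from the position vector alone, which gives an independent check on (21.59) and on the index-lowering and index-raising rules. The sketch below, in plain Python, approximates $\mathbf{e}_i = \partial\mathbf{r}/\partial u^i$ by central differences. The step size `h`, the sample point `u` and the components `b_upper` are illustrative assumptions.

```python
import math


def position(u):
    """Cartesian position vector r for cylindrical polars u = (rho, phi, z)."""
    rho, phi, z = u
    return [rho * math.cos(phi), rho * math.sin(phi), z]


def covariant_basis(u, h=1e-6):
    """Approximate e_i = d r / d u^i by central differences."""
    basis = []
    for i in range(3):
        up = list(u)
        dn = list(u)
        up[i] += h
        dn[i] -= h
        r_up, r_dn = position(up), position(dn)
        basis.append([(a - b) / (2.0 * h) for a, b in zip(r_up, r_dn)])
    return basis


def metric(u):
    """g_ij = e_i . e_j, equation (21.55)."""
    e = covariant_basis(u)
    return [[sum(e[i][n] * e[j][n] for n in range(3)) for j in range(3)]
            for i in range(3)]


u = (2.0, 0.7, -1.0)   # sample point (rho, phi, z), assumed
g = metric(u)          # expected to be close to diag(1, rho^2, 1)

# Contravariant metric taken from the worked example: diag(1, 1/rho^2, 1).
g_inverse = [[1.0, 0.0, 0.0],
             [0.0, 1.0 / u[0] ** 2, 0.0],
             [0.0, 0.0, 1.0]]

b_upper = [0.5, -1.5, 2.0]   # contravariant components b^j, assumed
b_lower = [sum(g[i][j] * b_upper[j] for j in range(3)) for i in range(3)]
b_back = [sum(g_inverse[i][j] * b_lower[j] for j in range(3)) for i in range(3)]
```

Lowering and then raising the index should return the original components, up to the finite-difference error in `g`.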



21.16 General coordinate transformations and tensors 

We now discuss the concept of general transformations from one coordinate 
system, $u^1, u^2, u^3$, to another, $u^{\prime 1}, u^{\prime 2}, u^{\prime 3}$. We can describe the coordinate transform 
using the three equations 

\[ u^{\prime i} = u^{\prime i}(u^1, u^2, u^3), \]

for $i = 1, 2, 3$, in which the new coordinates $u^{\prime i}$ can be arbitrary functions of the old 
ones $u^i$ rather than just represent linear orthogonal transformations (rotations) 
of the coordinate axes. We shall assume also that the transformation can be 
inverted, so that we can write the old coordinates in terms of the new ones as 

\[ u^i = u^i(u^{\prime 1}, u^{\prime 2}, u^{\prime 3}). \]

As an example, we may consider the transformation from spherical polar to 
Cartesian coordinates, given by 

\[ x = r\sin\theta\cos\phi, \]
\[ y = r\sin\theta\sin\phi, \]
\[ z = r\cos\theta, \]

which is clearly not a linear transformation. 

The two sets of basis vectors in the new coordinate system, $u^{\prime 1}, u^{\prime 2}, u^{\prime 3}$, are given 
as in (21.54) by 

\[ \mathbf{e}'_i = \frac{\partial \mathbf{r}}{\partial u^{\prime i}} \quad \text{and} \quad \mathbf{e}^{\prime i} = \nabla u^{\prime i}. \qquad (21.65) \]

Considering the first set, we have from the chain rule that 

\[ \frac{\partial \mathbf{r}}{\partial u^j} = \frac{\partial u^{\prime i}}{\partial u^j}\frac{\partial \mathbf{r}}{\partial u^{\prime i}}, \]

so that the basis vectors in the old and new coordinate systems are related by 

\[ \mathbf{e}_j = \frac{\partial u^{\prime i}}{\partial u^j}\mathbf{e}'_i. \qquad (21.66) \]


Now, since we can write any arbitrary vector $\mathbf{a}$ in terms of either basis as 

\[ \mathbf{a} = a^{\prime i}\mathbf{e}'_i = a^j\mathbf{e}_j = a^j\frac{\partial u^{\prime i}}{\partial u^j}\mathbf{e}'_i, \]

it follows that the contravariant components of a vector must transform as 

\[ a^{\prime i} = \frac{\partial u^{\prime i}}{\partial u^j}a^j. \qquad (21.67) \]

In fact, we use this relation as the defining property for a set of quantities $a^i$ to 
form the contravariant components of a vector. 


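► Find an expression analogous to (21.66) relating the basis vectors $\mathbf{e}^i$ and $\mathbf{e}^{\prime i}$ in the two 
coordinate systems. Hence deduce the way in which the covariant components of a vector 
change under a coordinate transformation. 


If we consider the second set of basis vectors in (21.65), $\mathbf{e}^{\prime i} = \nabla u^{\prime i}$, we have from the chain 
rule that 

\[ \frac{\partial u^j}{\partial x} = \frac{\partial u^j}{\partial u^{\prime i}}\frac{\partial u^{\prime i}}{\partial x} \]

and similarly for $\partial u^j/\partial y$ and $\partial u^j/\partial z$. So the basis vectors in the old and new coordinate 
systems are related by 

\[ \mathbf{e}^j = \frac{\partial u^j}{\partial u^{\prime i}}\mathbf{e}^{\prime i}. \qquad (21.68) \]

For any arbitrary vector $\mathbf{a}$, 

\[ \mathbf{a} = a'_i\mathbf{e}^{\prime i} = a_j\mathbf{e}^j = a_j\frac{\partial u^j}{\partial u^{\prime i}}\mathbf{e}^{\prime i} \]

and so the covariant components of a vector must transform as 

\[ a'_i = \frac{\partial u^j}{\partial u^{\prime i}}a_j. \qquad (21.69) \]

Analogously to the contravariant case (21.67), we take this result as the defining property 
of the covariant components of a vector. ◄ 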
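
The two transformation laws (21.67) and (21.69) can be exercised on a concrete change of coordinates. The sketch below, in plain Python, takes the old coordinates to be Cartesian $(x, y, z)$ and the new ones to be cylindrical polars $(\rho, \phi, z)$; this choice, the sample point and the vector components are illustrative assumptions. It checks that the contraction $a^ib_i$ is unchanged by the transformation.

```python
import math


def forward_jacobian(rho, phi):
    """F[i][j] = d u'^i / d u^j with u = (x, y, z) and u' = (rho, phi, z)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, s, 0.0],
            [-s / rho, c / rho, 0.0],
            [0.0, 0.0, 1.0]]


def inverse_jacobian(rho, phi):
    """B[j][i] = d u^j / d u'^i for the same pair of coordinate systems."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -rho * s, 0.0],
            [s, rho * c, 0.0],
            [0.0, 0.0, 1.0]]


def transform_contravariant(a, F):
    """a'^i = (d u'^i / d u^j) a^j, equation (21.67)."""
    return [sum(F[i][j] * a[j] for j in range(3)) for i in range(3)]


def transform_covariant(b, B):
    """b'_i = (d u^j / d u'^i) b_j, equation (21.69)."""
    return [sum(B[j][i] * b[j] for j in range(3)) for i in range(3)]


rho, phi = 2.0, 0.7   # sample point, assumed
F = forward_jacobian(rho, phi)
B = inverse_jacobian(rho, phi)

# In Cartesian coordinates the two kinds of component coincide.
a = [1.0, -2.0, 0.5]
b = [0.3, 0.4, -1.0]

a_new = transform_contravariant(a, F)
b_new = transform_covariant(b, B)

scalar_old = sum(x * y for x, y in zip(a, b))
scalar_new = sum(x * y for x, y in zip(a_new, b_new))

# A unit vector pointing radially outwards has polar components (1, 0, 0).
radial_new = transform_contravariant([math.cos(phi), math.sin(phi), 0.0], F)
```

The invariance follows because the two Jacobian matrices are inverses of one another.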


We may compare the transformation laws (21.67) and (21.69) with those for 
a first-order Cartesian tensor under a rigid rotation of axes. Let us consider 
a rotation of Cartesian axes $x^i$ through an angle $\theta$ about the 3-axis to a new 
set $x^{\prime i}$, $i = 1, 2, 3$, as given by (21.7) and the inverse transformation (21.8). It is 
straightforward to show that 

\[ \frac{\partial x^j}{\partial x^{\prime i}} = \frac{\partial x^{\prime i}}{\partial x^j} = L_{ij}, \]

where the elements $L_{ij}$ are given by 

\[ \mathsf{L} = \begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]

Thus (21.67) and (21.69) agree with our earlier definition in the special case of a 
rigid rotation of Cartesian axes. 

Following on from (21.67) and (21.69), we proceed in a similar way to define 
general tensors of higher rank. For example, the contravariant, mixed and 
covariant components, respectively, of a second-order tensor must transform as 
follows: 

contravariant components, 
\[ T^{\prime ij} = \frac{\partial u^{\prime i}}{\partial u^k}\frac{\partial u^{\prime j}}{\partial u^l}T^{kl}; \]
mixed components, 
\[ T^{\prime i}{}_{j} = \frac{\partial u^{\prime i}}{\partial u^k}\frac{\partial u^l}{\partial u^{\prime j}}T^{k}{}_{l}; \]
covariant components, 
\[ T'_{ij} = \frac{\partial u^k}{\partial u^{\prime i}}\frac{\partial u^l}{\partial u^{\prime j}}T_{kl}. \]

It is important to remember that these quantities form the components of the 
same tensor $\mathsf{T}$ but refer to different tensor bases made up from the basis vectors 
of the different coordinate systems. For example, in terms of the contravariant 
components we may write 

\[ \mathsf{T} = T^{ij}\mathbf{e}_i \otimes \mathbf{e}_j = T^{\prime ij}\mathbf{e}'_i \otimes \mathbf{e}'_j. \]


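We can clearly go on to define tensors of higher order, with arbitrary numbers 
of covariant (subscript) and contravariant (superscript) indices, by demanding 
that their components transform as follows: 

\[ T^{\prime ij\cdots k}{}_{lm\cdots n} = \frac{\partial u^{\prime i}}{\partial u^a}\frac{\partial u^{\prime j}}{\partial u^b}\cdots\frac{\partial u^{\prime k}}{\partial u^c}\frac{\partial u^d}{\partial u^{\prime l}}\frac{\partial u^e}{\partial u^{\prime m}}\cdots\frac{\partial u^f}{\partial u^{\prime n}}T^{ab\cdots c}{}_{de\cdots f}. \qquad (21.70) \]

Using the revised summation convention described in section 21.14, the algebra 
of general tensors is completely analogous to that of the Cartesian tensors 
discussed earlier. For example, as with Cartesian coordinates, the Kronecker 
delta is a tensor provided it is written as the mixed tensor $\delta^i_j$ since 

\[ \delta^{\prime i}_j = \frac{\partial u^{\prime i}}{\partial u^k}\frac{\partial u^l}{\partial u^{\prime j}}\delta^k_l = \frac{\partial u^{\prime i}}{\partial u^k}\frac{\partial u^k}{\partial u^{\prime j}} = \frac{\partial u^{\prime i}}{\partial u^{\prime j}} = \delta^i_j, \]

where we have used the chain rule to justify the third equality. This also shows 
that $\delta^i_j$ is isotropic. As discussed at the end of section 21.15, the $\delta^i_j$ can be 
considered as the mixed components of the metric tensor $\mathsf{g}$. 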
► Show that the quantities $g_{ij} = \mathbf{e}_i \cdot \mathbf{e}_j$ form the covariant components of a second-order 
tensor. 

In the new (primed) coordinate system we have 

\[ g'_{ij} = \mathbf{e}'_i \cdot \mathbf{e}'_j, \]

but using (21.66) for the inverse transformation, we have 

\[ \mathbf{e}'_i = \frac{\partial u^k}{\partial u^{\prime i}}\mathbf{e}_k, \]

and similarly for $\mathbf{e}'_j$. Thus we may write 

\[ g'_{ij} = \frac{\partial u^k}{\partial u^{\prime i}}\frac{\partial u^l}{\partial u^{\prime j}}\mathbf{e}_k \cdot \mathbf{e}_l = \frac{\partial u^k}{\partial u^{\prime i}}\frac{\partial u^l}{\partial u^{\prime j}}g_{kl}, \]

which shows that the $g_{ij}$ are indeed the covariant components of a second-order tensor 
(the metric tensor $\mathsf{g}$). ◄ 


A similar argument to that used in the above example shows that the quantities 
$g^{ij}$ form the contravariant components of a second-order tensor which transforms 
according to 

\[ g^{\prime ij} = \frac{\partial u^{\prime i}}{\partial u^k}\frac{\partial u^{\prime j}}{\partial u^l}g^{kl}. \]

In the previous section we discussed the use of the components $g_{ij}$ and $g^{ij}$ in 
the raising and lowering of indices in contravariant and covariant vectors. This 
can be extended to tensors of arbitrary rank. In general, contraction of a tensor 
with $g_{ij}$ will convert the contracted index from being contravariant (superscript) 
to covariant (subscript), i.e. it is lowered. This can be repeated for as many indices 
as are required. For example, 

\[ T_{ij} = g_{ik}T^{k}{}_{j} = g_{ik}g_{jl}T^{kl}. \qquad (21.71) \]

Similarly contraction with $g^{ij}$ raises an index, i.e. 

\[ T^{ij} = g^{ik}T_{k}{}^{j} = g^{ik}g^{jl}T_{kl}. \qquad (21.72) \]

That (21.71) and (21.72) are mutually consistent may be shown by using the fact 
that $g^{ik}g_{kj} = \delta^i_j$. 


21.17 Relative tensors 

In section 21.10 we introduced the concept of pseudotensors in the context of the 
rotation (proper or improper) of a set of Cartesian axes. Generalising to arbitrary 
coordinate transformations leads to the notion of a relative tensor. 

For an arbitrary coordinate transformation from one general coordinate system 
$u^i$ to another $u^{\prime i}$, we may define the Jacobian of the transformation (see chapter 6) 
as the determinant of the transformation matrix $[\partial u^{\prime i}/\partial u^j]$: this is usually denoted 
by 

\[ J = \left| \frac{\partial u'}{\partial u} \right|. \]

Alternatively, we may interchange the primed and unprimed coordinates to 
obtain $|\partial u/\partial u'| = 1/J$: unfortunately this also is often called the Jacobian of the 
transformation. 

Using the Jacobian $J$, we define a relative tensor of weight $w$ as one whose 
components transform as follows: 

\[ T^{\prime ij\cdots k}{}_{lm\cdots n} = \frac{\partial u^{\prime i}}{\partial u^a}\frac{\partial u^{\prime j}}{\partial u^b}\cdots\frac{\partial u^{\prime k}}{\partial u^c}\frac{\partial u^d}{\partial u^{\prime l}}\frac{\partial u^e}{\partial u^{\prime m}}\cdots\frac{\partial u^f}{\partial u^{\prime n}}T^{ab\cdots c}{}_{de\cdots f}\left| \frac{\partial u}{\partial u'} \right|^w. \qquad (21.73) \]

Comparing this expression with (21.70), we see that a true (or absolute) general 
tensor may be considered as a relative tensor of weight $w = 0$. If $w = -1$, on the 
other hand, the relative tensor is known as a general pseudotensor, and if $w = 1$ 
as a tensor density. 

It is worth comparing (21.73) with the definition (21.38) of a Cartesian pseudotensor. 
For the latter, we are concerned only with its behaviour under a rotation 
(proper or improper) of Cartesian axes, for which the Jacobian $J = \pm 1$. Thus, 
general relative tensors of weight $w = -1$ and $w = 1$ would both satisfy the 
definition (21.38) of a Cartesian pseudotensor. 


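► If the $g_{ij}$ are the covariant components of the metric tensor, show that the determinant $g$ 
of the matrix $[g_{ij}]$ is a relative scalar of weight $w = 2$. 


The components $g_{ij}$ transform as 

\[ g'_{ij} = \frac{\partial u^k}{\partial u^{\prime i}}\frac{\partial u^l}{\partial u^{\prime j}}g_{kl}. \]

Defining the matrices $\mathsf{U} = [\partial u^i/\partial u^{\prime j}]$, $\mathsf{G} = [g_{ij}]$ and $\mathsf{G}' = [g'_{ij}]$, we may write this expression 
as 

\[ \mathsf{G}' = \mathsf{U}^{\mathrm{T}}\mathsf{G}\mathsf{U}. \]

Taking the determinant of both sides, we obtain 

\[ g' = |\mathsf{U}|^2g = \left| \frac{\partial u}{\partial u'} \right|^2g, \]

which shows that $g$ is a relative scalar of weight $w = 2$. ◄ 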
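
As a concrete instance of this result, one may take the old coordinates to be Cartesian, for which $g = 1$, and the new ones to be cylindrical polars. The sketch below, in plain Python, forms $\mathsf{G}' = \mathsf{U}^{\mathrm{T}}\mathsf{G}\mathsf{U}$ and compares its determinant with $|\mathsf{U}|^2g$. The choice of coordinate systems and of the sample point is an illustrative assumption.

```python
import math


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]


rho, phi = 2.0, 0.7   # sample point, assumed
c, s = math.cos(phi), math.sin(phi)

# U[i][j] = d u^i / d u'^j with u = (x, y, z) and u' = (rho, phi, z).
U = [[c, -rho * s, 0.0],
     [s, rho * c, 0.0],
     [0.0, 0.0, 1.0]]

# Metric in the old (Cartesian) system is the unit matrix, so g = 1.
G = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

G_new = matmul(transpose(U), matmul(G, U))   # G' = U^T G U
g_old = det3(G)
g_new = det3(G_new)
jacobian = det3(U)                           # |d u / d u'|, equal to rho here
```

Here `g_new` should equal $\rho^2$, in agreement with (21.59), and `jacobian ** 2 * g_old` should give the same value.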

From the discussion in section 21.8, it can be seen that $\epsilon_{ijk}$ is a covariant 
relative tensor of weight $-1$. We may also define the contravariant tensor $\epsilon^{ijk}$, 
which is numerically equal to $\epsilon_{ijk}$ but is a relative tensor of weight $+1$. 

If two relative tensors have weights $w_1$ and $w_2$ respectively then, from (21.73), 
the outer product of the two tensors, or any contraction of them, is a relative 
tensor of weight $w_1 + w_2$. As a special case, we may use $\epsilon_{ijk}$ and $\epsilon^{ijk}$ to construct 
pseudovectors from antisymmetric tensors and vice versa, in an analogous way 
to that discussed in section 21.11. 

For example, if the $T^{ij}$ are the contravariant components of an antisymmetric 
tensor ($w = 0$) then 

\[ p_i = \tfrac{1}{2}\epsilon_{ijk}T^{jk} \]

are the covariant components of a pseudovector ($w = -1$), since $\epsilon_{ijk}$ has weight 
$w = -1$. Similarly, we may show that 

\[ T^{ij} = \epsilon^{ijk}p_k. \]


21.18 Derivatives of basis vectors and Christoffel symbols 


In Cartesian coordinates, the basis vectors $\mathbf{e}_i$ are constant and so their derivatives 
with respect to the coordinates vanish. In a general coordinate system, however, 
the basis vectors $\mathbf{e}_i$ and $\mathbf{e}^i$ are functions of the coordinates. Therefore, in order 
that we may differentiate general tensors we must consider the derivatives of the 
basis vectors. 

First consider the derivative $\partial\mathbf{e}_i/\partial u^j$. Since this is itself a vector, it can be 
written as a linear combination of the basis vectors $\mathbf{e}_k$, $k = 1, 2, 3$. If we introduce 
the symbol $\Gamma^k_{ij}$ to denote the coefficients in this combination, we have 

\[ \frac{\partial \mathbf{e}_i}{\partial u^j} = \Gamma^k_{ij}\mathbf{e}_k. \qquad (21.74) \]

The coefficient $\Gamma^k_{ij}$ is the $k$th component of the vector $\partial\mathbf{e}_i/\partial u^j$. Using the reciprocity 
relation $\mathbf{e}^i \cdot \mathbf{e}_j = \delta^i_j$, these 27 numbers are given (at each point in space) 
by 

\[ \Gamma^k_{ij} = \mathbf{e}^k \cdot \frac{\partial \mathbf{e}_i}{\partial u^j}. \qquad (21.75) \]

Furthermore, by differentiating the reciprocity relation $\mathbf{e}^i \cdot \mathbf{e}_j = \delta^i_j$ with respect 
to the coordinates, and using (21.75), it is straightforward to show that the 
derivatives of the contravariant basis vectors are given by 

\[ \frac{\partial \mathbf{e}^i}{\partial u^j} = -\Gamma^i_{kj}\mathbf{e}^k. \qquad (21.76) \]

The symbol $\Gamma^k_{ij}$ is called a Christoffel symbol (of the second kind), but, despite 
appearances to the contrary, these quantities do not form the components of a 
third-order tensor. It is clear from (21.75) that in Cartesian coordinates $\Gamma^k_{ij} = 0$ 
for all values of the indices $i$, $j$ and $k$. 
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>-Using (21.75), deduce the way in which the quantities r* f - transform under a general 
coordinate transformation, and hence show that they do not form the components of a 
third-order tensor. 


In a new coordinate system 

T-rk rk dQj 

du’’ ’ 


but from (21.68) and (21.66) respectively we have, on reversing primed and unprimed 
variables. 


t' k = ^e» 
8u n 


and 


. du l 
e = — re/. 
' 8u" 


Therefore in the new coordinate system the quantities T'^- are given by 


r * = ^e" 
iJ 8u n 


8 

8u’i 



8u' k ( 8 2 u' 8u l dei \ 

= — e ■ ( 7 :Gl -f 7 7 I 

8u n \8u' J du' 1 8u " 8u u ) 

8u' k 8 2 u' 8u' k 8u 1 8u m „ 8e/ 

= - 7 — 7 e" • e, + 7 : e" • — — 

8u n 8u IJ 8u" 8u" 8u" Su' 1 8u m 

_ 8u' k 8 2 u‘ 8h/^8f_8ir_ 

Su 1 du'iSu" 8u" Su 11 8u'i lm ’ 

where in the last line we have used (21.75) and the reciprocity relation e" ■ e/ = 5". From 
(21.77), because of the presence of the first term on the right-hand side, we conclude 
immediately that the r k t . do not form the components of a third-order tensor. ◄ 


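In a given coordinate system, in principle, we may calculate the $\Gamma^k_{ij}$ using
(21.75). In practice, however, it is often quicker to use an alternative expression,
which we now derive, for the Christoffel symbol in terms of the metric tensor $g_{ij}$
and its derivatives with respect to the coordinates.

Firstly we note that the Christoffel symbol $\Gamma^k_{ij}$ is symmetric with respect to
the interchange of its two subscripts $i$ and $j$. This is easily shown: since

\[ \frac{\partial \mathbf{e}_i}{\partial u^j} = \frac{\partial^2 \mathbf{r}}{\partial u^j \partial u^i} = \frac{\partial^2 \mathbf{r}}{\partial u^i \partial u^j} = \frac{\partial \mathbf{e}_j}{\partial u^i}, \]

it follows from (21.74) that $\Gamma^k_{ij}\mathbf{e}_k = \Gamma^k_{ji}\mathbf{e}_k$. Taking the scalar product with $\mathbf{e}^l$ and
using the reciprocity relation $\mathbf{e}_k \cdot \mathbf{e}^l = \delta^l_k$ gives immediately that

\[ \Gamma^l_{ij} = \Gamma^l_{ji}. \]

To obtain an expression for $\Gamma^k_{ij}$ we then use $g_{ij} = \mathbf{e}_i \cdot \mathbf{e}_j$ and consider the
derivative

\[ \frac{\partial g_{ij}}{\partial u^k} = \frac{\partial \mathbf{e}_i}{\partial u^k} \cdot \mathbf{e}_j + \mathbf{e}_i \cdot \frac{\partial \mathbf{e}_j}{\partial u^k} \]
\[ = \Gamma^l_{ik}\,\mathbf{e}_l \cdot \mathbf{e}_j + \mathbf{e}_i \cdot \Gamma^l_{jk}\,\mathbf{e}_l \]
\[ = \Gamma^l_{ik}\,g_{lj} + \Gamma^l_{jk}\,g_{il}, \tag{21.78} \]

where we have used the definition (21.74). By cyclically permuting the free indices
$i, j, k$ in (21.78), we obtain two further equivalent relations,

\[ \frac{\partial g_{jk}}{\partial u^i} = \Gamma^l_{ji}\,g_{lk} + \Gamma^l_{ki}\,g_{jl} \tag{21.79} \]

and

\[ \frac{\partial g_{ki}}{\partial u^j} = \Gamma^l_{kj}\,g_{li} + \Gamma^l_{ij}\,g_{kl}. \tag{21.80} \]

If we now add (21.79) and (21.80) together and subtract (21.78) from the result,
we find

\[ \frac{\partial g_{jk}}{\partial u^i} + \frac{\partial g_{ki}}{\partial u^j} - \frac{\partial g_{ij}}{\partial u^k} = \Gamma^l_{ji}g_{lk} + \Gamma^l_{ki}g_{jl} + \Gamma^l_{kj}g_{li} + \Gamma^l_{ij}g_{kl} - \Gamma^l_{ik}g_{lj} - \Gamma^l_{jk}g_{il} \]
\[ = 2\,\Gamma^l_{ij}\,g_{kl}, \]

where we have used the symmetry properties of both $\Gamma^l_{ij}$ and $g_{ij}$. Contracting
both sides with $g^{mk}$ leads to the required expression for the Christoffel symbol in
terms of the metric tensor and its derivatives, namely

\[ \Gamma^m_{ij} = \tfrac{1}{2}\,g^{mk}\left( \frac{\partial g_{jk}}{\partial u^i} + \frac{\partial g_{ki}}{\partial u^j} - \frac{\partial g_{ij}}{\partial u^k} \right). \tag{21.81} \]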


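► Calculate the Christoffel symbols $\Gamma^m_{ij}$ for cylindrical polar coordinates.


We may use either (21.74) or (21.81) to calculate the $\Gamma^m_{ij}$ for this simple coordinate system.
In cylindrical polar coordinates $(u^1, u^2, u^3) = (\rho, \phi, z)$, the basis vectors $\mathbf{e}_i$ are given by
(21.58). It is straightforward to show that the only derivatives of these vectors with respect
to the coordinates that are non-zero are

\[ \frac{\partial \mathbf{e}_\rho}{\partial \phi} = \frac{1}{\rho}\,\mathbf{e}_\phi, \qquad \frac{\partial \mathbf{e}_\phi}{\partial \rho} = \frac{1}{\rho}\,\mathbf{e}_\phi, \qquad \frac{\partial \mathbf{e}_\phi}{\partial \phi} = -\rho\,\mathbf{e}_\rho. \]

Thus, from (21.74), we have immediately that

\[ \Gamma^2_{12} = \Gamma^2_{21} = \frac{1}{\rho} \qquad\text{and}\qquad \Gamma^1_{22} = -\rho. \tag{21.82} \]

Alternatively, using (21.81) and the fact that $g_{11} = 1$, $g_{22} = \rho^2$, $g_{33} = 1$ and the other
components are zero, we see that the only three non-zero Christoffel symbols are indeed
$\Gamma^2_{12} = \Gamma^2_{21}$ and $\Gamma^1_{22}$. These are given by

\[ \Gamma^2_{12} = \Gamma^2_{21} = \frac{1}{2g_{22}}\frac{\partial g_{22}}{\partial u^1} = \frac{1}{2\rho^2}\frac{\partial}{\partial \rho}(\rho^2) = \frac{1}{\rho}, \]
\[ \Gamma^1_{22} = -\frac{1}{2g_{11}}\frac{\partial g_{22}}{\partial u^1} = -\frac{1}{2}\frac{\partial}{\partial \rho}(\rho^2) = -\rho, \]

which agree with the expressions found directly from (21.74) and given in (21.82). ◄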
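
The result (21.82) can also be checked numerically from (21.81). The following Python sketch is an illustration only and is not part of the derivation: the function names `metric_cylindrical` and `christoffel` are our own, the derivatives of the metric are approximated by central differences, and the inverse metric is formed on the assumption that $g_{ij}$ is diagonal (i.e. that the coordinate system is orthogonal).

```python
def metric_cylindrical(u):
    """Covariant metric g_ij for (rho, phi, z): diag(1, rho^2, 1)."""
    rho = u[0]
    return [[1.0, 0.0, 0.0],
            [0.0, rho * rho, 0.0],
            [0.0, 0.0, 1.0]]


def christoffel(metric, u, h=1e-5):
    """Return gamma[m][i][j] approximating Gamma^m_ij at the point u via (21.81).

    Assumption: the metric is diagonal, so its inverse is found entry by entry.
    """
    n = len(u)
    # dg[k][i][j] approximates d g_ij / d u^k by central differences.
    dg = []
    for k in range(n):
        up, um = list(u), list(u)
        up[k] += h
        um[k] -= h
        gp, gm = metric(up), metric(um)
        dg.append([[(gp[i][j] - gm[i][j]) / (2.0 * h) for j in range(n)]
                   for i in range(n)])
    g = metric(u)
    g_inv = [[(1.0 / g[i][i] if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    gamma = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                # (1/2) g^{mk} (d_i g_jk + d_j g_ki - d_k g_ij), summed over k
                gamma[m][i][j] = 0.5 * sum(
                    g_inv[m][k] * (dg[i][j][k] + dg[j][k][i] - dg[k][i][j])
                    for k in range(n))
    return gamma


if __name__ == "__main__":
    gamma = christoffel(metric_cylindrical, [2.0, 0.3, -1.0])
    # Indices are zero-based: gamma[1][0][1] is Gamma^2_12, gamma[0][1][1] is Gamma^1_22.
    print(gamma[1][0][1], gamma[1][1][0], gamma[0][1][1])
```

At $\rho = 2$ the three printed values should be close to $1/\rho$, $1/\rho$ and $-\rho$ respectively, with every other entry of `gamma` close to zero. For a non-orthogonal system the diagonal inverse would have to be replaced by a full matrix inverse.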

21.19 Covariant differentiation

For Cartesian tensors we noted that the derivative of a scalar is a (covariant)
vector. This is also true for general tensors, as may be shown by considering the
differential of a scalar

\[ d\phi = \frac{\partial \phi}{\partial u^i}\,du^i. \]

Since the $du^i$ are the components of a contravariant vector and $d\phi$ is a scalar,
we have by the quotient law, discussed in section 21.7, that the quantities $\partial\phi/\partial u^i$
must form the components of a covariant vector.

As a second example, in Cartesian coordinates, if the $v^i$ are the contravariant
components of a vector then the quantities $\partial v^i/\partial x^j$ form the components of a
second-order tensor. It is straightforward, however, to show that (in contrast to
what happens in Cartesian coordinates) the differentiation of the components of
a general tensor, other than a scalar, with respect to the coordinates does not in
general result in the components of another tensor.


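► Show that, in general coordinates, the quantities $\partial v^i/\partial u^j$ do not form the components of
a tensor.


We may show this directly by considering

\[ \left( \frac{\partial v^i}{\partial u^j} \right)' = \frac{\partial v'^i}{\partial u'^j} = \frac{\partial u^k}{\partial u'^j}\frac{\partial v'^i}{\partial u^k} \]
\[ = \frac{\partial u^k}{\partial u'^j}\frac{\partial}{\partial u^k}\left( \frac{\partial u'^i}{\partial u^l}\,v^l \right) \]
\[ = \frac{\partial u^k}{\partial u'^j}\frac{\partial u'^i}{\partial u^l}\frac{\partial v^l}{\partial u^k} + \frac{\partial u^k}{\partial u'^j}\frac{\partial^2 u'^i}{\partial u^k \partial u^l}\,v^l. \tag{21.83} \]

The presence of the second term on the right-hand side of (21.83) shows that the $\partial v^i/\partial u^j$
do not form the components of a second-order tensor. This term arises because the
'transformation matrix' $[\partial u'^i/\partial u^j]$ changes as the position in space at which it is evaluated
is changed. This is not true in Cartesian coordinates, for which the second term vanishes,
and $\partial v^i/\partial x^j$ is a second-order tensor. ◄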


We may, however, use the Christoffel symbols discussed in the previous section 
to define a new covariant derivative of the components of a tensor that does 
result in the components of another tensor. 

Let us first consider the derivative of a vector $\mathbf{v}$ with respect to the coordinates.
Writing the vector in terms of its contravariant components $\mathbf{v} = v^i\mathbf{e}_i$, we find

\[ \frac{\partial \mathbf{v}}{\partial u^j} = \frac{\partial v^i}{\partial u^j}\,\mathbf{e}_i + v^i\,\frac{\partial \mathbf{e}_i}{\partial u^j}, \tag{21.84} \]

where the second term arises because, in general, the basis vectors $\mathbf{e}_i$ are not
constant (this term vanishes in Cartesian coordinates). Using (21.74) we may
write

\[ \frac{\partial \mathbf{v}}{\partial u^j} = \frac{\partial v^i}{\partial u^j}\,\mathbf{e}_i + v^i\,\Gamma^k_{ij}\,\mathbf{e}_k. \]

Since $i$ and $k$ are dummy indices in the last term on the right-hand side, we may
interchange them to obtain

\[ \frac{\partial \mathbf{v}}{\partial u^j} = \frac{\partial v^i}{\partial u^j}\,\mathbf{e}_i + v^k\,\Gamma^i_{kj}\,\mathbf{e}_i = \left( \frac{\partial v^i}{\partial u^j} + v^k\,\Gamma^i_{kj} \right)\mathbf{e}_i. \tag{21.85} \]

The reason for interchanging the dummy indices, as shown in (21.85), is that
we may now factor out $\mathbf{e}_i$. The quantity in parentheses is called the covariant
derivative, for which the standard notation is

\[ v^i{}_{;j} \equiv \frac{\partial v^i}{\partial u^j} + \Gamma^i_{kj}\,v^k, \tag{21.86} \]

the semicolon subscript denoting covariant differentiation. A similar short-hand
notation also exists for the partial derivatives, a comma being used for these
instead of a semicolon; for example, $\partial v^i/\partial u^j$ is denoted by $v^i{}_{,j}$. In Cartesian
coordinates all the $\Gamma^i_{kj}$ are zero, and so the covariant derivative reduces to the
simple partial derivative $\partial v^i/\partial u^j$.

Using the short-hand semicolon notation, the derivative of a vector may be
written in the very compact form

\[ \frac{\partial \mathbf{v}}{\partial u^j} = v^i{}_{;j}\,\mathbf{e}_i \]

and, by the quotient rule (section 21.7), it is clear that the $v^i{}_{;j}$ are the (mixed)
components of a second-order tensor. This may also be verified directly, using
the transformation properties of $\partial v^i/\partial u^j$ and $\Gamma^i_{kj}$ given in (21.83) and (21.77)
respectively.

In general, we may regard the $v^i{}_{;j}$ as the mixed components of a second-order
tensor called the covariant derivative of $\mathbf{v}$ and denoted by $\nabla\mathbf{v}$. In Cartesian
coordinates, the components of this tensor are just $\partial v^i/\partial x^j$.


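► Calculate $v^i{}_{;i}$ in cylindrical polar coordinates.


Contracting (21.86) we obtain

\[ v^i{}_{;i} = \frac{\partial v^i}{\partial u^i} + \Gamma^i_{ki}\,v^k. \]

From (21.82) we have

\[ \Gamma^i_{1i} = \Gamma^1_{11} + \Gamma^2_{12} + \Gamma^3_{13} = 1/\rho, \]
\[ \Gamma^i_{2i} = \Gamma^1_{21} + \Gamma^2_{22} + \Gamma^3_{23} = 0, \]
\[ \Gamma^i_{3i} = \Gamma^1_{31} + \Gamma^2_{32} + \Gamma^3_{33} = 0, \]

and so

\[ v^i{}_{;i} = \frac{\partial v^\rho}{\partial \rho} + \frac{\partial v^\phi}{\partial \phi} + \frac{\partial v^z}{\partial z} + \frac{1}{\rho}\,v^\rho \]
\[ = \frac{1}{\rho}\frac{\partial}{\partial \rho}(\rho v^\rho) + \frac{\partial v^\phi}{\partial \phi} + \frac{\partial v^z}{\partial z}. \]

This result is identical to the expression for the divergence of a vector field in cylindrical
polar coordinates given in section 10.9. This is discussed further in section 21.20. ◄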


So far we have considered only the covariant derivative of the contravariant
components $v^i$ of a vector. The corresponding result for the covariant components
$v_i$ may be found in a similar way, by considering the derivative of $\mathbf{v} = v_i\mathbf{e}^i$ and
using (21.76) to obtain

\[ v_{i;j} = \frac{\partial v_i}{\partial u^j} - \Gamma^k_{ij}\,v_k. \tag{21.87} \]


Comparing the expressions (21.86) and (21.87) for the covariant derivative 
of the contravariant and covariant components of a vector respectively, we see 
that there are some similarities and some differences. It may help to remember 
that the index with respect to which the covariant derivative is taken (j in this 
case), is also the last subscript on the Christoffel symbol; the remaining indices 
can then be arranged in only one way without raising or lowering them. It 
remains to remember the sign difference, i.e. that for a covariant index (subscript) 
the Christoffel symbol carries a minus sign, whereas for a contravariant index 
(superscript) the sign is positive. 

Following a similar procedure to that which led to equation (21.86), we may 
obtain expressions for the covariant derivatives of higher-order tensors. 


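► By considering the derivative of the second-order tensor $\mathbf{T}$ with respect to the coordinate
$u^k$, find an expression for the covariant derivative $T^{ij}{}_{;k}$ of its contravariant components.


Expressing $\mathbf{T}$ in terms of its contravariant components, we have

\[ \frac{\partial \mathbf{T}}{\partial u^k} = \frac{\partial}{\partial u^k}\left( T^{ij}\,\mathbf{e}_i \otimes \mathbf{e}_j \right) \]
\[ = \frac{\partial T^{ij}}{\partial u^k}\,\mathbf{e}_i \otimes \mathbf{e}_j + T^{ij}\,\frac{\partial \mathbf{e}_i}{\partial u^k} \otimes \mathbf{e}_j + T^{ij}\,\mathbf{e}_i \otimes \frac{\partial \mathbf{e}_j}{\partial u^k}. \]

Using (21.74), we can rewrite the derivatives of the basis vectors in terms of Christoffel
symbols to obtain

\[ \frac{\partial \mathbf{T}}{\partial u^k} = \frac{\partial T^{ij}}{\partial u^k}\,\mathbf{e}_i \otimes \mathbf{e}_j + T^{ij}\,\Gamma^l_{ik}\,\mathbf{e}_l \otimes \mathbf{e}_j + T^{ij}\,\mathbf{e}_i \otimes \Gamma^l_{jk}\,\mathbf{e}_l. \]

Interchanging the dummy indices $i$ and $l$ in the second term and $j$ and $l$ in the third term
on the right-hand side, this becomes

\[ \frac{\partial \mathbf{T}}{\partial u^k} = \left( \frac{\partial T^{ij}}{\partial u^k} + \Gamma^i_{lk}\,T^{lj} + \Gamma^j_{lk}\,T^{il} \right)\mathbf{e}_i \otimes \mathbf{e}_j, \]

where the expression in brackets is the required covariant derivative

\[ T^{ij}{}_{;k} = \frac{\partial T^{ij}}{\partial u^k} + \Gamma^i_{lk}\,T^{lj} + \Gamma^j_{lk}\,T^{il}. \tag{21.88} \]

Using (21.88), the derivative of the tensor $\mathbf{T}$ with respect to $u^k$ can now be written in terms
of its contravariant components as

\[ \frac{\partial \mathbf{T}}{\partial u^k} = T^{ij}{}_{;k}\,\mathbf{e}_i \otimes \mathbf{e}_j. \] ◄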

Results similar to (21.88) may be obtained for the covariant derivatives of
the mixed and covariant components of a second-order tensor. Collecting these
results together, we have

\[ T^{ij}{}_{;k} = T^{ij}{}_{,k} + \Gamma^i_{lk}\,T^{lj} + \Gamma^j_{lk}\,T^{il}, \]
\[ T^i{}_{j;k} = T^i{}_{j,k} + \Gamma^i_{lk}\,T^l{}_j - \Gamma^l_{jk}\,T^i{}_l, \]
\[ T_{ij;k} = T_{ij,k} - \Gamma^l_{ik}\,T_{lj} - \Gamma^l_{jk}\,T_{il}, \]

where we have used the comma notation for partial derivatives. The position of
the indices in these expressions is very systematic: for each contravariant index
(superscript) on the LHS we add a term on the RHS containing a Christoffel
symbol with a plus sign, and for every covariant index (subscript) we add a
corresponding term with a minus sign. This is extended straightforwardly to
tensors with an arbitrary number of contravariant and covariant indices.

We note that the quantities $T^{ij}{}_{;k}$, $T^i{}_{j;k}$ and $T_{ij;k}$ are the components of the
same third-order tensor $\nabla\mathbf{T}$ with respect to different tensor bases, i.e.

\[ \nabla\mathbf{T} = T^{ij}{}_{;k}\,\mathbf{e}_i \otimes \mathbf{e}_j \otimes \mathbf{e}^k = T^i{}_{j;k}\,\mathbf{e}_i \otimes \mathbf{e}^j \otimes \mathbf{e}^k = T_{ij;k}\,\mathbf{e}^i \otimes \mathbf{e}^j \otimes \mathbf{e}^k. \]

We conclude this section by considering briefly the covariant derivative of a
scalar. The covariant derivative differs from the simple partial derivative with
respect to the coordinates only because the basis vectors of the coordinate
system change with position in space (hence for Cartesian coordinates there is no
difference). However, a scalar $\phi$ does not depend on the basis vectors at all and
so its covariant derivative must be the same as its partial derivative, i.e.

\[ \phi_{;j} = \frac{\partial \phi}{\partial u^j} = \phi_{,j}. \tag{21.89} \]


21.20 Vector operators in tensor form 

In section 10.10 we used vector calculus methods to find expressions for vector
differential operators, such as grad, div, curl and the Laplacian, in general orthogonal
curvilinear coordinates, taking cylindrical and spherical polars as particular
examples. In this section we use the framework of general tensors that we have
developed to obtain, in tensor form, expressions for these operators that are valid
in all coordinate systems, whether orthogonal or not.

In order to compare the results obtained here with those given in section
10.10 for orthogonal coordinates, it is necessary to remember that here we are
working with the (in general) non-unit basis vectors $\mathbf{e}_i = \partial\mathbf{r}/\partial u^i$ or $\mathbf{e}^i = \nabla u^i$.
Thus the components of a vector $\mathbf{v} = v^i\mathbf{e}_i$ are not the same as the components $\hat{v}^i$
appropriate to the corresponding unit basis $\hat{\mathbf{e}}_i$. In fact, if the scale factors of the
coordinate system are $h_i$, $i = 1, 2, 3$, then $v^i = \hat{v}^i/h_i$ (no summation over $i$).

As mentioned in section 21.15, for an orthogonal coordinate system with scale
factors $h_i$ we have

\[ g_{ij} = \begin{cases} h_i^2 & \text{if } i = j, \\ 0 & \text{otherwise} \end{cases} \qquad\text{and}\qquad g^{ij} = \begin{cases} 1/h_i^2 & \text{if } i = j, \\ 0 & \text{otherwise,} \end{cases} \]

and so the determinant $g$ of the matrix $[g_{ij}]$ is given by $g = h_1^2 h_2^2 h_3^2$.

Gradient 

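The gradient of a scalar $\phi$ is given by

\[ \nabla\phi = \phi_{;i}\,\mathbf{e}^i = \frac{\partial \phi}{\partial u^i}\,\mathbf{e}^i, \tag{21.90} \]

since the covariant derivative of a scalar is the same as its partial derivative.


Divergence

Replacing the partial derivatives that occur in Cartesian coordinates with covariant
derivatives, the divergence of a vector field $\mathbf{v}$ in a general coordinate system
is given by

\[ \nabla \cdot \mathbf{v} = v^i{}_{;i} = \frac{\partial v^i}{\partial u^i} + \Gamma^i_{ki}\,v^k. \]

Using the expression (21.81) for the Christoffel symbol in terms of the metric
tensor, we find

\[ \Gamma^i_{ki} = \tfrac{1}{2}\,g^{il}\left( \frac{\partial g_{il}}{\partial u^k} + \frac{\partial g_{kl}}{\partial u^i} - \frac{\partial g_{ki}}{\partial u^l} \right) = \tfrac{1}{2}\,g^{il}\,\frac{\partial g_{il}}{\partial u^k}. \tag{21.91} \]

The last two terms have cancelled because

\[ g^{il}\,\frac{\partial g_{kl}}{\partial u^i} = g^{li}\,\frac{\partial g_{ki}}{\partial u^l} = g^{il}\,\frac{\partial g_{ki}}{\partial u^l}, \]

where in the first equality we have interchanged the dummy indices $i$ and $l$, and
in the second equality have used the symmetry of the metric tensor.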

We may simplify (21.91) still further by using a result concerning the derivative
of the determinant of a matrix whose elements are functions of the coordinates.

► Suppose $\mathsf{A} = [a_{ij}]$, $\mathsf{B} = [b^{ij}]$ and that $\mathsf{B} = \mathsf{A}^{-1}$. By considering the determinant $a = |\mathsf{A}|$,
show that

\[ \frac{\partial a}{\partial u^k} = a\,b^{ji}\,\frac{\partial a_{ij}}{\partial u^k}. \]


If we denote the cofactor of the element $a_{ij}$ by $\Delta^{ij}$ then the elements of the inverse matrix
are given by (see chapter 8)

\[ b^{ij} = \frac{1}{a}\,\Delta^{ji}. \tag{21.92} \]

However, the determinant of $\mathsf{A}$ is given by

\[ a = \sum_j a_{ij}\,\Delta^{ij}, \]

in which we have fixed $i$ and written the sum over $j$ explicitly, for clarity. Partially
differentiating both sides with respect to $a_{ij}$, we then obtain

\[ \frac{\partial a}{\partial a_{ij}} = \Delta^{ij}, \tag{21.93} \]

since $a_{ij}$ does not occur in any of the cofactors $\Delta^{ij}$.

Now, if the $a_{ij}$ depend on the coordinates then so will the determinant $a$ and, by the
chain rule, we have

\[ \frac{\partial a}{\partial u^k} = \frac{\partial a}{\partial a_{ij}}\frac{\partial a_{ij}}{\partial u^k} = \Delta^{ij}\,\frac{\partial a_{ij}}{\partial u^k} = a\,b^{ji}\,\frac{\partial a_{ij}}{\partial u^k}, \tag{21.94} \]

in which we have used (21.92) and (21.93). ◄
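
The identity (21.94) is easy to test numerically. The Python sketch below is purely illustrative: the matrix function `matrix_a`, which depends on a single parameter `u`, is an arbitrary choice of ours, and the helper names are hypothetical. The inverse is built from cofactors exactly as in (21.92), and derivatives are approximated by central differences.

```python
def minor(a, i, j):
    """2x2 matrix left after deleting row i and column j of a 3x3 matrix."""
    return [[a[r][c] for c in range(3) if c != j] for r in range(3) if r != i]


def cofactor(a, i, j):
    m = minor(a, i, j)
    return (-1) ** (i + j) * (m[0][0] * m[1][1] - m[0][1] * m[1][0])


def det3(a):
    """Determinant by expansion along the first row."""
    return sum(a[0][j] * cofactor(a, 0, j) for j in range(3))


def inverse3(a):
    """Inverse from (21.92): b^{ij} = cofactor of a_{ji}, divided by det."""
    d = det3(a)
    return [[cofactor(a, j, i) / d for j in range(3)] for i in range(3)]


def matrix_a(u):
    """An arbitrary, smoothly varying 3x3 matrix (illustrative assumption)."""
    return [[2.0 + u, u * u, 1.0],
            [0.5 * u, 3.0, u],
            [1.0, u, 4.0 + u * u]]


def determinant_derivative_check(u, h=1e-5):
    """Return (da/du by differencing, a * b^{ji} * d a_ij/du) for comparison."""
    lhs = (det3(matrix_a(u + h)) - det3(matrix_a(u - h))) / (2.0 * h)
    ap, am = matrix_a(u + h), matrix_a(u - h)
    da = [[(ap[i][j] - am[i][j]) / (2.0 * h) for j in range(3)] for i in range(3)]
    a = det3(matrix_a(u))
    b = inverse3(matrix_a(u))
    rhs = a * sum(b[j][i] * da[i][j] for i in range(3) for j in range(3))
    return lhs, rhs


if __name__ == "__main__":
    print(determinant_derivative_check(0.7))
```

The two returned numbers should agree to within the accuracy of the finite-difference approximation, provided the determinant is non-zero at the chosen value of `u`.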


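Applying the result (21.94) to the determinant $g$ of the metric tensor, and
remembering both that $g^{ik}g_{kj} = \delta^i_j$ and that $g^{ij}$ is symmetric, we obtain

\[ \frac{\partial g}{\partial u^k} = g\,g^{ij}\,\frac{\partial g_{ij}}{\partial u^k}. \tag{21.95} \]

Substituting (21.95) into (21.91) we find that the expression for the Christoffel
symbol can be much simplified to give

\[ \Gamma^i_{ki} = \frac{1}{2g}\frac{\partial g}{\partial u^k} = \frac{1}{\sqrt{g}}\frac{\partial \sqrt{g}}{\partial u^k}. \]

Thus finally we obtain the expression for the divergence of a vector field in a
general coordinate system as

\[ \nabla \cdot \mathbf{v} = v^i{}_{;i} = \frac{1}{\sqrt{g}}\frac{\partial}{\partial u^k}\left( \sqrt{g}\,v^k \right). \tag{21.96} \]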


Laplacian 


If we replace $\mathbf{v}$ by $\nabla\phi$ in $\nabla \cdot \mathbf{v}$ then we obtain the Laplacian $\nabla^2\phi$. From (21.90),
we have

\[ v_i\,\mathbf{e}^i = \mathbf{v} = \nabla\phi = \frac{\partial \phi}{\partial u^i}\,\mathbf{e}^i, \]

and so the covariant components of $\mathbf{v}$ are given by $v_i = \partial\phi/\partial u^i$. In (21.96),
however, we require the contravariant components $v^i$. These may be obtained by
raising the index using the metric tensor, to give

\[ v^j = g^{jk}\,v_k = g^{jk}\,\frac{\partial \phi}{\partial u^k}. \]

Substituting this into (21.96) we obtain

\[ \nabla^2\phi = \frac{1}{\sqrt{g}}\frac{\partial}{\partial u^j}\left( \sqrt{g}\,g^{jk}\,\frac{\partial \phi}{\partial u^k} \right). \tag{21.97} \]


► Use (21.97) to find the expression for $\nabla^2\phi$ in an orthogonal coordinate system with scale
factors $h_i$, $i = 1, 2, 3$.


For an orthogonal coordinate system $\sqrt{g} = h_1h_2h_3$ and $g^{ij} = 1/h_i^2$ if $i = j$ and $g^{ij} = 0$
otherwise. Therefore, from (21.97) we have

\[ \nabla^2\phi = \frac{1}{h_1h_2h_3}\frac{\partial}{\partial u^j}\left( \frac{h_1h_2h_3}{h_j^2}\frac{\partial \phi}{\partial u^j} \right), \]

which agrees with the results of section 10.10. ◄
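
As an informal check of (21.97) in the orthogonal case, the short Python sketch below evaluates the Laplacian in cylindrical polar coordinates by nested central differences. It is a sketch under stated assumptions rather than a general-purpose routine: the names `scale_factors_cylindrical` and `laplacian` are ours, the metric is assumed diagonal with $g^{jj} = 1/h_j^2$, and the point of evaluation must avoid $\rho = 0$.

```python
import math


def scale_factors_cylindrical(u):
    """Scale factors (h_1, h_2, h_3) = (1, rho, 1) for u = (rho, phi, z)."""
    return (1.0, u[0], 1.0)


def laplacian(phi, u, scale, h=1e-3):
    """Approximate (1/sqrt g) d/du^j ( sqrt g g^{jj} d(phi)/du^j ) at u.

    Assumes an orthogonal system, so that sqrt g = h_1 h_2 h_3.
    """
    def flux(j, point):
        # sqrt(g) * g^{jj} * d(phi)/du^j evaluated at 'point'
        hs = scale(point)
        sqrt_g = hs[0] * hs[1] * hs[2]
        up, um = list(point), list(point)
        up[j] += 0.5 * h
        um[j] -= 0.5 * h
        return sqrt_g / hs[j] ** 2 * (phi(up) - phi(um)) / h

    hs = scale(u)
    sqrt_g = hs[0] * hs[1] * hs[2]
    total = 0.0
    for j in range(3):
        up, um = list(u), list(u)
        up[j] += 0.5 * h
        um[j] -= 0.5 * h
        total += (flux(j, up) - flux(j, um)) / h
    return total / sqrt_g


if __name__ == "__main__":
    point = [1.5, 0.3, 0.7]
    # rho^2 + z^2 is x^2 + y^2 + z^2, whose Cartesian Laplacian is 6.
    print(laplacian(lambda u: u[0] ** 2 + u[2] ** 2, point,
                    scale_factors_cylindrical))
    # rho^2 cos(2 phi) is x^2 - y^2, which is harmonic.
    print(laplacian(lambda u: u[0] ** 2 * math.cos(2.0 * u[1]), point,
                    scale_factors_cylindrical))
```

The two test functions are chosen because their Laplacians are known from Cartesian coordinates; the printed values should be close to 6 and to 0 respectively, up to discretisation error.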


Curl 

The special vector form of the curl of a vector field exists only in three dimensions.
We therefore consider a more general form valid in higher-dimensional spaces as
well. In a general space the operation curl $\mathbf{v}$ is defined by

\[ (\text{curl}\,\mathbf{v})_{ij} = v_{i;j} - v_{j;i}, \]

which is an antisymmetric covariant tensor.

In fact the difference of derivatives can be simplified, since

\[ v_{i;j} - v_{j;i} = \frac{\partial v_i}{\partial u^j} - \Gamma^l_{ij}\,v_l - \frac{\partial v_j}{\partial u^i} + \Gamma^l_{ji}\,v_l = \frac{\partial v_i}{\partial u^j} - \frac{\partial v_j}{\partial u^i}, \]

where the Christoffel symbols have cancelled because of their symmetry properties.
Thus curl $\mathbf{v}$ can be written in terms of partial derivatives as

\[ (\text{curl}\,\mathbf{v})_{ij} = \frac{\partial v_i}{\partial u^j} - \frac{\partial v_j}{\partial u^i}. \]

Generalising slightly the discussion of section 21.17, in three dimensions we may
associate with this antisymmetric second-order tensor a vector with contravariant
components,

\[ (\nabla \times \mathbf{v})^i = -\frac{1}{2\sqrt{g}}\,\epsilon^{ijk}\,(\text{curl}\,\mathbf{v})_{jk} \]
\[ = -\frac{1}{2\sqrt{g}}\,\epsilon^{ijk}\left( \frac{\partial v_j}{\partial u^k} - \frac{\partial v_k}{\partial u^j} \right) = \frac{1}{\sqrt{g}}\,\epsilon^{ijk}\,\frac{\partial v_k}{\partial u^j}; \]

this is the analogue of the expression in Cartesian coordinates discussed in
section 21.8.


21.21 Absolute derivatives along curves 


In section 21.19 we discussed how to differentiate a general tensor with respect
to the coordinates and introduced the covariant derivative. In this section we
consider the slightly different problem of calculating the derivative of a tensor
along a curve $\mathbf{r}(t)$ that is parameterised by some variable $t$.

Let us begin by considering the derivative of a vector $\mathbf{v}$ along the curve. If we
introduce an arbitrary coordinate system $u^i$ with basis vectors $\mathbf{e}_i$, $i = 1, 2, 3$, then
we may write $\mathbf{v} = v^i\mathbf{e}_i$ and so obtain

\[ \frac{d\mathbf{v}}{dt} = \frac{dv^i}{dt}\,\mathbf{e}_i + v^i\,\frac{d\mathbf{e}_i}{dt} = \frac{dv^i}{dt}\,\mathbf{e}_i + v^i\,\frac{\partial \mathbf{e}_i}{\partial u^k}\frac{du^k}{dt}; \]

here the chain rule has been used to rewrite the last term on the right-hand side.
Using (21.74) to write the derivatives of the basis vectors in terms of Christoffel
symbols, we obtain

\[ \frac{d\mathbf{v}}{dt} = \frac{dv^i}{dt}\,\mathbf{e}_i + \Gamma^j_{ik}\,v^i\,\frac{du^k}{dt}\,\mathbf{e}_j. \]

Interchanging the dummy indices $i$ and $j$ in the last term, we may factor out the
basis vector, and we find

\[ \frac{d\mathbf{v}}{dt} = \left( \frac{dv^i}{dt} + \Gamma^i_{jk}\,v^j\,\frac{du^k}{dt} \right)\mathbf{e}_i. \]

The term in parentheses is called the absolute (or intrinsic) derivative of the
components $v^i$ along the curve $\mathbf{r}(t)$ and is usually denoted by

\[ \frac{\delta v^i}{\delta t} \equiv \frac{dv^i}{dt} + \Gamma^i_{jk}\,v^j\,\frac{du^k}{dt} = v^i{}_{;k}\,\frac{du^k}{dt}. \]

With this notation, we may write

\[ \frac{d\mathbf{v}}{dt} = \frac{\delta v^i}{\delta t}\,\mathbf{e}_i = v^i{}_{;k}\,\frac{du^k}{dt}\,\mathbf{e}_i. \tag{21.98} \]

Using the same method, the absolute derivative of the covariant components
$v_i$ of a vector is given by

\[ \frac{\delta v_i}{\delta t} \equiv v_{i;k}\,\frac{du^k}{dt}. \]

Similarly, the absolute derivatives of the contravariant, mixed and covariant
components of a second-order tensor $\mathbf{T}$ are

\[ \frac{\delta T^{ij}}{\delta t} \equiv T^{ij}{}_{;k}\,\frac{du^k}{dt}, \]
\[ \frac{\delta T^i{}_j}{\delta t} \equiv T^i{}_{j;k}\,\frac{du^k}{dt}, \]
\[ \frac{\delta T_{ij}}{\delta t} \equiv T_{ij;k}\,\frac{du^k}{dt}. \]

The derivative of $\mathbf{T}$ along the curve $\mathbf{r}(t)$ may then be written in terms of, for
example, its contravariant components as

\[ \frac{d\mathbf{T}}{dt} = \frac{\delta T^{ij}}{\delta t}\,\mathbf{e}_i \otimes \mathbf{e}_j = T^{ij}{}_{;k}\,\frac{du^k}{dt}\,\mathbf{e}_i \otimes \mathbf{e}_j. \]


21.22 Geodesics 

As an example of the use of the absolute derivative, we conclude this chapter 
with a brief discussion of geodesics. A geodesic in real three-dimensional space 
is a straight line, which has two equivalent defining properties. Firstly, it is the 
curve of shortest length between two points and, secondly, it is the curve whose 
tangent vector always points in the same direction (along the line). Although 
in this chapter we have considered explicitly only our familiar three-dimensional 
space, much of the mathematical formalism developed can be generalised to more 
abstract spaces of higher dimensionality in which the familiar ideas of Euclidean 
geometry are no longer valid. It is often of interest to find geodesic curves in 
such spaces by using the defining properties of straight lines in Euclidean space. 

We shall not consider these more complicated spaces explicitly but will de- 
termine the equation that a geodesic in Euclidean three-dimensional space (i.e. 
a straight line) must satisfy, deriving it in a sufficiently general way that our 
method may be applied with little modification to finding the equations satisfied 
by geodesics in more abstract spaces. 

Let us consider a curve $\mathbf{r}(s)$, parameterised by the arc length $s$ from some point
on the curve, and choose as our defining property for a geodesic that its tangent
vector $\mathbf{t} = d\mathbf{r}/ds$ always points in the same direction everywhere on the curve, i.e.

\[ \frac{d\mathbf{t}}{ds} = \mathbf{0}. \tag{21.99} \]

(We could alternatively exploit the property that the distance between two points
is a minimum along a geodesic and use the calculus of variations (see chapter 22);
this would lead to the same final result (21.100).)

If we now introduce an arbitrary coordinate system $u^i$ with basis vectors $\mathbf{e}_i$,
$i = 1, 2, 3$, then we may write $\mathbf{t} = t^i\mathbf{e}_i$ and from (21.98) we find

\[ \frac{d\mathbf{t}}{ds} = t^i{}_{;k}\,\frac{du^k}{ds}\,\mathbf{e}_i = \mathbf{0}. \]

Writing out the covariant derivative, we obtain

\[ \left( \frac{dt^i}{ds} + \Gamma^i_{jk}\,t^j\,\frac{du^k}{ds} \right)\mathbf{e}_i = \mathbf{0}. \]

But, since $t^j = du^j/ds$, it follows that the equation satisfied by a geodesic is

\[ \frac{d^2u^i}{ds^2} + \Gamma^i_{jk}\,\frac{du^j}{ds}\frac{du^k}{ds} = 0. \tag{21.100} \]

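► Find the equations satisfied by a geodesic (straight line) in cylindrical polar coordinates.


From (21.82), the only non-zero Christoffel symbols are $\Gamma^1_{22} = -\rho$ and $\Gamma^2_{12} = \Gamma^2_{21} = 1/\rho$.
Thus the required geodesic equations are

\[ \frac{d^2u^1}{ds^2} + \Gamma^1_{22}\,\frac{du^2}{ds}\frac{du^2}{ds} = 0 \quad\Rightarrow\quad \frac{d^2\rho}{ds^2} - \rho\left( \frac{d\phi}{ds} \right)^2 = 0, \]
\[ \frac{d^2u^2}{ds^2} + 2\Gamma^2_{12}\,\frac{du^1}{ds}\frac{du^2}{ds} = 0 \quad\Rightarrow\quad \frac{d^2\phi}{ds^2} + \frac{2}{\rho}\frac{d\rho}{ds}\frac{d\phi}{ds} = 0, \]
\[ \frac{d^2u^3}{ds^2} = 0 \quad\Rightarrow\quad \frac{d^2z}{ds^2} = 0. \] ◄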
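
The geodesic equations for cylindrical polars can be tested on a curve that is known to be straight. In the Python sketch below, which is an illustration under our own assumptions rather than part of the text's argument, a straight line is written down in Cartesian form (the starting point and unit direction are arbitrary choices), converted to $(\rho, \phi, z)$, and differentiated numerically with respect to arc length; a circular helix is included for contrast. All function names are hypothetical.

```python
import math


def straight_line(s, r0=(1.0, 0.5, 0.0), n=(0.6, 0.0, 0.8)):
    """(rho, phi, z) at arc length s along the line r0 + s*n; n is a unit vector."""
    x, y, z = (r0[i] + s * n[i] for i in range(3))
    return (math.hypot(x, y), math.atan2(y, x), z)


def helix(s):
    """A unit-radius helix parameterised by arc length; it is not a geodesic."""
    return (1.0, s / math.sqrt(2.0), s / math.sqrt(2.0))


def derivatives(curve, s, h=1e-3):
    """Coordinates with their first and second s-derivatives (central differences)."""
    um, u0, up = curve(s - h), curve(s), curve(s + h)
    first = [(up[i] - um[i]) / (2.0 * h) for i in range(3)]
    second = [(up[i] - 2.0 * u0[i] + um[i]) / (h * h) for i in range(3)]
    return u0, first, second


def geodesic_residuals(curve, s):
    """Left-hand sides of the three geodesic equations; zero for a geodesic."""
    (rho, phi, z), (drho, dphi, dz), (ddrho, ddphi, ddz) = derivatives(curve, s)
    return (ddrho - rho * dphi ** 2,               # rho equation
            ddphi + 2.0 / rho * drho * dphi,       # phi equation
            ddz)                                   # z equation


if __name__ == "__main__":
    print(geodesic_residuals(straight_line, 0.3))
    print(geodesic_residuals(helix, 0.3))
```

For the straight line all three residuals should vanish to within discretisation error, whereas for the helix the $\rho$ residual is $-\rho(d\phi/ds)^2 = -1/2$, confirming that the helix does not satisfy the geodesic equations in Euclidean space.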


21.23 Exercises


21.1 (a) Show that for any general, but fixed, $\phi$,

\[ (u_1, u_2) = (x_1\cos\phi - x_2\sin\phi,\; x_1\sin\phi + x_2\cos\phi) \]

are the components of a first-order tensor in two dimensions.

(b) Show that

\[ \begin{pmatrix} x_2^2 & x_1x_2 \\ x_1x_2 & x_1^2 \end{pmatrix} \]

is not a (Cartesian) tensor of order 2. To establish that a single element does
not transform correctly is sufficient.

21.2 The components of two vectors $\mathbf{A}$ and $\mathbf{B}$ and a second-order tensor $\mathbf{T}$ are given
in one coordinate system by

[component arrays for $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{T}$]

In a second coordinate system, obtained from the first by rotation, the components
of $\mathbf{A}$ and $\mathbf{B}$ are

[component arrays for $\mathbf{A}$ and $\mathbf{B}$ in the rotated system]

Find the components of $\mathbf{T}$ in this new coordinate system and hence evaluate,
with a minimum of calculation,

\[ T_{ij}T_{ji}, \qquad T_{ki}T_{jk}T_{ij}, \qquad T_{ik}T_{mn}T_{ni}T_{km}. \]

21.3 In section 21.3 the transformation matrix for a rotation of the coordinate axes
was derived, and this approach is used in the rest of the chapter. An alternative
view is that of taking the coordinate axes as fixed and rotating the components
of the system; this is equivalent to reversing the signs of all rotation angles.

Using this alternative view, determine the matrices representing (a) a positive
rotation of $\pi/4$ about the $x$-axis, and (b) a rotation of $-\pi/4$ about the $y$-axis.
Determine the initial vector $\mathbf{r}$ which, when subjected to (a) followed by (b),
finishes at $(3, 2, 1)$.

21.4 Show how to decompose the tensor $T_{ij}$ into three tensors,

\[ T_{ij} = U_{ij} + V_{ij} + S_{ij}, \]

where $U_{ij}$ is symmetric and has zero trace, $V_{ij}$ is isotropic and $S_{ij}$ has only three
independent components.

21.5 Use the quotient law discussed in section 21.7 to show that the array

\[ \begin{pmatrix} y^2 + z^2 - x^2 & -2xy & -2xz \\ -2yx & x^2 + z^2 - y^2 & -2yz \\ -2zx & -2zy & x^2 + y^2 - z^2 \end{pmatrix} \]

forms a second-order tensor.

21.6 Use tensor methods to establish the following vector identities:

(a) $(\mathbf{u} \times \mathbf{v}) \times \mathbf{w} = (\mathbf{u} \cdot \mathbf{w})\mathbf{v} - (\mathbf{v} \cdot \mathbf{w})\mathbf{u}$;

(b) $\text{curl}\,(\phi\mathbf{u}) = \phi\,\text{curl}\,\mathbf{u} + (\text{grad}\,\phi) \times \mathbf{u}$;

(c) $\text{div}\,(\mathbf{u} \times \mathbf{v}) = \mathbf{v} \cdot \text{curl}\,\mathbf{u} - \mathbf{u} \cdot \text{curl}\,\mathbf{v}$;

(d) $\text{curl}\,(\mathbf{u} \times \mathbf{v}) = (\mathbf{v} \cdot \text{grad})\mathbf{u} - (\mathbf{u} \cdot \text{grad})\mathbf{v} + \mathbf{u}\,\text{div}\,\mathbf{v} - \mathbf{v}\,\text{div}\,\mathbf{u}$;

(e) $\text{grad}\,\tfrac{1}{2}(\mathbf{u} \cdot \mathbf{u}) = \mathbf{u} \times \text{curl}\,\mathbf{u} + (\mathbf{u} \cdot \text{grad})\mathbf{u}$.

21.7 Use result (e) of the previous question and the general divergence theorem for
tensors to show that

\[ \oint_S \left[ \mathbf{A}(\mathbf{A} \cdot d\mathbf{S}) - \tfrac{1}{2}A^2\,d\mathbf{S} \right] = \int_V \left[ \mathbf{A}\,\text{div}\,\mathbf{A} - \mathbf{A} \times \text{curl}\,\mathbf{A} \right] dV. \]

21.8 A column matrix $\mathsf{a}$ has components $a_x$, $a_y$, $a_z$ and $\mathsf{A}$ is the matrix with elements

\[ A_{ij} = -\epsilon_{ijk}\,a_k. \]

(a) What is the relationship between column matrices $\mathsf{b}$ and $\mathsf{c}$ if $\mathsf{A}\mathsf{b} = \mathsf{c}$?

(b) Find the eigenvalues of $\mathsf{A}$ and show that $\mathsf{a}$ is one of its eigenvectors. Explain
why this must be so.

21.9 Equation (21.28),

\[ |\mathsf{A}|\,\epsilon_{lmn} = A_{li}A_{mj}A_{nk}\,\epsilon_{ijk}, \]

is a more general form of the expression (8.47) for the determinant of a $3 \times 3$
matrix $\mathsf{A}$. The latter could have been written as

\[ |\mathsf{A}| = \epsilon_{ijk}\,A_{i1}A_{j2}A_{k3}, \]

whilst the former removes the explicit mention of 1, 2, 3 at the expense of an
additional Levi-Civita symbol. As stated in the footnote on p. 791, (21.28) can
be readily extended to cover a general $N \times N$ matrix.

Use the form given in (21.28) to prove properties (i), (iii), (v), (vi) and (vii)
of determinants stated in subsection 8.9.1. Property (iv) is obvious by inspection.
For definiteness take $N = 3$, but convince yourself that your methods of proof
would be valid for any positive integer $N$.

21.10 A symmetric second-order Cartesian tensor is defined by

\[ T_{ij} = \delta_{ij} - 3x_ix_j. \]

Evaluate the following surface integrals, each taken over the surface of the unit
sphere:

(a) $\int T_{ij}\,dS$; (b) $\int T_{ik}T_{kj}\,dS$; (c) $\int x_iT_{jk}\,dS$.

21.11 Given a non-zero vector $\mathbf{v}$, find the value that should be assigned to $\alpha$ to make

\[ P_{ij} = \alpha v_iv_j \qquad\text{and}\qquad Q_{ij} = \delta_{ij} - \alpha v_iv_j \]

into parallel and orthogonal projection tensors respectively, i.e. tensors that satisfy
respectively $P_{ij}v_j = v_i$, $P_{ij}u_j = 0$ and $Q_{ij}v_j = 0$, $Q_{ij}u_j = u_i$, for any vector $\mathbf{u}$ that
is orthogonal to $\mathbf{v}$.

Show, in particular, that $Q_{ij}$ is unique, i.e. that if another tensor $T_{ij}$ has the
same properties as $Q_{ij}$ then $(Q_{ij} - T_{ij})w_j = 0$ for any vector $\mathbf{w}$.

21.12 In four dimensions define second-order antisymmetric tensors $F_{ij}$ and $Q_{ij}$ and a
first-order tensor $S_i$ as follows:

(a) $F_{23} = H_1$, $Q_{23} = B_1$ and their cyclic permutations;

(b) $F_{i4} = -D_i$, $Q_{i4} = E_i$ for $i = 1, 2, 3$;

(c) $S_4 = \rho$, $S_i = J_i$ for $i = 1, 2, 3$.

Then, taking $x_4$ as $t$ and the other symbols to have their usual meanings in
electromagnetic theory, show that the equations $\sum_j \partial F_{ij}/\partial x_j = S_i$ and $\partial Q_{jk}/\partial x_i +
\partial Q_{ki}/\partial x_j + \partial Q_{ij}/\partial x_k = 0$ reproduce Maxwell's equations. Here $i, j, k$ is any set of
three subscripts selected from 1, 2, 3, 4, but chosen in such a way that they are
all different.

21.13 In a certain crystal the unit cell can be taken as six identical atoms lying at the
corners of a regular octahedron. Convince yourself that these atoms can also be
considered as lying at the centres of the faces of a cube and hence that the crystal
has cubic symmetry. Use this result to prove that the conductivity tensor for the
crystal, $\sigma_{ij}$, must be isotropic.

21.14 Assuming that the current density $\mathbf{j}$ and the electric field $\mathbf{E}$ appearing in equation
(21.43) are first-order Cartesian tensors, show explicitly that the electrical
conductivity tensor $\sigma_{ij}$ transforms according to the law appropriate to a second-order
tensor.

The rate W at which energy is dissipated per unit volume, as a result of the 
current flow, is given by E • j. Determine the limits between which W must lie for 
a given value of |E| as the direction of E is varied. 

21.15 In a certain system of units the electromagnetic stress tensor $M_{ij}$ is given by

\[ M_{ij} = E_iE_j + B_iB_j - \tfrac{1}{2}\delta_{ij}(E_kE_k + B_kB_k), \]

where the electric and magnetic fields, $\mathbf{E}$ and $\mathbf{B}$, are first-order tensors. Show that
$M_{ij}$ is a second-order tensor.

Consider a situation in which $|\mathbf{E}| = |\mathbf{B}|$ but the directions of $\mathbf{E}$ and $\mathbf{B}$ are
not parallel. Show that $\mathbf{E} \pm \mathbf{B}$ are principal axes of the stress tensor and find
the corresponding principal values. Determine the third principal axis and its
corresponding principal value.

21.16 A rigid body consists of four particles of masses $m$, $2m$, $3m$, $4m$, respectively
situated at the points $(a, a, a)$, $(a, -a, -a)$, $(-a, a, -a)$, $(-a, -a, a)$ and connected
together by a light framework.

(a) Find the inertia tensor at the origin and show that the principal moments of
inertia are $20ma^2$ and $(20 \pm 2\sqrt{5})ma^2$.

(b) Find the principal axes and verify that they are orthogonal.

21.17 A rigid body consists of eight particles, each of mass $m$, held together by light
rods. In a certain coordinate frame the particles are at

\[ \pm a(3, 1, -1), \quad \pm a(1, -1, 3), \quad \pm a(1, 3, -1), \quad \pm a(-1, 1, 3). \]

Show that, when the body rotates about an axis through the origin, if the angular
velocity and angular momentum vectors are parallel then their ratio must be
$40ma^2$, $64ma^2$ or $72ma^2$.

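21.18 The paramagnetic tensor $\chi_{ij}$ of a body placed in a magnetic field, in which its
energy density is $-\tfrac{1}{2}\mu_0\mathbf{M} \cdot \mathbf{H}$ with $M_i = \sum_j \chi_{ij}H_j$, is

\[ \begin{pmatrix} 2k & 0 & 0 \\ 0 & 3k & k \\ 0 & k & 3k \end{pmatrix}. \]
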

Assuming depolarizing effects are negligible, find how the body will orientate 
itself if the field is horizontal, in the following circumstances: 

(a) the body can rotate freely; 

(b) the body is suspended with the (1,0,0) axis vertical; 

(c) the body is suspended with the (0, 1,0) axis vertical. 

21.19 A block of wood contains a number of thin soft iron nails (of constant permeabil- 
ity). A unit magnetic field directed eastwards induces a magnetic moment in the 
block having components (3, 1,-2) and similar fields directed northwards and 
vertically upwards induce moments (1,3, —2) and (—2,2,2) respectively. Show 
that all the nails lie in parallel planes. 

21.20 For tin the conductivity tensor is diagonal, with entries $a$, $a$, and $b$ when referred
to its crystal axes. A single crystal is grown in the shape of a long wire of length $L$
and radius $r$, the axis of the wire making polar angle $\theta$ with respect to the crystal's
3-axis. Show that the resistance of the wire is $L(\pi r^2ab)^{-1}(a\cos^2\theta + b\sin^2\theta)$.

21.21 By considering an isotropic body subjected to a uniform hydrostatic pressure
(no shearing stress), show that the bulk modulus $k$, defined by the ratio of the
pressure to the fractional decrease in volume, is given by $k = E/[3(1 - 2\sigma)]$ where
$E$ is Young's modulus and $\sigma$ Poisson's ratio.

21.22 For an isotropic elastic medium under dynamic stress, at time $t$ the displacement
$u_i$ and the stress tensor $p_{ij}$ satisfy

\[ p_{ij} = c_{ijkl}\left( \frac{\partial u_k}{\partial x_l} + \frac{\partial u_l}{\partial x_k} \right) \qquad\text{and}\qquad \frac{\partial p_{ij}}{\partial x_j} = \rho\,\frac{\partial^2 u_i}{\partial t^2}, \]

where $c_{ijkl}$ is the isotropic tensor given in equation (21.46) and $\rho$ is a constant.
Show that both $\nabla \cdot \mathbf{u}$ and $\nabla \times \mathbf{u}$ satisfy wave equations and find the corresponding
wave speeds.

21.23 A fourth-order tensor $T_{ijkl}$ has the properties

\[ T_{jikl} = -T_{ijkl}, \qquad T_{ijlk} = -T_{ijkl}. \]

Prove that for any such tensor there exists a second-order tensor $K_{mn}$ such that

\[ T_{ijkl} = \epsilon_{ijm}\,\epsilon_{kln}\,K_{mn} \]

and give an explicit expression for $K_{mn}$. Consider two (separate) special cases, as
follows.

(a) Given that $T_{ijkl}$ is isotropic and $T_{ijji} = 1$, show that $T_{ijkl}$ is uniquely
determined and express it in terms of Kronecker deltas.

(b) If now $T_{ijkl}$ has the additional property

\[ T_{klij} = -T_{ijkl}, \]

show that $T_{ijkl}$ has only three linearly independent components and find an
expression for $T_{ijkl}$ in terms of the vector

\[ V_i = -\tfrac{1}{4}\,\epsilon_{jkl}\,T_{ijkl}. \]

21.24 Working in cylindrical polar coordinates $\rho, \phi, z$, parameterise the straight line
(geodesic) joining $(1, 0, 0)$ to $(1, \pi/2, 1)$ in terms of $s$, the distance along the line.
Show by substitution that the geodesic equations derived at the end of section
21.22 are satisfied.

21.25 In a general coordinate system $u^i$, $i = 1, 2, 3$, in three-dimensional Euclidean
space, a volume element is given by

\[ dV = |\mathbf{e}_1\,du^1 \cdot (\mathbf{e}_2\,du^2 \times \mathbf{e}_3\,du^3)|. \]

Show that an alternative form for this expression, written in terms of the
determinant $g$ of the metric tensor, is given by

\[ dV = \sqrt{g}\;du^1\,du^2\,du^3. \]

Show that under a general coordinate transformation to a new coordinate system
$u'^i$ the volume element $dV$ remains unchanged, i.e. show that it is a scalar quantity.


21.26 By writing down the expression for the square of the infinitesimal arc length $(ds)^2$
in spherical polar coordinates, find the components $g_{ij}$ of the metric tensor in this
coordinate system. Hence, using (21.96), find the expression for the divergence
of a vector field $\mathbf{v}$ in spherical polars. Calculate the Christoffel symbols (of the
second kind) $\Gamma^i_{jk}$ in this coordinate system.

21.27 Find an expression for the second covariant derivative $v_{i;jk} \equiv (v_{i;j})_{;k}$ of a vector
$v_i$ (see (21.86)). By interchanging the order of differentiation and then subtracting
the two expressions, we define the components $R^l_{\;ijk}$ of the Riemann tensor as

\[ v_{i;jk} - v_{i;kj} \equiv R^l_{\;ijk}v_l. \]

Show that in a general coordinate system $u^i$ these components are given by

\[ R^l_{\;ijk} = \frac{\partial\Gamma^l_{ik}}{\partial u^j} - \frac{\partial\Gamma^l_{ij}}{\partial u^k} + \Gamma^m_{ik}\Gamma^l_{mj} - \Gamma^m_{ij}\Gamma^l_{mk}. \]

By first considering Cartesian coordinates, show that all the components $R^l_{\;ijk} = 0$
for any coordinate system in three-dimensional Euclidean space.

In such a space, therefore, we may change the order of the covariant derivatives
without changing the resulting expression.

21.28 A curve $\mathbf{r}(t)$ is parameterised by a scalar variable $t$. Show that the length of the
curve between two points, $A$ and $B$, is given by

\[ L = \int_A^B \sqrt{g_{ij}\frac{du^i}{dt}\frac{du^j}{dt}}\;dt. \]

Using the calculus of variations (see chapter 22), show that the curve $\mathbf{r}(t)$ that
minimises $L$ satisfies the equation

\[ \frac{d^2u^i}{dt^2} + \Gamma^i_{jk}\frac{du^j}{dt}\frac{du^k}{dt} = \frac{\ddot{s}}{\dot{s}}\frac{du^i}{dt}, \]

where $s$ is the arc length along the curve, $\dot{s} = ds/dt$ and $\ddot{s} = d^2s/dt^2$. Hence, show
that if the parameter $t$ is of the form $t = as + b$, where $a$ and $b$ are constants,
then we recover the equation for a geodesic (21.100).

(A parameter which, like $t$, is the sum of a linear transformation of $s$ and a
translation is called an affine parameter.)

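21.29 We may define Christoffel symbols of the first kind by

\[ \Gamma_{ijk} = g_{il}\Gamma^l_{jk}. \]

Show that these are given by

\[ \Gamma_{kij} = \frac{1}{2}\left(\frac{\partial g_{ik}}{\partial u^j} + \frac{\partial g_{jk}}{\partial u^i} - \frac{\partial g_{ij}}{\partial u^k}\right). \]

By permuting indices, verify that

\[ \frac{\partial g_{ij}}{\partial u^k} = \Gamma_{ijk} + \Gamma_{jik}. \]

Using the fact that $\Gamma^l_{jk} = \Gamma^l_{kj}$, show that

\[ g_{ij;k} \equiv 0, \]

i.e. that the covariant derivative of the metric tensor is identically zero in all
coordinate systems.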


21.24 Hints and answers 


21.1 (a) $u_1' = x_1\cos(\phi - \theta) - x_2\sin(\phi - \theta)$, etc.;
(b) $u_{11}' = s^2x_1^2 - 2scx_1x_2 + c^2x_2^2 \neq c^2x_2^2 + csx_1x_2 + scx_1x_2 + s^2x_1^2$.

21.2 Determine entries for the third column of L by requiring that it is orthogonal
and has determinant +1. $\mathsf{L} = \tfrac{1}{2}(\sqrt{3}, -1, 0;\ 0, 0, -2;\ 1, \sqrt{3}, 0)$. They are all scalars
with values 30, 134, 642.

21.3 (a) $(1/\sqrt{2})(\sqrt{2}, 0, 0;\ 0, 1, -1;\ 0, 1, 1)$. (b) $(1/\sqrt{2})(1, 0, -1;\ 0, \sqrt{2}, 0;\ 1, 0, 1)$.
$\mathbf{r} = (2\sqrt{2},\ -1 + \sqrt{2},\ -1 - \sqrt{2})^{\mathrm T}$.

21.4 If $T_0$ is $\mathrm{Tr}\,T_{ij}$ then $U_{ij} = \tfrac{1}{2}(T_{ij} + T_{ji}) - \tfrac{1}{3}T_0\delta_{ij}$, $V_{ij} = \tfrac{1}{3}T_0\delta_{ij}$, $S_{ij} = \tfrac{1}{2}(T_{ij} - T_{ji})$.

21.5 Twice contract the array with the outer product of $(x, y, z)$ with itself to obtain
the expression $-(x^2 + y^2 + z^2)^2$, which is an invariant and therefore a scalar.

21.6 (a) $\epsilon_{ijk}\epsilon_{jlm}u_lv_mw_k$ and use (21.29); (b) $\epsilon_{ijk}\,\partial(\phi u_k)/\partial x_j$; (c) $\partial(\epsilon_{ijk}u_jv_k)/\partial x_i$;
(d) $\epsilon_{ijk}\epsilon_{klm}\,\partial(u_lv_m)/\partial x_j$ and use (21.29); (e) start with $\mathbf{u} \times \mathrm{curl}\,\mathbf{u}$ and obtain
$\epsilon_{ijk}u_j\epsilon_{klm}\,\partial u_m/\partial x_l$.

21.7 Write $A_j(\partial A_i/\partial x_j)$ as $\partial(A_iA_j)/\partial x_j - A_i(\partial A_j/\partial x_j)$.

21.8 (a) $\mathbf{c} = \mathbf{a} \times \mathbf{b}$. (b) $0, \pm i|\mathbf{a}|$. $\mathsf{A}\mathbf{a} = 0\mathbf{a}$ since $\mathbf{a} \times \mathbf{a} = \mathbf{0}$.

21.9 (i) Write out the expression for $|\mathsf{A}^{\mathrm T}|$, contract both sides of the equation with $\epsilon_{lmn}$
and pick out the expression for $|\mathsf{A}|$ on the RHS. Note that $\epsilon_{lmn}\epsilon_{lmn}$ is a numerical
scalar.

(iii) Each non-zero term on the RHS contains any particular row index once and
only once. The same can be said for the Levi-Civita symbol on the LHS. Thus
interchanging two rows is equivalent to interchanging two of the subscripts of
$\epsilon_{lmn}$ and thereby reversing its sign. Consequently, the magnitude of $|\mathsf{A}|$ remains
the same but its sign is changed.

(v) If, say, $A_{pi} = \lambda A_{pj}$, for some particular pair of values $i$ and $j$ and all $p$ then,
in the (multiple) summation on the RHS, each $A_{nk}$ appears multiplied by (no
summation over $i$ and $j$)

\[ \epsilon_{ijk}A_{li}A_{mj} + \epsilon_{jik}A_{lj}A_{mi} = \epsilon_{ijk}\lambda A_{lj}A_{mj} + \epsilon_{jik}A_{lj}\lambda A_{mj} = 0, \]

since $\epsilon_{ijk} = -\epsilon_{jik}$. Consequently, grouped in this way all terms are zero and
$|\mathsf{A}| = 0$.

(vi) Replace $A_{mj}$ by $A_{mj} + \lambda A_{lj}$ and note that $\lambda A_{li}A_{lj}A_{nk}\epsilon_{ijk} = 0$ by virtue of
result (v).

(vii) If $\mathsf{C} = \mathsf{AB}$,

\[ |\mathsf{C}|\,\epsilon_{lmn} = A_{lx}B_{xi}A_{my}B_{yj}A_{nz}B_{zk}\,\epsilon_{ijk}. \]

Contract this with $\epsilon_{lmn}$ and show that the RHS is equal to $\epsilon_{xyz}|\mathsf{A}^{\mathrm T}|\,\epsilon_{xyz}|\mathsf{B}|$. It then
follows from result (i) that $|\mathsf{C}| = |\mathsf{A}||\mathsf{B}|$.

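21.10 Note that $\int x_i\,dS = \int (x_i)^3\,dS = 0$ and that $\int (x_i)^2\,dS = 4\pi/3$. (a) 0 (the two
contributions cancel when $i = j$); (b) $8\pi\delta_{ij}$; (c) 0 for all sets of $i, j, k$, whether or
not some or all are equal.

21.11 $a = |\mathbf{v}|^{-2}$. Note that the most general vector has components $w_i = \lambda v_i + \mu u_i^{(1)} + \nu u_i^{(2)}$,
where both $\mathbf{u}^{(1)}$ and $\mathbf{u}^{(2)}$ are orthogonal to $\mathbf{v}$.

21.12 $\nabla \times \mathbf{H} = \mathbf{J} + \dot{\mathbf{D}}$; $\nabla \cdot \mathbf{D} = \rho$; $\nabla \times \mathbf{E} + \dot{\mathbf{B}} = \mathbf{0}$; $\nabla \cdot \mathbf{B} = 0$.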

21.13 Construct the orthogonal transformation matrix S for the symmetry operation
of (say) a rotation of $2\pi/3$ about a body diagonal and, setting $\mathsf{L} = \mathsf{S}^{-1} = \mathsf{S}^{\mathrm T}$,
construct $\sigma' = \mathsf{L}\sigma\mathsf{L}^{\mathrm T}$ and require $\sigma' = \sigma$. Repeat the procedure for (say) a rotation
of $\pi/2$ about the $x_1$-axis. These together show that $\sigma_{11} = \sigma_{22} = \sigma_{33}$ and that
all other $\sigma_{ij} = 0$. Further symmetry requirements do not provide any additional
constraints.

21.14 $W = E_i\sigma_{ij}E_j$ has to be maximised or minimised subject to $E_iE_i$ being held
constant. Extreme values are $W_\pm = \lambda_\pm|\mathbf{E}|^2$, where $\lambda_\pm$ are the maximum and
minimum eigenvalues of the matrix $\sigma_{ij}$.

21.15 The transformation of $\delta_{ij}$ has to be included; the principal values are $\pm\mathbf{E}\cdot\mathbf{B}$.
The third axis is in the direction $\pm\mathbf{B} \times \mathbf{E}$ with principal value $-|\mathbf{E}|^2$.

21.16 (b) $x_1 = (2\ \ {-1}\ \ 0)$, $x_2 = (1\ \ 2\ \ \sqrt{5})$, $x_3 = (1\ \ 2\ \ {-\sqrt{5}})$.

21.17 The principal moments give the required ratios. 

21.18 The principal susceptibilities and (unnormalised) axes are $\lambda = 4$, $\pm(0, 1, 1)$;
$\lambda = 2$, $\pm(c_i, 1, -1)$ with $c_1c_2 = -2$, leading to:

(a) lowest energy when (0, 1, 1) axis is parallel to the field;

(b) permitted values of orientation are $(0, n_2, n_3)$, hence as in (a);

(c) permitted values of orientation are $(n_1, 0, n_3)$, subject to $n_1^2 + n_3^2 = 1$.

The energy $= -\tfrac{1}{2}\mu_0kH^2V(2n_1^2 + 3n_3^2)$, which is minimised when (0, 0, 1) is parallel
to the field.

21.19 The principal permeability, in direction (1, 1,2), has value 0. Thus all the nails lie 
in planes to which this is the normal. 

21.20 $j_i = \sigma_{ik}E_k$ gives $I\sin\theta\cos\phi = a\pi r^2E_1$, $I\sin\theta\sin\phi = a\pi r^2E_2$, $I\cos\theta = b\pi r^2E_3$.
Also $V/L = E_1\sin\theta\cos\phi + E_2\sin\theta\sin\phi + E_3\cos\theta$. The current must flow along
the wire; $\mathbf{E}$ is not parallel to the wire.

21.21 Take $p_{11} = p_{22} = p_{33} = -p$, and $p_{ij} = e_{ij} = 0$ for $i \neq j$, leading to $-p =
(\lambda + 2\mu/3)e_{ii}$. The fractional volume change is $e_{ii}$; $\lambda$ and $\mu$ are as defined in (21.45)
and the worked example that follows it.

21.22 Show that $p_{ij} = 2\lambda\delta_{ij}\nabla\cdot\mathbf{u} + (\eta + \nu)(\partial u_i/\partial x_j + \partial u_j/\partial x_i)$. Form the sum of the
derivatives $\sum_j(\partial/\partial x_j)$ for this equation, substitute for $\partial p_{ij}/\partial x_j$ and then form
$\sum_i(\partial/\partial x_i)$ of the result. The wave speed for $\nabla\cdot\mathbf{u}$ is $[2(\lambda + \eta + \nu)/\rho]^{1/2}$. Show that

\[ \rho\,\frac{\partial^2(\nabla\times\mathbf{u})_k}{\partial t^2} = \epsilon_{kli}\frac{\partial^2p_{ij}}{\partial x_l\,\partial x_j} \]

and then use the previous expression for $p_{ij}$ and the identity $\epsilon_{kli}\,\partial^2/\partial x_l\partial x_i = 0$.
The wave speed for $\nabla\times\mathbf{u}$ is $[(\eta + \nu)/\rho]^{1/2}$.

21.23 Consider $Q_{mn} = \epsilon_{mpq}\epsilon_{nkl}T_{pqkl}$ and show that $K_{mn} = Q_{mn}/4$ has the required property.

(a) Argue from the isotropy of $T_{pqkl}$ and $\epsilon_{ijk}$ for that of $K_{mn}$ and hence that it
must be a multiple of $\delta_{mn}$. Show that the multiplier is uniquely determined and
that $T_{ijkl} = (\delta_{il}\delta_{jk} - \delta_{ik}\delta_{jl})/6$.

(b) By relabelling dummy subscripts and using the stated antisymmetry property,
show that $K_{nm} = -K_{mn}$. Show that $-2V_i = \epsilon_{min}K_{mn}$ and hence that $K_{mn} = \epsilon_{imn}V_i$.
$T_{ijkl} = \epsilon_{kli}V_j - \epsilon_{klj}V_i$.

21.24 $\rho = (1 - 2s/\sqrt{3} + 2s^2/3)^{1/2}$, $\phi = \tan^{-1}[s/(\sqrt{3} - s)]$, $z = s/\sqrt{3}$.

21.25 Use $|\mathbf{e}_1\cdot(\mathbf{e}_2\times\mathbf{e}_3)| = \sqrt{g}$.
Recall that $\sqrt{g'} = |\partial u/\partial u'|\sqrt{g}$ and $du'^1\,du'^2\,du'^3 = |\partial u'/\partial u|\,du^1\,du^2\,du^3$.

21.26 $g = r^4\sin^2\theta$; recall that, for each $i$, $v^i = v_i/h_i$, e.g. $v^3 = v_\phi/(r\sin\theta)$.
$\Gamma^1_{22} = -r$; $\Gamma^1_{33} = -r\sin^2\theta$; $\Gamma^2_{12} = r^{-1}$; $\Gamma^2_{33} = -\sin\theta\cos\theta$; $\Gamma^3_{13} = r^{-1}$; $\Gamma^3_{23} = \cot\theta$.

21.27 $(v_{i;j})_{;k} = (v_{i;j})_{,k} - \Gamma^l_{ik}v_{l;j} - \Gamma^l_{jk}v_{i;l}$ and $v_{i;j} = v_{i,j} - \Gamma^m_{ij}v_m$. If all components of a
tensor equal zero in one coordinate system then they are zero in all coordinate
systems.

21.28 Using $\dot{s} = \sqrt{g_{ij}\dot{u}^i\dot{u}^j}$, the Euler-Lagrange equation is

\[ \frac{d}{dt}\left(\frac{g_{ik}\dot{u}^i}{\dot{s}}\right) - \frac{1}{2\dot{s}}\frac{\partial g_{ij}}{\partial u^k}\dot{u}^i\dot{u}^j = 0. \]

Calculate the $t$-derivative, write

\[ \frac{\partial g_{ik}}{\partial u^j}\dot{u}^i\dot{u}^j = \frac{1}{2}\left(\frac{\partial g_{ik}}{\partial u^j} + \frac{\partial g_{jk}}{\partial u^i}\right)\dot{u}^i\dot{u}^j \]

and multiply through by $g^{lk}$. If $t = as + b$ then $\ddot{s} = 0$.


Calculus of variations


In chapters 2 and 5 we discussed how to find stationary values of functions of a 
single variable $f(x)$, of several variables $f(x, y, \ldots)$ and of constrained variables,
where $x, y, \ldots$ are subject to the $n$ constraints $g_i(x, y, \ldots) = 0$, $i = 1, 2, \ldots, n$. In all
these cases the forms of the functions $f$ and $g_i$ were known, and the problem was
one of finding the appropriate values of the variables x, y etc. 

We now turn to a different kind of problem in which we are interested in 
bringing about a particular condition for a given expression (usually maximising 
or minimising it) by varying the functions on which the expression depends. For 
instance, we might want to know in what shape a fixed length of rope should 
be arranged so as to enclose the largest possible area, or in what shape it will 
hang when suspended under gravity from two fixed points. In each case we are 
concerned with a general maximisation or minimisation criterion by which the 
function y(x) that satisfies the given problem may be found. 

The calculus of variations provides a method for finding the function y(x). 
The problem must first be expressed in a mathematical form, and the form 
most commonly applicable to such problems is an integral. In each of the above 
questions, the quantity that has to be maximised or minimised by an appropriate 
choice of the function y(x) may be expressed as an integral involving y(x) and 
the variables describing the geometry of the situation. 

In our example of the rope hanging from two fixed points, we need to find 
the shape function y(x) that minimises the gravitational potential energy of the 
rope. Each elementary piece of the rope has a gravitational potential energy 
proportional both to its vertical height above an arbitrary zero level and to the 
length of the piece. Therefore the total potential energy is given by an integral 
for the whole rope of such elementary contributions. The particular function y(x) 
for which the value of this integral is a minimum will give the shape assumed by 
the hanging rope. 

So in general we are led by this type of question to study the value of an
integral whose integrand has a specified form in terms of a certain function
and its derivatives, and to study how that value changes when the form of
the function is varied. Specifically, we aim to find the function that makes the
integral stationary, i.e. the function that makes the value of the integral a local
maximum or minimum. Note that, unless stated otherwise, $y'$ is used to denote
$dy/dx$ throughout this chapter. We also assume that all the functions we need to
deal with are sufficiently smooth and differentiable.

Figure 22.1 Possible paths for the integral (22.1). The solid line is the curve
along which the integral is assumed stationary. The broken curves represent
small variations from this path.


22.1 The Euler-Lagrange equation 

Let us consider the integral

\[ I = \int_a^b F(y, y', x)\,dx, \tag{22.1} \]

where $a$, $b$ and the form of the function $F$ are fixed by given considerations,
e.g. the physics of the problem, but the curve $y(x)$ is to be chosen so as to
make stationary the value of $I$, which is clearly a function (or more accurately a
functional) of this curve, i.e. $I = I[y(x)]$. Referring to figure 22.1, we wish to find
the function $y(x)$ (given, say, by the solid line) such that first-order small changes
in it (for example the two broken lines) will make only second-order changes in
the value of $I$.

Writing this in a more mathematical form, let us suppose that $y(x)$ is the
function required to make $I$ stationary and consider making the replacement

\[ y(x) \to y(x) + \alpha\eta(x), \tag{22.2} \]

where the parameter $\alpha$ is small and $\eta(x)$ is an arbitrary function with sufficiently
amenable mathematical properties. For the value of $I$ to be stationary with respect
to these variations, we require

\[ \left.\frac{dI}{d\alpha}\right|_{\alpha=0} = 0 \quad\text{for all } \eta(x). \tag{22.3} \]


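Substituting (22.2) into (22.1) and expanding as a Taylor series in $\alpha$ we obtain

\[ I(y, \alpha) = \int_a^b F(y + \alpha\eta,\, y' + \alpha\eta',\, x)\,dx = \int_a^b F(y, y', x)\,dx + \int_a^b\left(\frac{\partial F}{\partial y}\alpha\eta + \frac{\partial F}{\partial y'}\alpha\eta'\right)dx + O(\alpha^2). \]

With this form for $I(y, \alpha)$ the condition (22.3) implies that for all $\eta(x)$ we require

\[ \delta I = \int_a^b\left(\frac{\partial F}{\partial y}\eta + \frac{\partial F}{\partial y'}\eta'\right)dx = 0, \]

where $\delta I$ denotes the first-order variation in the value of $I$ due to the variation
(22.2) in the function $y(x)$. Integrating the second term by parts this becomes

\[ \left[\eta\frac{\partial F}{\partial y'}\right]_a^b + \int_a^b\left[\frac{\partial F}{\partial y} - \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right)\right]\eta(x)\,dx = 0. \tag{22.4} \]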


In order to simplify the result we will assume, for the moment, that the end-points
are fixed, i.e. not only $a$ and $b$ are given but also $y(a)$ and $y(b)$. This restriction
means that we require $\eta(a) = \eta(b) = 0$, in which case the first term on the LHS of
(22.4) equals zero at both end-points. Since (22.4) must be satisfied for arbitrary
$\eta(x)$, it is easy to see that we require

\[ \frac{\partial F}{\partial y} = \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right). \tag{22.5} \]

This is known as the Euler-Lagrange (EL) equation, and is a differential equation
for $y(x)$, since the function $F$ is known.
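The defining property of a stationary curve, that a first-order change in $y(x)$ produces only a second-order change in $I$, can be checked numerically. The sketch below is an illustration only and assumes a particular integrand, $F = y'^2 + y^2$ on $0 \le x \le 1$ with $y(0) = 0$, $y(1) = 1$, which is not taken from the text; for it the EL equation reads $y'' = y$, with solution $y = \sinh x/\sinh 1$. The trial variation $\eta(x) = x(1 - x)$ and the helper names are likewise hypothetical.

```python
import math

def functional(y, dy, n=2000):
    """Midpoint-rule estimate of I[y] = integral from 0 to 1 of (y'^2 + y^2) dx."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += (dy(x) ** 2 + y(x) ** 2) * h
    return total

def change_in_I(y, dy, alpha):
    """I[y + alpha*eta] - I[y] for the trial variation eta(x) = x(1 - x)."""
    eta = lambda x: x * (1.0 - x)        # vanishes at both end-points
    deta = lambda x: 1.0 - 2.0 * x
    varied = functional(lambda x: y(x) + alpha * eta(x),
                        lambda x: dy(x) + alpha * deta(x))
    return varied - functional(y, dy)

# Solution of the EL equation y'' = y with y(0) = 0, y(1) = 1.
y_el = lambda x: math.sinh(x) / math.sinh(1.0)
dy_el = lambda x: math.cosh(x) / math.sinh(1.0)

# A curve with the same end-points that does not satisfy the EL equation.
y_line = lambda x: x
dy_line = lambda x: 1.0

alpha = 0.01
ratio_el = change_in_I(y_el, dy_el, alpha) / change_in_I(y_el, dy_el, alpha / 2)
ratio_line = change_in_I(y_line, dy_line, alpha) / change_in_I(y_line, dy_line, alpha / 2)
print(ratio_el, ratio_line)
```

Halving $\alpha$ should reduce the change in $I$ by a factor close to 4 for the EL solution (a purely second-order change) but only by a factor close to 2 for the straight line, whose change is dominated by a first-order term.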


22.2 Special cases 

In certain special cases a first integral of the EL equation can be obtained for a 
general form of F. 


22.2.1 F does not contain y explicitly 
In this case $\partial F/\partial y = 0$, and (22.5) can be integrated immediately giving

\[ \frac{\partial F}{\partial y'} = \text{constant}. \tag{22.6} \]

Figure 22.2 An arbitrary path between two fixed points.


► Show that the shortest curve joining two points is a straight line.


Let the two points be labelled $A$ and $B$ and have coordinates $(a, y(a))$ and $(b, y(b))$
respectively (see figure 22.2). Whatever the shape of the curve joining $A$ to $B$, the length
of an element of path $ds$ is given by

\[ ds = \left[(dx)^2 + (dy)^2\right]^{1/2} = (1 + y'^2)^{1/2}\,dx, \]

and hence the total path length along the curve is given by

\[ L = \int_a^b (1 + y'^2)^{1/2}\,dx. \tag{22.7} \]

We must now apply the results of the previous section to determine that path which makes
$L$ stationary (clearly a minimum in this case). Since the integral does not contain $y$ (or
indeed $x$) explicitly, we may use (22.6) to obtain

\[ k = \frac{\partial F}{\partial y'} = \frac{y'}{(1 + y'^2)^{1/2}}, \]

where $k$ is a constant. This is easily rearranged and integrated to give

\[ y = \frac{k}{(1 - k^2)^{1/2}}\,x + c, \]

which, as expected, is the equation of a straight line in the form $y = mx + c$, with
$m = k/(1 - k^2)^{1/2}$. The value of $m$ (or $k$) can be found by demanding that the straight line
passes through the points $A$ and $B$ and is given by $m = [y(b) - y(a)]/(b - a)$. Substituting
the equation of the straight line into (22.7) we find that, again as expected, the total path
length is given by

\[ L^2 = [y(b) - y(a)]^2 + (b - a)^2. \] ◄
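As a rough numerical cross-check of this example, the sketch below compares the length of the straight line with the lengths of a few neighbouring curves through the same end-points. The end-points $A = (0, 0)$, $B = (1, 2)$, the sinusoidal bump and the function names are assumptions chosen only for illustration.

```python
import math

def path_length(y, a, b, n=4000):
    """Length of the polyline through n + 1 equally spaced samples of y on [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x0, x1 = a + i * h, a + (i + 1) * h
        total += math.hypot(x1 - x0, y(x1) - y(x0))
    return total

# Hypothetical end-points A = (0, 0) and B = (1, 2).
a, ya, b, yb = 0.0, 0.0, 1.0, 2.0
m = (yb - ya) / (b - a)

def trial(alpha):
    """Straight line plus a bump of amplitude alpha that vanishes at A and B."""
    return lambda x: ya + m * (x - a) + alpha * math.sin(math.pi * (x - a) / (b - a))

lengths = {alpha: path_length(trial(alpha), a, b)
           for alpha in (-0.2, -0.1, 0.0, 0.1, 0.2)}
straight = lengths[0.0]
print(straight ** 2, (yb - ya) ** 2 + (b - a) ** 2)
```

The squared length of the unperturbed line should agree with $[y(b) - y(a)]^2 + (b - a)^2$, and every perturbed curve should come out longer.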
Figure 22.3 A convex closed curve that is symmetrical about the x-axis.


22.2.2 F does not contain x explicitly 

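In this case, multiplying the EL equation (22.5) by $y'$ and using

\[ \frac{d}{dx}\left(y'\frac{\partial F}{\partial y'}\right) = y'\frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) + y''\frac{\partial F}{\partial y'} \]

we obtain

\[ y'\frac{\partial F}{\partial y} + y''\frac{\partial F}{\partial y'} = \frac{d}{dx}\left(y'\frac{\partial F}{\partial y'}\right). \]

But since $F$ is a function of $y$ and $y'$ only, and not explicitly of $x$, the LHS of
this equation is just the total derivative of $F$, namely $dF/dx$. Hence, integrating
we obtain

\[ F - y'\frac{\partial F}{\partial y'} = \text{constant}. \tag{22.8} \]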


►Find the closed convex curve of length $l$ that encloses the greatest possible area.


Without any loss of generality we can assume that the curve passes through the origin,
and can further suppose that it is symmetric with respect to the $x$-axis; this assumption
is not essential. Using the distance $s$ along the curve, measured from the origin, as the
independent variable and $y$ as the dependent one, we have the boundary conditions
$y(0) = y(l/2) = 0$. The element of area shown in figure 22.3 is then given by

\[ dA = y\,dx = y\left[(ds)^2 - (dy)^2\right]^{1/2}, \]

and the total area by

\[ A = 2\int_0^{l/2} y(1 - y'^2)^{1/2}\,ds; \tag{22.9} \]

here $y'$ stands for $dy/ds$ rather than $dy/dx$. Since the integrand does not contain $s$ explicitly,
we can use (22.8) to obtain a first integral of the EL equation for $y$, namely

\[ y(1 - y'^2)^{1/2} + yy'^2(1 - y'^2)^{-1/2} = k, \]

where $k$ is a constant. On rearranging this gives

\[ ky' = \pm(k^2 - y^2)^{1/2}, \]

which, using $y(0) = 0$, integrates to

\[ y/k = \sin(s/k). \tag{22.10} \]

The other end-point, $y(l/2) = 0$, fixes the value of $k$ as $l/(2\pi)$ to yield

\[ y = \frac{l}{2\pi}\sin\frac{2\pi s}{l}. \]

From this we obtain $dy = \cos(2\pi s/l)\,ds$ and since $(ds)^2 = (dx)^2 + (dy)^2$ we find also that
$dx = \pm\sin(2\pi s/l)\,ds$. This in turn can be integrated and, using $x(0) = 0$, gives $x$ in terms
of $s$ as

\[ x - \frac{l}{2\pi} = -\frac{l}{2\pi}\cos\frac{2\pi s}{l}. \]

We thus obtain the expected result that $x$ and $y$ lie on the circle of radius $l/(2\pi)$ given by

\[ \left(x - \frac{l}{2\pi}\right)^2 + y^2 = \frac{l^2}{4\pi^2}. \]

Substituting the solution (22.10) into the expression for the total area (22.9), it is easily
verified that $A = l^2/(4\pi)$. A much quicker derivation of this result is possible using plane
polar coordinates. ◄
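A simple plausibility check on the value $A = l^2/(4\pi)$ is to compare it with the areas of regular polygons of the same perimeter, which are convex closed curves of length $l$. The sketch below does this for a hypothetical perimeter $l = 1$; the polygon area formula $l^2/[4n\tan(\pi/n)]$ is standard geometry rather than something derived in the text.

```python
import math

def regular_polygon_area(n_sides, perimeter):
    """Area of a regular polygon with n_sides sides and the given perimeter."""
    return perimeter ** 2 / (4.0 * n_sides * math.tan(math.pi / n_sides))

l = 1.0                                    # hypothetical perimeter
circle_area = l ** 2 / (4.0 * math.pi)     # the value A = l^2/(4*pi) found above
sides = (3, 4, 6, 12, 100)
polygon_areas = [regular_polygon_area(n, l) for n in sides]
for n, area in zip(sides, polygon_areas):
    print(n, area, area / circle_area)
```

The polygon areas should increase with the number of sides and approach, but never exceed, the area of the circle.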

The previous two examples have been carried out in some detail, even though 
the answers are more easily obtained in other ways, expressly so that the method 
is transparent and the way in which it works can be filled in mentally at almost 
every step. The next example, however, does not have such an intuitively obvious 
solution. 


► Two rings, each of radius $a$, are placed parallel with their centres $2b$ apart and on a
common normal. An axially symmetric soap film is formed between them but does not cover
the ends of the rings (see figure 22.4). Find the shape assumed by the film.


Creating the soap film requires an energy $\gamma$ per unit area (numerically equal to the surface
tension of the soap solution). So the stable shape of the soap film, i.e. the one that
minimises the energy, will also be the one that minimises the surface area (neglecting
gravitational effects).

It is obvious that any convex surface, shaped such as that shown as the broken line in
figure 22.4(a), cannot be a minimum but it is not clear whether some shape intermediate
between the cylinder shown by solid lines in (a), with area $4\pi ab$ (or twice this for the
double surface of the film), and the form shown in (b), with area approximately $2\pi a^2$, will
produce a lower total area than both of these extremes. If there is such a shape (e.g. that
in figure 22.4(c)), then it will be that which best compromises between two requirements,
the need to minimise the ring-to-ring distance measured on the film surface (a) and the
need to minimise the average waist measurement of the surface (b).

Figure 22.4 Possible soap films between two parallel circular rings.

We take cylindrical polar coordinates as in figure 22.4(c) and let the radius of the soap
film at height $z$ be $\rho(z)$ with $\rho(\pm b) = a$. Counting only one side of the film, the element of
surface area between $z$ and $z + dz$ is

\[ dS = 2\pi\rho\left[(dz)^2 + (d\rho)^2\right]^{1/2}, \]

so the total surface area is given by

\[ S = 2\pi\int_{-b}^{b}\rho(1 + \rho'^2)^{1/2}\,dz. \tag{22.11} \]

Since the integrand does not contain $z$ explicitly, we can use (22.8) to obtain an equation
for $\rho$ that minimises $S$, i.e.

\[ \rho(1 + \rho'^2)^{1/2} - \rho\rho'^2(1 + \rho'^2)^{-1/2} = k, \]

where $k$ is a constant. Multiplying through by $(1 + \rho'^2)^{1/2}$, rearranging to find an explicit
expression for $\rho'$ and integrating we find

\[ \cosh^{-1}\frac{\rho}{k} = \frac{z}{k} + c, \]

where $c$ is the constant of integration. Using the boundary conditions $\rho(\pm b) = a$, we
require $c = 0$ and $k$ such that $a/k = \cosh(b/k)$ (if $b/a$ is too large, no such $k$ can be found).
Thus the curve that minimises the surface area is

\[ \rho/k = \cosh(z/k), \]

and in profile the soap film is a catenary (see section 22.4) with the minimum distance
from the axis equal to $k$. ◄
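The condition $a/k = \cosh(b/k)$ has no closed-form solution for $k$, so in practice it is solved numerically. The following sketch uses plain bisection; the function names and the sample values $a = 1$, $b = 0.5$ and $b = 0.7$ are illustrative assumptions. When a solution exists the equation generally has two roots, and the bracket used here picks the larger one (the shallower neck), which is an assumption of this sketch about which root is wanted.

```python
import math

def bisect(f, lo, hi, iterations=200):
    """Bisection for a root of f in [lo, hi], assuming f(lo) <= 0 < f(hi)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def catenoid_waist(a, b):
    """Return the larger root k of a/k = cosh(b/k), or None if no root exists."""
    # k*cosh(b/k) is smallest where u = b/k satisfies u*tanh(u) = 1.
    u_star = bisect(lambda u: u * math.tanh(u) - 1.0, 0.5, 2.0)
    k_star = b / u_star
    g = lambda k: k * math.cosh(b / k) - a
    if g(k_star) > 0.0:
        return None                  # rings too far apart: b/a is too large
    return bisect(g, k_star, a)      # g(a) = a*(cosh(b/a) - 1) > 0

k = catenoid_waist(1.0, 0.5)         # hypothetical ring radius a = 1, half-separation b = 0.5
print(k, catenoid_waist(1.0, 0.7))
```

With these values the first call should return a waist radius somewhat smaller than the ring radius, while the second should return `None`, illustrating the remark that no $k$ can be found when $b/a$ is too large.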


22.3 Some extensions 

It is quite possible to relax many of the restrictions we have imposed so far. For 
example, we can allow end-points that are constrained to lie on given curves rather 
than being fixed, or we can consider problems with several dependent and/or 
independent variables or higher-order derivatives of the dependent variable. Each 
of these extensions is now discussed.

22.3.1 Several dependent variables


Here we have $F = F(y_1, y_1', y_2, y_2', \ldots, y_n, y_n', x)$ where each $y_i = y_i(x)$. The analysis
in this case proceeds as before, leading to $n$ separate but simultaneous equations
for the $y_i(x)$,

\[ \frac{\partial F}{\partial y_i} = \frac{d}{dx}\left(\frac{\partial F}{\partial y_i'}\right), \qquad i = 1, 2, \ldots, n. \tag{22.12} \]


22.3.2 Several independent variables 

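With $n$ independent variables, we need to extremise multiple integrals of the form

\[ I = \int\!\!\int\cdots\int F\left(y, \frac{\partial y}{\partial x_1}, \frac{\partial y}{\partial x_2}, \ldots, \frac{\partial y}{\partial x_n}, x_1, x_2, \ldots, x_n\right)dx_1\,dx_2\cdots dx_n. \]

Using the same kind of analysis as before, we find that the extremising function
$y = y(x_1, x_2, \ldots, x_n)$ must satisfy

\[ \frac{\partial F}{\partial y} = \sum_{i=1}^{n}\frac{\partial}{\partial x_i}\left(\frac{\partial F}{\partial y_{x_i}}\right), \tag{22.13} \]

where $y_{x_i}$ stands for $\partial y/\partial x_i$.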


22.3.3 Higher-order derivatives 


If in (22.1) $F = F(y, y', y'', \ldots, y^{(n)}, x)$ then using the same method as before
and performing repeated integration by parts, it can be shown that the required
extremising function $y(x)$ satisfies

\[ \frac{\partial F}{\partial y} - \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) + \frac{d^2}{dx^2}\left(\frac{\partial F}{\partial y''}\right) - \cdots + (-1)^n\frac{d^n}{dx^n}\left(\frac{\partial F}{\partial y^{(n)}}\right) = 0, \tag{22.14} \]

provided that $y = y' = \cdots = y^{(n-1)} = 0$ at both end-points. If $y$, or any of its
derivatives, is not zero at the end-points then a corresponding contribution or
contributions will appear on the RHS of (22.14).


22.3.4 Variable end-points 

We now discuss the very important generalisation to variable end-points. Suppose,
as before, we wish to find the function $y(x)$ that extremises the integral

\[ I = \int_a^b F(y, y', x)\,dx, \]

but this time we demand only that the lower end-point is fixed, while we allow
$y(b)$ to be arbitrary. Repeating the analysis of section 22.1, we find from (22.4)
that we require

\[ \left[\eta\frac{\partial F}{\partial y'}\right]_a^b + \int_a^b\left[\frac{\partial F}{\partial y} - \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right)\right]\eta(x)\,dx = 0. \tag{22.15} \]

Figure 22.5 Variation of the end-point $b$ along the curve $h(x, y) = 0$.


Obviously the EL equation (22.5) must still hold for the second term on the LHS
to vanish. Also, since the lower end-point is fixed, i.e. $\eta(a) = 0$, the first term on
the LHS automatically vanishes at the lower limit. However, in order that it also
vanishes at the upper limit, we require in addition that

\[ \left.\frac{\partial F}{\partial y'}\right|_{x=b} = 0. \tag{22.16} \]

Clearly if both end-points may vary then $\partial F/\partial y'$ must vanish at both ends.

An interesting and more general case is where the lower end-point is again
fixed at $x = a$, but the upper end-point is free to lie anywhere on the curve
$h(x, y) = 0$. Now in this case, the variation in the value of $I$ due to the arbitrary
variation (22.2) is given to first order by

\[ \delta I = \left[\frac{\partial F}{\partial y'}\eta\right]_a^b + \int_a^b\left(\frac{\partial F}{\partial y} - \frac{d}{dx}\frac{\partial F}{\partial y'}\right)\eta\,dx + F(b)\Delta x, \tag{22.17} \]

where $\Delta x$ is the displacement in the $x$-direction of the upper end-point, as
indicated in figure 22.5, and $F(b)$ is the value of $F$ at $x = b$. In order for (22.17)
to be valid, we of course require the displacement $\Delta x$ to be small.

From the figure we see that $\Delta y = \eta(b) + y'(b)\Delta x$. Since the upper end-point
must lie on $h(x, y) = 0$ we also require that, at $x = b$,

\[ \frac{\partial h}{\partial x}\Delta x + \frac{\partial h}{\partial y}\Delta y = 0, \]

which on substituting our expression for $\Delta y$ and rearranging becomes

\[ \left(\frac{\partial h}{\partial x} + y'\frac{\partial h}{\partial y}\right)\Delta x + \frac{\partial h}{\partial y}\eta = 0. \tag{22.18} \]

Figure 22.6 A frictionless wire along which a small bead slides. We seek the
shape of the wire that allows the bead to travel from the origin $O$ to the line
$x = x_0$ in the least possible time.

Now, from (22.17) the condition $\delta I = 0$ requires, besides the EL equation, that
at $x = b$, the other two contributions cancel, i.e.

\[ F\,\Delta x + \frac{\partial F}{\partial y'}\eta = 0. \tag{22.19} \]

Eliminating $\Delta x$ and $\eta$ between (22.18) and (22.19) leads to the condition that at
the end-point

\[ \left(F - y'\frac{\partial F}{\partial y'}\right)\frac{\partial h}{\partial y} - \frac{\partial F}{\partial y'}\frac{\partial h}{\partial x} = 0. \tag{22.20} \]

In the special case where the end-point is free to lie anywhere on the vertical line
$x = b$, we have $\partial h/\partial x = 1$ and $\partial h/\partial y = 0$. Substituting these values into (22.20),
we recover the end-point condition given in (22.16).


► A frictionless wire in a vertical plane connects two points $A$ and $B$, $A$ being higher than $B$.
Let the position of $A$ be fixed at the origin of an $xy$-coordinate system, but allow $B$ to lie
anywhere on the vertical line $x = x_0$ (see figure 22.6). Find the shape of the wire such that
a bead placed on it at $A$ will slide under gravity to $B$ in the shortest possible time.


This is a variant of the famous brachistochrone (shortest time) problem, which is often
used to illustrate the calculus of variations. Conservation of energy tells us that the particle
speed is given by

\[ \frac{ds}{dt} = \sqrt{2gy}, \]

where $s$ is the path length along the wire and $g$ is the acceleration due to gravity. Since
the element of path length is $ds = (1 + y'^2)^{1/2}\,dx$, the total time taken to travel to the line
$x = x_0$ is given by

\[ t = \int_{x=0}^{x=x_0}\frac{ds}{\sqrt{2gy}} = \frac{1}{\sqrt{2g}}\int_0^{x_0}\sqrt{\frac{1 + y'^2}{y}}\,dx. \]

Because the integrand does not contain $x$ explicitly, we can use (22.8) with the specific
form $F = \sqrt{1 + y'^2}/\sqrt{y}$ to find a first integral; on simplification this yields

\[ \left[y(1 + y'^2)\right]^{1/2} = k, \]

where $k$ is a constant. Letting $a = k^2$ and solving for $y'$ we find

\[ y' = \frac{dy}{dx} = \sqrt{\frac{a - y}{y}}, \]

which on substituting $y = a\sin^2\theta$ integrates to give

\[ x = \frac{a}{2}(2\theta - \sin 2\theta) + c. \]

Thus the parametric equations of the curve are given by

\[ x = b(\phi - \sin\phi) + c, \qquad y = b(1 - \cos\phi), \]

where $b = a/2$ and $\phi = 2\theta$; they define a cycloid, the curve traced out by a point on
the rim of a wheel of radius $b$ rolling along the $x$-axis. We must now use the end-point
conditions to determine the constants $b$ and $c$. Since the curve passes through the origin,
we see immediately that $c = 0$. Now since $y(x_0)$ is arbitrary, i.e. the upper end-point can
lie anywhere on the curve $x = x_0$, the condition (22.20) reduces to (22.16), so that we also
require

\[ \left.\frac{\partial F}{\partial y'}\right|_{x=x_0} = \left.\frac{y'}{\sqrt{y(1 + y'^2)}}\right|_{x=x_0} = 0, \]

which implies that $y' = 0$ at $x = x_0$. In words, the tangent to the cycloid at $B$ must
be parallel to the $x$-axis; this requires $\pi b = x_0$. ◄
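To get a feel for the result, the travel time along the cycloid with $\pi b = x_0$ can be compared numerically with the best time achievable on a straight wire ending somewhere on the line $x = x_0$. The sketch below is illustrative only: the values of $g$ and $x_0$, the function names and the grid of end heights tried for the straight wire are all assumptions, and $y$ is taken to be measured downwards so that the speed is $\sqrt{2gy}$.

```python
import math

g = 9.81      # acceleration due to gravity in m/s^2 (illustrative value)
x0 = 1.0      # hypothetical horizontal distance to the line x = x0, in metres

def cycloid_time(b, n=20000):
    """Travel time along x = b(phi - sin phi), y = b(1 - cos phi), 0 <= phi <= pi."""
    h = math.pi / n
    total = 0.0
    for i in range(n):
        phi = (i + 0.5) * h
        ds = b * math.sqrt((1 - math.cos(phi)) ** 2 + math.sin(phi) ** 2) * h
        speed = math.sqrt(2 * g * b * (1 - math.cos(phi)))   # y measured downwards
        total += ds / speed
    return total

def straight_time(y_end):
    """Travel time down a straight wire from the origin to (x0, y_end)."""
    length = math.hypot(x0, y_end)
    return math.sqrt(2.0 * length ** 2 / (g * y_end))

b = x0 / math.pi                       # end-point condition pi*b = x0
t_cycloid = cycloid_time(b)
t_straight_best = min(straight_time(0.01 * j) for j in range(1, 501))
print(t_cycloid, t_straight_best)
```

The cycloid time should agree with the closed form $\pi\sqrt{b/g} = \sqrt{\pi x_0/g}$ and be shorter than the time for any of the straight wires tried.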


22.4 Constrained variation 


Just as the problem of finding the stationary values of a function $f(x, y)$ subject to
the constraint $g(x, y) = \text{constant}$ is solved by means of Lagrange’s undetermined
multipliers (see chapter 5), so the corresponding problem in the calculus of
variations is solved by an analogous method.

Suppose that we wish to find the stationary values of

\[ I = \int_a^b F(y, y', x)\,dx, \]

subject to the constraint that the value of

\[ J = \int_a^b G(y, y', x)\,dx \]

is held constant. Following the method of Lagrange undetermined multipliers let
us define a new functional

\[ K = I + \lambda J = \int_a^b (F + \lambda G)\,dx, \]

and find its unconstrained stationary values. Repeating the analysis of section 22.1
we find that we require

\[ \frac{\partial F}{\partial y} - \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) + \lambda\left[\frac{\partial G}{\partial y} - \frac{d}{dx}\left(\frac{\partial G}{\partial y'}\right)\right] = 0, \]

which, together with the original constraint $J = \text{constant}$, will yield the required
solution $y(x)$.

Figure 22.7 A uniform rope with fixed end-points suspended under gravity.

This method is easily generalised to cases with more than one constraint by the
introduction of more Lagrange multipliers. If we wish to find the stationary values
of an integral $I$ subject to the multiple constraints that the values of the integrals
$J_i$ be held constant for $i = 1, 2, \ldots, n$, then we simply find the unconstrained
stationary values of the new integral

\[ K = I + \sum_{i=1}^{n}\lambda_iJ_i. \]


►Find the shape assumed by a uniform rope when suspended by its ends from two points
at equal heights.


We will solve this problem using $x$ (see figure 22.7) as the independent variable. Let
the rope of length $2L$ be suspended between the points $x = \pm a$, $y = 0$ ($L > a$) and
have uniform linear density $\rho$. We then need to find the stationary value of the rope’s
gravitational potential energy,

\[ I = -\rho g\int y\,ds = -\rho g\int_{-a}^{a} y(1 + y'^2)^{1/2}\,dx, \]

with respect to small changes in the form of the rope but subject to the constraint that
the total length of the rope remains constant, i.e.

\[ J = \int_{-a}^{a}(1 + y'^2)^{1/2}\,dx = 2L. \]

We thus define a new integral (omitting the factor $-1$ from $I$ for brevity)

\[ K = I + \lambda J = \int_{-a}^{a}(\rho gy + \lambda)(1 + y'^2)^{1/2}\,dx \]

and find its stationary values. Since the integrand does not contain the independent
variable $x$ explicitly, we can use (22.8) to find the first integral:

\[ (\rho gy + \lambda)(1 + y'^2)^{1/2} - (\rho gy + \lambda)(1 + y'^2)^{-1/2}y'^2 = k, \]

where $k$ is a constant; this reduces to

\[ y'^2 = \left(\frac{\rho gy + \lambda}{k}\right)^2 - 1. \]

Making the substitution $\rho gy + \lambda = k\cosh z$, this can be integrated easily to give

\[ \frac{k}{\rho g}\cosh^{-1}\left(\frac{\rho gy + \lambda}{k}\right) = x + c, \]

where $c$ is the constant of integration.

We now have three unknowns, $\lambda$, $k$ and $c$, that must be evaluated using the two end
conditions $y(\pm a) = 0$ and the constraint $J = 2L$. The end conditions give

\[ \cosh\frac{\rho g(a + c)}{k} = \frac{\lambda}{k} = \cosh\frac{\rho g(-a + c)}{k}, \]

and since $a \neq 0$, these imply $c = 0$ and $\lambda/k = \cosh(\rho ga/k)$. Putting $c = 0$ into the
constraint, in which $y' = \sinh(\rho gx/k)$, we obtain

\[ 2L = \int_{-a}^{a}\left[1 + \sinh^2\left(\frac{\rho gx}{k}\right)\right]^{1/2}dx = \frac{2k}{\rho g}\sinh\left(\frac{\rho ga}{k}\right). \]

Collecting together the values for the constants, the form adopted by the rope is therefore

\[ y(x) = \frac{k}{\rho g}\left[\cosh\left(\frac{\rho gx}{k}\right) - \cosh\left(\frac{\rho ga}{k}\right)\right], \]

where $k$ is the solution of $\sinh(\rho ga/k) = \rho gL/k$. This curve is known as a catenary. ◄
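The equation $\sinh(\rho ga/k) = \rho gL/k$ for $k$ is transcendental and has to be solved numerically. Writing $u = \rho ga/k$ turns it into $\sinh u = (L/a)u$, which the sketch below solves by bisection and then uses to check the length constraint. The sample values $a = 1$, $L = 1.5$ and all function names are assumptions made for illustration; only the combination $k/(\rho g) = a/u$ is needed for the shape.

```python
import math

def solve_catenary(a, L):
    """Return u = rho*g*a/k > 0 satisfying sinh(u) = (L/a)*u, which needs L > a."""
    if L <= a:
        raise ValueError("the rope must be longer than the gap: L > a")
    ratio = L / a
    f = lambda u: math.sinh(u) - ratio * u
    lo, hi = 1e-6, 1.0
    while f(hi) <= 0.0:             # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):            # plain bisection
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

a, L = 1.0, 1.5                     # hypothetical half-span and half-length
u = solve_catenary(a, L)
k_over_rho_g = a / u                # the combination k/(rho*g)

def y(x):
    """Rope shape y(x) = (k/(rho g)) [cosh(rho g x/k) - cosh(rho g a/k)]."""
    return k_over_rho_g * (math.cosh(x / k_over_rho_g) - math.cosh(u))

# Check the length constraint numerically with a midpoint rule.
n = 20000
h = 2 * a / n
length = sum(math.sqrt(1 + math.sinh((-a + (i + 0.5) * h) / k_over_rho_g) ** 2) * h
             for i in range(n))
print(u, y(0.0), length)
```

The computed length should reproduce $2L$, the shape should vanish at $x = \pm a$, and $y(0)$ gives the sag at the midpoint. The lower bracket `1e-6` is a simplification that would need refining for a rope only marginally longer than the gap.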


22.5 Physical variational principles 

Many results in both classical and quantum physics can be expressed as variational
principles, and it is often when expressed in this form that their physical
meaning is most clearly understood. Moreover, once a physical phenomenon has
been written as a variational principle, we can use all the results derived in this
chapter to investigate its behaviour. It is usually possible to identify conserved
quantities, or symmetries of the system of interest, that otherwise might be found
only with considerable effort. From the wide range of physical variational principles
we will select two examples from familiar areas of classical physics, namely
geometric optics and mechanics.


22.5.1 Fermat’s principle in optics 

Fermat’s principle in geometrical optics states that a ray of light travelling in a 
region of variable refractive index follows a path such that the total optical path 
length (physical length × refractive index) is stationary.

Figure 22.8 Path of a light ray at the plane interface between media with
refractive indices $n_1$ and $n_2$, where $n_2 < n_1$.


► From Fermat’s principle deduce Snell’s law of refraction at an interface. 


Let the interface be at y = constant (see figure 22.8) and let it separate two regions with
refractive indices $n_1$ and $n_2$ respectively. On a ray the element of physical path length is
$ds = (1 + y'^2)^{1/2}\,dx$, and so for a ray that passes through the points A and B, the total
optical path length is

$$P = \int_A^B n(y)(1 + y'^2)^{1/2}\,dx.$$

Since the integrand does not contain the independent variable x explicitly, we use (22.8)
to obtain a first integral, which, after some rearrangement, reads

$$n(y)(1 + y'^2)^{-1/2} = k,$$

where k is a constant. Recalling that y' is the tangent of the angle $\phi$ between the
instantaneous direction of the ray and the x-axis, this general result, which is not dependent
on the configuration presently under consideration, can be put in the form

$$n\cos\phi = \text{constant}$$

along a ray, even though n and $\phi$ vary individually.

For our particular configuration n is constant in each medium and therefore so is y'.
Thus the rays travel in straight lines in each medium (as anticipated in figure 22.8,
but not assumed in our analysis), and since k is constant along the whole path we have
$n_1\cos\phi_1 = n_2\cos\phi_2$, or in terms of the conventional angles in the figure

$$n_1\sin\theta_1 = n_2\sin\theta_2.$$ ◄
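
Snell's law can also be checked numerically by minimising the optical path directly. The sketch below is an illustration rather than part of the text: the interface is placed at y = 0, the end points A and B and the refractive indices are arbitrary choices, and the helper names are hypothetical. It finds the crossing point that minimises the optical path and compares the two sides of Snell's law, with angles measured from the normal to the interface.

```python
import math

def optical_path(xc, n1, n2, A, B):
    """Optical path from A to B via the point (xc, 0) on the interface y = 0."""
    (xa, ya), (xb, yb) = A, B
    return n1 * math.hypot(xc - xa, ya) + n2 * math.hypot(xb - xc, yb)

def minimise(f, lo, hi, tol=1e-12):
    """Golden-section search for the minimum of a unimodal function on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi, d = d, c
            c = hi - g * (hi - lo)
        else:
            lo, c = c, d
            d = lo + g * (hi - lo)
    return 0.5 * (lo + hi)

n1, n2 = 1.5, 1.0                       # illustrative indices with n2 < n1
A, B = (0.0, 1.0), (2.0, -1.0)          # A lies in medium 1, B in medium 2
xc = minimise(lambda x: optical_path(x, n1, n2, A, B), A[0], B[0])

# Sines of the angles between each straight segment and the interface normal.
sin1 = (xc - A[0]) / math.hypot(xc - A[0], A[1])
sin2 = (B[0] - xc) / math.hypot(B[0] - xc, B[1])
print(n1 * sin1, n2 * sin2)             # the two values should agree closely
```

The search assumes straight segments in each medium, which the analysis above justifies for piecewise-constant n.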


22.5.2 Hamilton’s principle in mechanics 

Figure 22.9 Transverse displacement on a taut string that is fixed at two
points a distance l apart.

Consider a mechanical system whose configuration can be uniquely defined by a
number of coordinates $q_i$ (usually distances and angles) together with time t and
which experiences only forces derivable from a potential. Hamilton's principle
states that in moving from one configuration at time $t_0$ to another at time $t_1$ the
motion of such a system is such as to make

$$\mathcal{L} = \int_{t_0}^{t_1} L(q_1, q_2, \ldots, q_n, \dot{q}_1, \dot{q}_2, \ldots, \dot{q}_n, t)\,dt \tag{22.21}$$

stationary. The Lagrangian L is defined, in terms of the kinetic energy T and
the potential energy V (with respect to some reference situation), by L = T − V.
Here V is a function of the $q_i$ only, not of the $\dot{q}_i$. Applying the EL equation to $\mathcal{L}$
we obtain Lagrange's equations,

$$\frac{\partial L}{\partial q_i} = \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_i}\right), \qquad i = 1, 2, \ldots, n.$$
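
As a brief illustration of Lagrange's equations (an example chosen here, not taken from the text), consider a single coordinate q = x describing a particle of mass m attached to a spring of stiffness k, so that $T = \tfrac12 m\dot{x}^2$ and $V = \tfrac12 kx^2$. The single Lagrange equation then reproduces Newton's second law for the oscillator:

```latex
L = T - V = \tfrac{1}{2} m \dot{x}^2 - \tfrac{1}{2} k x^2,
\qquad
\frac{\partial L}{\partial x} = -kx,
\qquad
\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{x}}\right) = m\ddot{x}
\quad\Longrightarrow\quad
m\ddot{x} = -kx .
```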


► Using Hamilton's principle derive the wave equation for small transverse oscillations of a 
taut string. 


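In this example we are in fact considering a generalisation of (22.21) to a case involving
one isolated independent coordinate t, together with a continuum in which the $q_i$ become
the continuous variable x. The expressions for T and V therefore become integrals over x
rather than sums over the label i.

If $\rho$ and $\tau$ are the local density and tension of the string, both of which may depend on
x, then, referring to figure 22.9, the kinetic and potential energies of the string are given
by

$$T = \int_0^l \frac{\rho}{2}\left(\frac{\partial y}{\partial t}\right)^2 dx, \qquad V = \int_0^l \frac{\tau}{2}\left(\frac{\partial y}{\partial x}\right)^2 dx,$$

and (22.21) becomes

$$\mathcal{L} = \frac{1}{2}\int_{t_0}^{t_1} dt \int_0^l \left[\rho\left(\frac{\partial y}{\partial t}\right)^2 - \tau\left(\frac{\partial y}{\partial x}\right)^2\right] dx.$$

Using (22.13) and the fact that y does not appear explicitly, we obtain

$$\frac{\partial}{\partial t}\left(\rho\frac{\partial y}{\partial t}\right) - \frac{\partial}{\partial x}\left(\tau\frac{\partial y}{\partial x}\right) = 0.$$

If, in addition, $\rho$ and $\tau$ do not depend on x or t then

$$\frac{\partial^2 y}{\partial x^2} = \frac{1}{c^2}\frac{\partial^2 y}{\partial t^2},$$

where $c^2 = \tau/\rho$. This is the wave equation for small transverse oscillations of a taut
uniform string. ◄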


22.6 General eigenvalue problems 


We have seen in this chapter that the problem of finding a curve that makes the 
value of a given integral stationary when the integral is taken along the curve 
results, in each case, in a differential equation for the curve. It is not a great 
extension to ask whether this may be used to solve differential equations, by 
setting up a suitable variational problem and then seeking ways other than the 
Euler equation of finding or estimating stationary solutions. 

We shall be concerned with differential equations of the form $\mathcal{L}y = \lambda\rho(x)y$,
where the differential operator $\mathcal{L}$ is self-adjoint, so that $\mathcal{L} = \mathcal{L}^\dagger$ (with appropriate
boundary conditions on the solution y) and $\rho(x)$ is some weight function, as
discussed in chapter 17. In particular, we will concentrate on the Sturm-Liouville
equation as an explicit example, but much of what follows can be applied to
other equations of this type.

We have already discussed the solution of equations of the Sturm-Liouville 
type in chapter 17 and the same notation will be used here. In this section, 
however, we will adopt a variational approach to estimating the eigenvalues of 
such equations. 

Suppose we search for stationary values of the integral

$$I = \int_a^b \left[p(x)y'^2(x) - q(x)y^2(x)\right] dx, \tag{22.22}$$

with y(a) = y(b) = 0 and p and q any sufficiently smooth and differentiable
functions of x. However, in addition we impose a normalisation condition

$$J = \int_a^b \rho(x)y^2(x)\,dx = \text{constant}. \tag{22.23}$$

Here $\rho(x)$ is a positive weight function defined in the interval $a \le x \le b$, but
which may in particular cases be a constant.

Then, as in section 22.4, we use undetermined Lagrange multipliers,† and
consider $K = I - \lambda J$ given by

$$K = \int_a^b \left[py'^2 - (q + \lambda\rho)y^2\right] dx.$$

† We use $-\lambda$, rather than $\lambda$, so that the final equation (22.24) appears in the
conventional Sturm-Liouville form.

On application of the EL equation (22.5) this yields

$$\frac{d}{dx}\left(p\frac{dy}{dx}\right) + qy + \lambda\rho y = 0, \tag{22.24}$$

which is exactly the Sturm-Liouville equation (17.35), with eigenvalue $\lambda$. Now,
since both I and J are quadratic in y and its derivative, finding stationary values
of K is equivalent to finding stationary values of I/J. This may also be shown
by considering the functional $\Lambda = I/J$, for which

$$\delta\Lambda = \frac{\delta I}{J} - \frac{I}{J^2}\,\delta J = \frac{\delta I - \Lambda\,\delta J}{J} = \frac{\delta K}{J}.$$

Hence, extremising $\Lambda$ is equivalent to extremising K. Thus we have the important
result that finding functions y that make I/J stationary is equivalent to finding
functions y that are solutions of the Sturm-Liouville equation; the resulting value
of I/J equals the corresponding eigenvalue of the equation.

Of course this does not tell us how to find such a function y and, naturally, to 
have to do this by solving (22.24) directly defeats the purpose of the exercise. We 
will see in the next section how some progress can be made. It is worth recalling 
that the functions p(x), q(x) and $\rho(x)$ can have many different forms, and so
(22.24) represents quite a wide variety of equations.

We now recall some properties of the solutions of the Sturm-Liouville equation.
The eigenvalues $\lambda_i$ of (22.24) are real and will be assumed non-degenerate (for
simplicity). We also assume that the corresponding eigenfunctions have been made
real, so that normalised eigenfunctions $y_i(x)$ satisfy the orthogonality relation (as
in (17.27))

$$\int_a^b y_i y_j \rho\,dx = \delta_{ij}. \tag{22.25}$$

Further, we take the boundary condition in the form

$$\Big[y_i\,p\,y_j'\Big]_{x=a}^{x=b} = 0; \tag{22.26}$$

this can be satisfied by y(a) = y(b) = 0, but also by many other sets of boundary
conditions.
► Show that

$$\int_a^b \left(y_j'\,p\,y_i' - y_j\,q\,y_i\right) dx = \lambda_i\delta_{ij}. \tag{22.27}$$

Let $y_i$ be an eigenfunction of (22.24), corresponding to a particular eigenvalue $\lambda_i$, so that

$$(py_i')' + (q + \lambda_i\rho)y_i = 0.$$

Multiplying this through by $y_j$ and integrating from a to b (the first term by parts) we
obtain

$$\Big[y_j(py_i')\Big]_a^b - \int_a^b y_j'(py_i')\,dx + \int_a^b y_j(q + \lambda_i\rho)y_i\,dx = 0. \tag{22.28}$$

The first term vanishes by virtue of (22.26), and on rearranging the other terms and using
(22.25), we find the result (22.27). ◄


We see at once that, if the function y(x) minimises I/J, i.e. satisfies the Sturm-Liouville
equation, then putting $y_i = y_j = y$ in (22.25) and (22.27) yields J and
I respectively on the left-hand sides; thus, as mentioned above, the minimised
value of I/J is just the eigenvalue $\lambda$, introduced originally as the undetermined
multiplier.


► For a function y satisfying the Sturm-Liouville equation verify that, provided (22.26) is
satisfied, $\lambda = I/J$.

Firstly, we multiply (22.24) through by y to give

$$y(py')' + qy^2 + \lambda\rho y^2 = 0.$$

Now integrating this expression by parts we have

$$\Big[y\,p\,y'\Big]_a^b - \int_a^b \left(py'^2 - qy^2\right) dx + \lambda\int_a^b \rho y^2\,dx = 0.$$

The first term on the LHS is zero, the second is simply −I and the third is $\lambda J$. Thus
$\lambda = I/J$. ◄


22.7 Estimation of eigenvalues and eigenfunctions 

Since the eigenvalues $\lambda_i$ of the Sturm-Liouville equation are the stationary values
of I/J (see above), it follows that any evaluation of I/J must yield a value that lies
between the lowest and highest eigenvalues of the corresponding Sturm-Liouville
equation, i.e.

$$\lambda_{\min} \le \frac{I}{J} \le \lambda_{\max},$$

where, depending on the equation under consideration, either $\lambda_{\min} = -\infty$ and
$\lambda_{\max}$ is finite, or $\lambda_{\max} = \infty$ and $\lambda_{\min}$ is finite. Notice that here we have departed
from direct consideration of the minimising problem and made a statement about
a calculation in which no actual minimisation is necessary.

Thus, as an example, for an equation with a finite lowest eigenvalue $\lambda_0$ any
evaluation of I/J provides an upper bound on $\lambda_0$. Further, we will now show that
the estimate $\lambda$ obtained is a better estimate of $\lambda_0$ than the estimated (guessed)
function y is of $y_0$, the true eigenfunction corresponding to $\lambda_0$. The sense in which
'better' is used here will be clear from the final result.

Firstly, we expand the estimated or trial function y in terms of the complete
set $y_i$:

$$y = y_0 + c_1y_1 + c_2y_2 + \cdots,$$

where, if a good trial function has been guessed, the $c_i$ will be small. Using (22.25)
we have immediately that $J = 1 + \sum_i |c_i|^2$. The other required integral is

$$I = \int_a^b \left[p\,\Big|y_0' + \sum_i c_i y_i'\Big|^2 - q\,\Big|y_0 + \sum_i c_i y_i\Big|^2\right] dx.$$

On multiplying out the squared terms, all the cross terms vanish because of
(22.27) to leave

$$\lambda = \frac{I}{J} = \frac{\lambda_0 + \sum_i |c_i|^2\lambda_i}{1 + \sum_j |c_j|^2} = \lambda_0 + \sum_i |c_i|^2(\lambda_i - \lambda_0) + O(c^4).$$

Hence $\lambda$ differs from $\lambda_0$ by a term second order in the $c_i$, even though y differed
from $y_0$ by a term first order in the $c_i$; this is what we aimed to show. We notice
incidentally that, since $\lambda_0 < \lambda_i$ for all i, $\lambda$ is shown to be necessarily $\ge \lambda_0$, with
equality only if all $c_i = 0$, i.e. if $y = y_0$.
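
The second-order behaviour can be seen numerically. The sketch below is illustrative only: it uses the equation −y'' = λy with y(0) = y'(1) = 0, for which the eigenfunctions are sin[(2n+1)πx/2], contaminates the lowest eigenfunction with an amount c of the next one, and evaluates I/J by Simpson's rule (the helper names are assumptions, not from the text). Halving c should reduce the error in the eigenvalue estimate by a factor of about four.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def rayleigh_quotient(c):
    """I/J for the trial function y = y0 + c*y1 (p = 1, q = 0, rho = 1)."""
    w0, w1 = math.pi / 2, 3 * math.pi / 2      # square roots of lambda_0, lambda_1
    y = lambda x: math.sin(w0 * x) + c * math.sin(w1 * x)
    dy = lambda x: w0 * math.cos(w0 * x) + c * w1 * math.cos(w1 * x)
    I = simpson(lambda x: dy(x) ** 2, 0.0, 1.0)
    J = simpson(lambda x: y(x) ** 2, 0.0, 1.0)
    return I / J

lam0 = (math.pi / 2) ** 2
err_large = rayleigh_quotient(0.10) - lam0     # first-order error 0.10 in y
err_small = rayleigh_quotient(0.05) - lam0     # first-order error 0.05 in y
ratio = err_large / err_small                  # close to 4, i.e. second order in c
print(err_large, err_small, ratio)
```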

The method can be extended to the second and higher eigenvalues by imposing, 
in addition to the original constraints and boundary conditions, a restriction 
of the trial functions to only those that are orthogonal to the eigenfunctions 
corresponding to lower eigenvalues. (Of course, this requires complete or nearly 
complete knowledge of these latter eigenfunctions.) An example is given at the 
end of the chapter (exercise 22.26). 

We now illustrate the method we have discussed by considering a simple 
example, one for which, as on previous occasions, the answer is obvious. 
Figure 22.10 Trial solutions used to estimate the lowest eigenvalue $\lambda$ of
$-y'' = \lambda y$ with y(0) = y'(1) = 0. They are: (a) $y = \sin(\pi x/2)$, the exact result;
(b) $y = 2x - x^2$; (c) $y = x^3 - 3x^2 + 3x$; (d) $y = \sin^2(\pi x/2)$.


► Estimate the lowest eigenvalue of the equation

$$-\frac{d^2y}{dx^2} = \lambda y, \qquad 0 \le x \le 1, \tag{22.29}$$

with boundary conditions

$$y(0) = 0, \qquad y'(1) = 0. \tag{22.30}$$

We need to find the lowest value $\lambda_0$ of $\lambda$ for which (22.29) has a solution y(x) that satisfies
(22.30). The exact answer is of course $y = A\sin(\pi x/2)$ and $\lambda_0 = \pi^2/4 \approx 2.47$.

Firstly we note that the Sturm-Liouville equation reduces to (22.29) if we take p(x) = 1,
q(x) = 0 and $\rho(x) = 1$ and that the boundary conditions satisfy (22.26). Thus we are able
to apply the previous theory.

We will use three trial functions so that the effect on the estimate of $\lambda_0$ of making better
or worse 'guesses' can be seen. One further preliminary remark is relevant, namely that the
estimate is independent of any constant multiplicative factor in the function used. This
is easily verified by looking at the form of I/J. We normalise each trial function so that
y(1) = 1, purely in order to facilitate comparison of the various function shapes.

Figure 22.10 illustrates the trial functions used, curve (a) being the exact solution
$y = \sin(\pi x/2)$. The other curves are (b) $y(x) = 2x - x^2$, (c) $y(x) = x^3 - 3x^2 + 3x$, and (d)
$y(x) = \sin^2(\pi x/2)$. The choice of trial function is governed by the following considerations:

(i) the boundary conditions (22.30) must be satisfied. 

(ii) a ‘good’ trial function ought to mimic the correct solution as far as possible, but 
it may not be easy to guess even the general shape of the correct solution in some 
cases. 

(iii) the evaluation of I /J should be as simple as possible. 
It is easily verified that functions (b), (c) and (d) all satisfy (22.30) but, so far as mimicking
the correct solution is concerned, we would expect from the figure that (b) would be
superior to the other two. All three evaluations are straightforward, using (22.22) and
(22.23):

$$\lambda_b = \frac{\int_0^1 (2 - 2x)^2\,dx}{\int_0^1 (2x - x^2)^2\,dx} = \frac{4/3}{8/15} = 2.50,$$

$$\lambda_c = \frac{\int_0^1 (3x^2 - 6x + 3)^2\,dx}{\int_0^1 (x^3 - 3x^2 + 3x)^2\,dx} = \frac{9/5}{9/14} = 2.80,$$

$$\lambda_d = \frac{\int_0^1 (\pi^2/4)\sin^2(\pi x)\,dx}{\int_0^1 \sin^4(\pi x/2)\,dx} = \frac{\pi^2/8}{3/8} = 3.29.$$

We expected all evaluations to yield estimates greater than the lowest eigenvalue, 2.47,
and this is indeed so. From these trials alone we are able to say (only) that $\lambda_0 \le 2.50$.
As expected, the best approximation (b) to the true eigenfunction yields the lowest, and
therefore the best, upper bound on $\lambda_0$. ◄
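
The three estimates can be reproduced by direct numerical integration. The following sketch is an illustration, with Simpson's rule standing in for the analytic integrals and all helper names hypothetical; it evaluates I/J for each trial function with p = 1, q = 0 and ρ = 1.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def estimate(y, dy):
    """Upper bound I/J on lambda_0 for -y'' = lambda*y on 0 <= x <= 1."""
    I = simpson(lambda x: dy(x) ** 2, 0.0, 1.0)
    J = simpson(lambda x: y(x) ** 2, 0.0, 1.0)
    return I / J

# Each entry pairs a trial function with its derivative, labelled as in figure 22.10.
trials = {
    "b": (lambda x: 2 * x - x**2, lambda x: 2 - 2 * x),
    "c": (lambda x: x**3 - 3 * x**2 + 3 * x, lambda x: 3 * x**2 - 6 * x + 3),
    "d": (lambda x: math.sin(math.pi * x / 2) ** 2,
          lambda x: (math.pi / 2) * math.sin(math.pi * x)),
}
estimates = {name: estimate(y, dy) for name, (y, dy) in trials.items()}
exact = math.pi**2 / 4
for name, lam in estimates.items():
    print(name, lam, lam - exact)    # every estimate lies above the exact value
```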


We may generalise the work of this section to other differential equations of
the form $\mathcal{L}y = \lambda\rho y$, where $\mathcal{L} = \mathcal{L}^\dagger$. In particular, one finds

$$\lambda_{\min} \le \frac{I}{J} \le \lambda_{\max},$$

where I and J are now given by

$$I = \int_a^b y^*(\mathcal{L}y)\,dx \qquad\text{and}\qquad J = \int_a^b \rho\,y^*y\,dx. \tag{22.31}$$

It is straightforward to show that, for the special case of the Sturm-Liouville
equation, for which

$$\mathcal{L}y = -(py')' - qy,$$

the expression for I in (22.31) leads to (22.22).
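
For real y the step can be sketched in one integration by parts, assuming the boundary condition (22.26) holds so that the integrated term vanishes:

```latex
I = \int_a^b y\,(\mathcal{L}y)\,dx
  = -\int_a^b y\,(py')'\,dx - \int_a^b q\,y^2\,dx
  = -\Big[\,y\,p\,y'\,\Big]_a^b + \int_a^b \left(p\,y'^2 - q\,y^2\right) dx
  = \int_a^b \left(p\,y'^2 - q\,y^2\right) dx .
```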


22.8 Adjustment of parameters 

Instead of trying to estimate $\lambda_0$ by selecting a large number of different trial
functions, we may also use trial functions that include one or more parameters
which themselves may be adjusted to give the lowest value to $\lambda = I/J$ and
hence the best estimate of $\lambda_0$. The justification for this method comes from the
knowledge that no matter what form of function is chosen, nor what values are
assigned to the parameters, provided the boundary conditions are satisfied $\lambda$ can
never be less than the required $\lambda_0$.

To illustrate this method an example from quantum mechanics will be used.
The time-independent Schrödinger equation is formally written as the eigenvalue
equation $H\psi = E\psi$, where H is a linear operator, $\psi$ the wavefunction describing
a quantum mechanical system and E the energy of the system. The energy
operator H is called the Hamiltonian and for a particle of mass m moving in a
one-dimensional harmonic oscillator potential is given by

$$H = -\frac{\hbar^2}{2m}\frac{d^2}{dx^2} + \frac{kx^2}{2}, \tag{22.32}$$

where $\hbar$ is Planck's constant divided by $2\pi$.


► Estimate the ground-state energy of a quantum harmonic oscillator. 


Using (22.32) in $H\psi = E\psi$, the Schrödinger equation is

$$-\frac{\hbar^2}{2m}\frac{d^2\psi}{dx^2} + \frac{kx^2}{2}\psi = E\psi, \qquad -\infty < x < \infty. \tag{22.33}$$

The boundary conditions are that $\psi$ should vanish as $x \to \pm\infty$. Equation (22.33) is a form
of the Sturm-Liouville equation in which $p = \hbar^2/(2m)$, $q = -kx^2/2$, $\rho = 1$ and $\lambda = E$; it
can be solved by the methods developed previously, e.g. by writing the eigenfunction $\psi$ as
a power series in x.

However, our purpose here is to illustrate variational methods and so we take as a trial
wavefunction $\psi = \exp(-\alpha x^2)$, where $\alpha$ is a positive parameter whose value we will choose
later. This function certainly $\to 0$ as $x \to \pm\infty$ and is convenient for calculations. Whether
it approximates the true wave function is unknown, but if it does not our estimate will
still be valid, although the upper bound will be a poor one.

With $y = \exp(-\alpha x^2)$ and therefore $y' = -2\alpha x\exp(-\alpha x^2)$, the required estimate is

$$\lambda = \frac{\int_{-\infty}^{\infty}\left[(\hbar^2/2m)4\alpha^2x^2 + (k/2)x^2\right]e^{-2\alpha x^2}dx}{\int_{-\infty}^{\infty} e^{-2\alpha x^2}dx} = \frac{\hbar^2\alpha}{2m} + \frac{k}{8\alpha}. \tag{22.34}$$

This evaluation is easily carried out using the reduction formula

$$I_n = \frac{n-1}{4\alpha}\,I_{n-2}, \qquad\text{for integrals of the form}\qquad I_n = \int_{-\infty}^{\infty} x^n e^{-2\alpha x^2}dx. \tag{22.35}$$

So, we have obtained the estimate (22.34), involving the parameter $\alpha$, for the oscillator's
ground-state energy, i.e. the lowest eigenvalue of H. In line with our previous discussion
we now minimise $\lambda$ with respect to $\alpha$. Putting $d\lambda/d\alpha = 0$ (clearly a minimum) yields
$\alpha = (km)^{1/2}/(2\hbar)$, which in turn gives as the minimum value for $\lambda$

$$E = \frac{\hbar}{2}\left(\frac{k}{m}\right)^{1/2} = \frac{\hbar\omega}{2}, \tag{22.36}$$

where we have put $(k/m)^{1/2}$ equal to the classical angular frequency $\omega$.

The method thus leads to the conclusion that the ground-state energy $E_0$ is $\le \tfrac{1}{2}\hbar\omega$.
In fact, as is well known, the equality sign holds, $\tfrac{1}{2}\hbar\omega$ being just the zero-point energy
of a quantum mechanical oscillator. Our estimate gives the exact value because $\psi(x) =
\exp(-\alpha x^2)$ is the correct functional form for the ground state wavefunction and the
particular value of $\alpha$ that we have found is that needed to make $\psi$ an eigenfunction of
H with eigenvalue $\tfrac{1}{2}\hbar\omega$. ◄
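
When the estimate cannot be minimised analytically, the parameter can be adjusted numerically instead. The sketch below is illustrative only: it works in units with ħ = m = k = 1 (an assumption made here for convenience), uses a golden-section search whose name and interface are hypothetical, and minimises the estimate (22.34) with respect to α.

```python
import math

def energy_estimate(alpha, hbar=1.0, m=1.0, k=1.0):
    """Estimate (22.34) for the trial wavefunction exp(-alpha * x**2)."""
    return hbar**2 * alpha / (2.0 * m) + k / (8.0 * alpha)

def minimise(f, lo, hi, tol=1e-12):
    """Golden-section search for the minimum of a unimodal function on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi, d = d, c
            c = hi - g * (hi - lo)
        else:
            lo, c = c, d
            d = lo + g * (hi - lo)
    return 0.5 * (lo + hi)

hbar, m, k = 1.0, 1.0, 1.0                      # illustrative units
alpha_best = minimise(lambda a: energy_estimate(a, hbar, m, k), 1e-3, 10.0)
lam_min = energy_estimate(alpha_best, hbar, m, k)
omega = math.sqrt(k / m)
print(alpha_best, lam_min, 0.5 * hbar * omega)  # compare with (km)^(1/2)/(2*hbar) and hbar*omega/2
```

Because the estimate is stationary at the minimum, the energy is much less sensitive to small errors in α than α itself is, which mirrors the second-order accuracy discussed in section 22.7.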


An alternative but equivalent approach to this is developed in the exercises 
that follow, as is an extension of this particular problem to estimating the second- 
lowest eigenvalue (see exercise 22.26). 
22.9 Exercises

22.1 A surface of revolution, whose equation in cylindrical polar coordinates is $\rho = \rho(z)$,
is bounded by the circles $\rho = a$, $z = \pm c$ (a > c). Show that the function
that makes the surface integral $I = \int \rho^{-1/2}\,dS$ stationary with respect to small
variations is given by $\rho(z) = k + z^2/(4k)$, where $k = [a \pm (a^2 - c^2)^{1/2}]/2$.

22.2 Show that the lowest value of the integral

$$\int_A^B \frac{(1 + y'^2)^{1/2}}{y}\,dx,$$

where A is (−1, 1) and B is (1, 1), is $2\ln(1 + \sqrt{2})$. Assume that the Euler-Lagrange
equation gives a minimising curve.

22.3 The refractive index n of a medium is a function only of the distance r from a
fixed point O. Prove that the equation of a light ray, assumed to lie in a plane
through O, travelling in the medium satisfies (in plane polar coordinates)

$$\frac{1}{r^2}\left(\frac{dr}{d\phi}\right)^2 = \frac{r^2}{a^2}\frac{n^2(r)}{n^2(a)} - 1,$$

where a is the distance of the ray from O at the point at which $dr/d\phi = 0$.
If $n = [1 + (\alpha^2/r^2)]^{1/2}$ and the ray starts and ends far from O, find its deviation
(the angle through which the ray is turned) if its minimum distance from O is a.

22.4 The Lagrangian for a $\pi$-meson is given by

$$L(\mathbf{x}, t) = \tfrac{1}{2}\left(\dot{\phi}^2 - |\nabla\phi|^2 - \mu^2\phi^2\right),$$

where $\mu$ is the meson mass and $\phi(\mathbf{x}, t)$ is its wavefunction. Assuming Hamilton's
principle find the wave equation satisfied by $\phi$.

22.5 (a) For a system described in terms of coordinates $q_i$ and t, show that if t does
not appear explicitly in the expressions for x, y and z ($x = x(q_i, t)$, etc.) then
the kinetic energy T is a homogeneous quadratic function of the $\dot{q}_i$ (it may
also involve the $q_i$). Deduce that $\sum_i \dot{q}_i(\partial T/\partial\dot{q}_i) = 2T$.

(b) Assuming that the forces acting on the system are derivable from a potential
V, show, by expressing dT/dt in terms of $q_i$ and $\dot{q}_i$, that d(T + V)/dt = 0.

22.6 For a system specified by the coordinates q and t, show that the equation of
motion is unchanged if the Lagrangian $L(q, \dot{q}, t)$ is replaced by

$$L_1 = L + \frac{d\phi(q, t)}{dt},$$

where $\phi$ is an arbitrary function. Deduce that the equation of motion of a particle
that moves in one dimension subject to a force −dV(x)/dx (x being measured
from a point O) is unchanged if O is forced to move with a constant velocity v
(x still being measured from O).

22.7 In cylindrical polar coordinates, the curve $(\rho(\theta), \theta, \alpha\rho(\theta))$ lies on the surface of
the cone $z = \alpha\rho$. Show that geodesics (curves of minimum length joining two
points) on the cone satisfy

$$\rho^4 = c^2\left[\beta^2\rho'^2 + \rho^2\right],$$

where c is an arbitrary constant, but $\beta$ has to have a particular value. Determine
the form of $\rho(\theta)$ and hence find the equation of the shortest path on the
cone between the points $(R, -\theta_0, \alpha R)$ and $(R, \theta_0, \alpha R)$. (You will find it useful to
determine the form of the derivative of $\cos^{-1}(u^{-1})$.)

22.8 Derive the differential equations for the polar coordinates r, $\theta$ of a particle of
unit mass moving in a field of potential V(r). Find the form of V if the path of
the particle is given by $r = a\sin\theta$.
22.9 You are provided with a line of length $\pi a/2$ and negligible mass and some lead
shot of total mass M. Use a variational method to determine how the lead shot
must be distributed along the line if the loaded line is to hang in a circular arc of
radius a when its ends are attached to two points at the same height. (Measure
the distance s along the line from its centre.)

22.10 Extend the result of subsection 22.2.2 to the case of several dependent variables
$y_i(x)$, showing that, if x does not appear explicitly in the integrand, then a first
integral of the Euler-Lagrange equations is

$$F - \sum_i y_i'\frac{\partial F}{\partial y_i'} = \text{constant}.$$

22.11 A general result is that light travels through a variable medium by a path which
minimises the travel time (this is an alternative formulation of Fermat's principle).
With respect to a particular cylindrical polar coordinate system $(\rho, \phi, z)$ the speed
of light $v(\rho, \phi)$ is independent of z. If the path of the light is parameterised as
$\rho = \rho(z)$, $\phi = \phi(z)$, use the result of the previous exercise to show that

$$v^2\left(\rho'^2 + \rho^2\phi'^2 + 1\right)$$

is constant along the path.

For the particular case when $v = v(\rho) = b(a^2 + \rho^2)^{1/2}$, show that the two Euler-Lagrange
equations have a common solution in which the light travels along a
helical path given by $\phi = Az + B$, $\rho = C$, provided that A has a particular value.

22.12 Light travels in a vertical xz-plane through a slab of material which lies between
the planes $z = z_0$ and $z = 2z_0$ and in which the speed of light $v(z) = c_0z/z_0$. Using
the alternative formulation of Fermat's principle given in the previous question,
show that the ray paths are arcs of circles.

Deduce that, if a ray enters the material at $(0, z_0)$ at an angle to the vertical,
$\pi/2 - \theta$, of more than 30°, it does not reach the far side of the slab.

22.13 A dam of capacity V (less than $\pi b^2h/2$) is to be constructed on level ground next
to a long straight wall which runs from (−b, 0) to (b, 0). This is to be achieved by
joining the ends of a new wall, of height h, to those of the existing wall. Show
that, in order to minimise the length L of new wall to be built, it should form
part of a circle, and that L is then given by

$$L = \int_{-b}^{b} \frac{dx}{(1 - \lambda^2x^2)^{1/2}},$$

where $\lambda$ is found from

$$\frac{V}{hb^2} = \frac{\sin^{-1}\mu}{\mu^2} - \frac{(1 - \mu^2)^{1/2}}{\mu}$$

and $\mu = \lambda b$.

22.14 The Schwarzschild metric for the static field of a non-rotating spherically symmetric
black hole of mass M is

$$(ds)^2 = c^2\left(1 - \frac{2GM}{c^2r}\right)(dt)^2 - \frac{(dr)^2}{1 - 2GM/(c^2r)} - r^2(d\theta)^2 - r^2\sin^2\theta\,(d\phi)^2.$$

Considering only motion confined to the plane $\theta = \pi/2$, and assuming that the
path of a small test particle is such as to make $\int ds$ stationary, find two first
integrals of the equations of motion. From their Newtonian limits, in which
GM/r, $\dot{r}^2$ and $r^2\dot{\phi}^2$ are all $\ll c^2$, identify the constants of integration.

22.15 In the brachistochrone problem of subsection 22.3.4 show that if the upper end-point
can lie anywhere on the curve h(x, y) = 0 then the curve of quickest descent
y(x) meets h(x, y) = 0 at right angles.
22.16 Use result (22.27) to evaluate

$$J = \int_{-1}^{1} (1 - x^2)P_m'(x)P_n'(x)\,dx,$$

where $P_m(x)$ is a Legendre polynomial of order m.

22.17 Determine the minimum value that the integral

$$J = \int_0^1 \left[x^4(y'')^2 + 4x^2(y')^2\right] dx$$

can have, given that y is not singular at x = 0 and that y(1) = y'(1) = 1.
Assume that the Euler-Lagrange equation does give the lower limit, and verify
retrospectively that your solution makes the first term on the LHS of equation
(22.15) vanish.

22.18 Show that $y'' - xy + \lambda x^2y = 0$ has a solution for which y(0) = y(1) = 0 and
$\lambda \le 147/4$.

22.19 Find an appropriate but simple trial function and use it to estimate the lowest
eigenvalue $\lambda_0$ of Stokes' equation

$$\frac{d^2y}{dx^2} + \lambda xy = 0, \qquad y(0) = y(\pi) = 0.$$

Explain why your estimate must be strictly greater than $\lambda_0$.

22.20 Estimate the lowest eigenvalue $\lambda_0$ of the equation

$$\frac{d^2y}{dx^2} - x^2y + \lambda y = 0, \qquad y(-1) = y(1) = 0,$$

using a quadratic trial function.

22.21 A drumskin is stretched across a fixed circular rim of radius a. Small transverse
vibrations of the skin have an amplitude $z(\rho, \phi, t)$ that satisfies

$$\nabla^2z = \frac{1}{c^2}\frac{\partial^2z}{\partial t^2}$$

in plane polar coordinates. For a normal mode independent of azimuth, $z =
Z(\rho)\cos\omega t$, find the differential equation satisfied by $Z(\rho)$. By using a trial
function of the form $a^\nu - \rho^\nu$, obtain an estimate for the lowest normal mode
frequency. (The exact answer is $(5.78)^{1/2}c/a$.)

22.22 (a) Recast the problem of finding the lowest eigenvalue $\lambda_0$ of the equation

$$(1 + x^2)\frac{d^2y}{dx^2} + 2x\frac{dy}{dx} + \lambda y = 0, \qquad y(\pm 1) = 0,$$

in variational form, and derive an approximation $\lambda_1$ to $\lambda_0$ by using the trial
function $y_1(x) = 1 - x^2$.

(b) Show that an improved estimate $\lambda_2$ is obtained by using $y_2(x) = \cos(\pi x/2)$.

(c) Prove that the estimate $\lambda(\gamma)$ obtained by taking $y_1(x) + \gamma y_2(x)$ as the trial
function is

$$\lambda(\gamma) = \frac{64/15 + 16\gamma/\pi + (\pi^2/3 + 1/2)\gamma^2}{16/15 + 64\gamma/\pi^3 + \gamma^2}.$$

Investigate $\lambda(\gamma)$ numerically as $\gamma$ is varied, or, more simply, show that
$\lambda(1) = 3.183$, a significant improvement on both $\lambda_1$ and $\lambda_2$.

22.23 For the boundary conditions given below, obtain a functional $\Lambda(y)$ whose stationary
values give the eigenvalues of the equation

$$(1 + x)\frac{d^2y}{dx^2} + (2 + x)\frac{dy}{dx} + \lambda y = 0, \qquad y(0) = 0, \quad y'(2) = 0.$$

Derive an approximation to the lowest eigenvalue $\lambda_0$ using the trial function
$y(x) = xe^{-x/2}$. For what value(s) of $\gamma$ would

$$y(x) = xe^{-x/2} + \beta\sin\gamma x$$

be a suitable trial function for attempting to obtain an improved estimate of $\lambda_0$?

22.24 The upper and lower surfaces of a film of liquid with surface energy per unit
area (surface tension) equal to $\gamma$ and with density $\rho$ have equations z = p(x) and
z = q(x) respectively. The film has a given volume V (per unit depth in the y-direction)
and lies in the region −L < x < L, with p(0) = q(0) = p(L) = q(L) = 0.
The total energy (per unit depth) of the film consists of its surface energy and its
gravitational energy, and is expressed by

$$E = \frac{\rho g}{2}\int_{-L}^{L} (p^2 - q^2)\,dx + \gamma\int_{-L}^{L}\left[(1 + p'^2)^{1/2} + (1 + q'^2)^{1/2}\right] dx.$$

(a) Express V in terms of p and q.

(b) Show that, if the total energy is minimised, p and q must satisfy

P 


a 


(1+ P ' 2 ) 1 / 2 


<7 


a 


(1 + q' 2 ) 1 ' 2 


= constant. 


(c) As an approximate solution, consider the equations

$$p = a(L - |x|), \qquad q = b(L - |x|),$$

where a and b are sufficiently small that $a^3$ and $b^3$ can be neglected compared
to unity. Find the values of a and b that minimise E.

22.25 This is an alternative approach to the example in section 22.8. Using the notation
of that section, the expectation value of the energy of the state $\psi$ is given by
$\int \psi^*H\psi\,dv$. Denote the eigenfunctions of H by $\psi_i$, so that $H\psi_i = E_i\psi_i$, and, since
H is self-adjoint (Hermitian), $\int \psi_j^*\psi_i\,dv = \delta_{ij}$.

(a) By writing any function $\psi$ as $\sum_j c_j\psi_j$ and following an argument similar to
that in section 22.7, show that

$$E = \frac{\int \psi^*H\psi\,dv}{\int \psi^*\psi\,dv} \ge E_0,$$

the energy of the lowest state. (This is the Rayleigh-Ritz principle.)

(b) Using the same trial function as in section 22.8, $\psi = \exp(-\alpha x^2)$, show that
the same result is obtained.


22.26 This is an extension to section 22.8 and the previous question. With the ground-state
(i.e. the lowest-energy) wavefunction as $\exp(-\alpha x^2)$, take as a trial function
the orthogonal wave function $x^{2n+1}\exp(-\alpha x^2)$, using the integer n as a variable
parameter. Use either Sturm-Liouville theory or the Rayleigh-Ritz principle to
show that the energy of the second lowest state of a quantum harmonic oscillator
is $\le 3\hbar\omega/2$.

22.27 The Hamiltonian H for the hydrogen atom is

$$H = -\frac{\hbar^2}{2m}\nabla^2 - \frac{q^2}{4\pi\epsilon_0r}.$$

For a spherically symmetric state, as may be assumed for the ground state, the
only relevant part of $\nabla^2$ is that involving differentiation with respect to r.

(a) Define the integrals $J_n$ by

$$J_n = \int_0^\infty r^ne^{-2\beta r}\,dr$$

and show that, for a trial wavefunction of the form $\exp(-\beta r)$ with $\beta > 0$,
$\int \psi^*H\psi\,dv$ and $\int \psi^*\psi\,dv$ (see exercise 22.25(a)) can be expressed as $aJ_1 - bJ_2$
and $cJ_2$ respectively, where a, b, c are factors which you should determine.

(b) Show that the estimate of E is minimised when $\beta = mq^2/(4\pi\epsilon_0\hbar^2)$.

(c) Hence find an upper limit for the ground-state energy of the hydrogen atom.
In fact, $\exp(-\beta r)$ is the correct form for the wavefunction and the limit gives
the actual value.

22.28 A particle of mass m moves in a one-dimensional potential well of the form

$$V(x) = -\mu\frac{\hbar^2\alpha^2}{m}\,\mathrm{sech}^2\alpha x,$$

where $\mu$ and $\alpha$ are positive constants. As in exercise 22.27, the expectation value
$\langle E\rangle$ of the energy of the system is $\int \psi^*H\psi\,dx$, where the self-adjoint operator
H is given by $-(\hbar^2/2m)\,d^2/dx^2 + V(x)$. Using trial wavefunctions of the form
$y = A\,\mathrm{sech}\,\beta x$, show the following:

(a) for $\mu = 1$ there is an exact eigenfunction of H, with a corresponding $\langle E\rangle$ of
half of the maximum depth of the well;

(b) for $\mu = 6$ the 'binding energy' of the ground state is at least $10\hbar^2\alpha^2/(3m)$.

(You will find it useful to note that for $u, v \ge 0$, $\mathrm{sech}\,u\,\mathrm{sech}\,v \ge \mathrm{sech}(u + v)$.)

22.29 The Sturm-Liouville equation can be extended to two independent variables, x
and z, with little modification. In equation (22.22) $y'^2$ is replaced by $(\nabla y)^2$ and
the integrals of the various functions of y(x, z) become two-dimensional, i.e. the
infinitesimal is dx dz.

The vibrations of a trampoline 4 units long and 1 unit wide satisfy the equation

$$\nabla^2y + k^2y = 0.$$

By taking the simplest possible permissible polynomial as a trial function, show
that the lowest mode of vibration has $k^2 \le 10.63$ and, by direct solution, that the
actual value is 10.49.


22.10 Hints and answers

22.2 The minimising curve is $x^2 + y^2 = 2$.

22.3 $I = \int n(r)[r^2 + (dr/d\phi)^2]^{1/2}\,d\phi$. Take axes such that $\phi = 0$ when $r = \infty$.
If $\beta = (\pi - \text{deviation angle})/2$ then $\beta = \phi$ at r = a, and the equation reduces to

$$\frac{\beta}{(a^2 + \alpha^2)^{1/2}} = \int_a^\infty \frac{dr}{r(r^2 - a^2)^{1/2}},$$

which can be evaluated by putting $r = a(y + y^{-1})/2$, or successively $r = a\cosh\psi$,
$y = \exp\psi$ to yield a deviation $\pi[(a^2 + \alpha^2)^{1/2} - a]/a$.

22.4 $\nabla^2\phi - \partial^2\phi/\partial t^2 = \mu^2\phi$.

22.5 (a) $\partial x/\partial t = 0$ and so $\dot{x} = \sum_i \dot{q}_i\,\partial x/\partial q_i$; (b) use

$$\sum_i \dot{q}_i\frac{d}{dt}\left(\frac{\partial T}{\partial\dot{q}_i}\right) = \frac{d}{dt}(2T) - \sum_i \ddot{q}_i\frac{\partial T}{\partial\dot{q}_i}.$$

22.6 $\phi(x, t) = m(vx + v^2t/2)$.

22.7 Use result (22.8); $\beta^2 = 1 + \alpha^2$. Put $\rho = uc$ to obtain $d\theta/du = \beta/[u(u^2 - 1)^{1/2}]$.
Remember that $\cos^{-1}$ is a multivalued function; $\rho(\theta) = [R\cos(\theta_0/\beta)]/[\cos(\theta/\beta)]$.

22.8 $r^2\dot{\theta} = k$, $\ddot{r} - r\dot{\theta}^2 + dV/dr = 0$, $V(r) = -k^2a^2/(2r^4) + \text{constant}$.

22.9 $-\lambda y'(1 - y'^2)^{-1/2} = 2gP(s)$, $y = y(s)$, $P(s) = \int_0^s \rho(s')\,ds'$. The solution $y =
-a\cos(s/a)$ and $2P(\pi a/4) = M$ give $\lambda = -gM$. The required $\rho(s)$ is
$[M/(2a)]\sec^2(s/a)$.

22.10 Note that dF/dx contains partial contributions from all $y_i(x)$ and all $y_i'(x)$ but
no $\partial F/\partial x$ term.

22.11 A = 1/a.

22.12 Circle is $(x - z_0\tan\theta)^2 + z^2 = z_0^2\sec^2\theta$. Consider the value of z when dz/dx = 0.

22.13 Circle is $\lambda^2x^2 + [\lambda y + (1 - \lambda^2b^2)^{1/2}]^2 = 1$. Use the fact that $\int y\,dx = V/h$ to
determine the condition on $\lambda$.

22.14 Denoting $(ds)^2/(dt)^2$ by $f^2$, the Euler-Lagrange equation for $\phi$ gives $r^2\dot{\phi} = Af$,
where A corresponds to the angular momentum of the particle. Use the result
of exercise 22.10 to obtain $c^2 - (2GM/r) = Bf$, where to first order in small
quantities

$$cB = c^2 - \frac{GM}{r} + \frac{1}{2}\left(\dot{r}^2 + r^2\dot{\phi}^2\right),$$

which reads 'total energy = rest mass + gravitational energy + radial and
azimuthal kinetic energy'.

22.16 Note that Legendre’s equation is a Sturm-Liouville equation with p(+ 1) = 0 
and p(x) = 1. For normalised eigenfunctions take y„,(x) = [(2m + l)/2] 1/2 P,„(.x); 
J = { [2 m(m + l)]/(2m + 1)}5,„„. 

22.17 Convert the equation to the usual form, by writing y'(x) = u(x), and obtain 

x 2 u" + 4 xu! — 4it = 0 with general solution Ax~ 4 + Bx. Integrating a second time 

and using the boundary conditions gives y(x) = (1 + x 2 )/2 and J = 1; »;(1) = 0, 
since y'(l) is fixed, and dF/du' = 2x 4 i / = 0 at x = 0. 

22.18 The equation is of SL form with p = 1, q = — x and weight function x 2 . Try 
y = x(l — x). The integrals have values 7/20 and 1/105. 

22.19 Using y = sinx as a trial function shows that X 0 < 2/ti. The estimate must be 
> Xo since the trial function does not satisfy the original equation. 

22.20 Using y = 1 — x 2 as a trial function shows that X 0 < 37/14. 

22.21 Z" + p _1 Z' + (m/c) 2 Z = 0, with Z(a) = 0 and Z'(0) = 0, an SL equation with 

p = p, q = 0 and weight function p/c 2 . Estimate of co 2 = [c 2 v/(2a 2 )] [0.5 — 2(v + 
2R 1 + (2v + 2)~ 1 ]~ 1 , which minimises to c 2 (2 + ^/2) 2 /(2a 2 ) = 5.83 c 2 /a 2 when 
V = y/2. 

22.22 (a) Follow the method of section 22.7 with p = 1 + x 2 , q = 0 and p = 1 ; Xi = 4. 

(b) X 2 = n 2 /3 + 1/2 « 3.79. 

(c) X(y) has a minimum value 3.1768 at y = 1.1976. 

22.23 Note that the original equation is not self-adjoint; it needs an integrating factor 
of e x . A(y) = [f 2 ( 1 + x)e x y' 2 dx]/[f 2 e x y 2 dx; Xo < 3/8. Since y'( 2) must equal 0, 
y = (n/2)(n + /) for some integer n. 

22.24 (a) V = f f ; (p — q)dx. (c) Use V = (a — b)L 2 to eliminate b from the expression 
for E ; now the minimisation is with respect to a alone. The values for a and b 
are ±L/(2L 2 ) - Vpg/(6y). 

22.25 The estimate is h 2 a/(2m) + k/(&(x) and the minimum occurs at the value of n that 
makes the two terms equal. 

22.26 $E_1 \le (\hbar\omega/2)(8n^2 + 12n + 3)/(4n + 1)$, which has a minimum value $3\hbar\omega/2$ when
integer n = 0.

22.27 (a) a = 4nh 2 fi/m — q 2 /e o, b = 2nh 2 fS 2 /m, c = 4n; (c) —mq 4 /[2(4neoh) 2 ]. 

22.28 (a) Fly = Xy requires that /? = a. 

( b ) f \p* [— (h 2 /2m)d 2 /dx 2 ]xp dx = h 2 p 2 /(6m) and f \p’Vxp dx < —6h 2 oL 2 /l/[m(ci.+ 
(l )] . The sum of the two integrals is minimised when /? = 2a, leading to the 
stated upper limit for (E). 

22.29 The SL equation has p = 1, q = 0, and $\rho = 1$.

Use u(x, y) = x(4 - x)y(1 - y) as a trial function.

Numerator = 1088/90, denominator = 512/450. Direct solution $k^2 = 17\pi^2/16$.

23


Integral equations 


It is not unusual in the analysis of a physical system to encounter an equation 
in which an unknown but required function y(x), say, appears under an integral
sign. Such an equation is called an integral equation, and in this chapter we discuss 
several methods for solving the more straightforward examples of such equations. 

Before embarking on our discussion of methods for solving various integral 
equations, we begin with a warning that many of the integral equations met in 
practice cannot be solved by the elementary methods presented here but must 
instead be solved numerically, usually on a computer. Nevertheless, the regular 
occurrence of several simple types of integral equation that may be solved 
analytically is sufficient reason to explore these equations more fully. 

We shall begin this chapter by discussing how a differential equation can be 
transformed into an integral equation and by considering the most common 
types of linear integral equation. After introducing the operator notation and 
considering the existence of solutions for various types of equation, we go on 
to discuss elementary methods of obtaining closed-form solutions of simple 
integral equations. We then consider the solution of integral equations in terms of 
infinite series and conclude by discussing the properties of integral equations with 
Hermitian kernels, i.e. those in which the integrands have particular symmetry 
properties. 


23.1 Obtaining an integral equation from a differential equation 

Integral equations occur in many situations, partly because we may always rewrite 
a differential equation as an integral equation. It is sometimes advantageous to 
make this transformation, since questions concerning the existence of a solu- 
tion are more easily answered for integral equations (see section 23.3), and, 
furthermore, an integral equation can incorporate automatically any boundary 
conditions on the solution.


We shall illustrate the principles involved by considering the differential equation

$$ y''(x) = f(x, y), \tag{23.1} $$

where f(x, y) can be any function of x and y but not of y'(x). Equation (23.1)
thus represents a large class of linear and non-linear second-order differential
equations.

We can convert (23.1) into the corresponding integral equation by first integrating
with respect to x to obtain

$$ y'(x) = \int_0^x f(z, y(z))\,dz + c_1. $$

Integrating once more, we find

$$ y(x) = \int_0^x du \int_0^u f(z, y(z))\,dz + c_1 x + c_2. $$

Provided we do not change the region in the uz-plane over which the double
integral is taken, we can reverse the order of the two integrations. Changing the
integration limits appropriately, we find

$$ y(x) = \int_0^x f(z, y(z))\,dz \int_z^x du + c_1 x + c_2 \tag{23.2} $$

$$ \phantom{y(x)} = \int_0^x (x - z)\,f(z, y(z))\,dz + c_1 x + c_2; \tag{23.3} $$

this is a non-linear (for general f(x, y)) Volterra integral equation.

It is straightforward to incorporate any boundary conditions on the solution
y(x) by fixing the constants $c_1$ and $c_2$ in (23.3). For example, we might have the
one-point boundary condition y(0) = a and y'(0) = b, for which it is clear that
we must set $c_1 = b$ and $c_2 = a$.
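The equivalence between (23.1) and (23.3) can be checked numerically. The sketch below is illustrative only: the choice f(x, y) = -y with y(0) = 0, y'(0) = 1 (exact solution sin x), the function name `solve_volterra`, the grid size and the number of sweeps are all assumptions made for this example. It repeatedly substitutes the current estimate of y into the right-hand side of (23.3), using the trapezoidal rule for the integral.

```python
import math

def solve_volterra(f, a, b, x_max, n=200, sweeps=40):
    """Iterate y(x) = a + b*x + int_0^x (x - z) f(z, y(z)) dz on a uniform grid."""
    h = x_max / n
    xs = [i * h for i in range(n + 1)]
    y = [a + b * x for x in xs]            # starting guess: drop the integral
    for _ in range(sweeps):
        new_y = []
        for i, x in enumerate(xs):
            # integrand (x - z) f(z, y(z)) sampled at the grid points z <= x
            vals = [(x - xs[j]) * f(xs[j], y[j]) for j in range(i + 1)]
            # trapezoidal rule on [0, x]
            integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1])) if i > 0 else 0.0
            new_y.append(a + b * x + integral)
        y = new_y
    return xs, y

# y'' = -y with y(0) = 0, y'(0) = 1, so c_2 = a = 0 and c_1 = b = 1
xs, ys = solve_volterra(lambda z, y: -y, a=0.0, b=1.0, x_max=2.0)
max_err = max(abs(y - math.sin(x)) for x, y in zip(xs, ys))
print(max_err)
```

The boundary conditions enter only through the constants a and b, which is the sense in which the integral equation incorporates them automatically.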



23.2 Types of integral equation 

From (23.3), we can see that even a relatively simple differential equation such 
as (23.1) can lead to a corresponding integral equation that is non-linear. In this 
chapter, however, we will restrict our attention to linear integral equations, which 
have the general form 

$$ g(x)\,y(x) = f(x) + \lambda \int_a^b K(x, z)\,y(z)\,dz. \tag{23.4} $$

In (23.4), y(x) is the unknown function, while the functions f(x), g(x) and K(x, z)
are assumed known. K(x, z) is called the kernel of the integral equation. The
integration limits a and b are also assumed known, and may be constants or
functions of x, and $\lambda$ is a known constant or parameter.


In fact, we shall be concerned with various special cases of (23.4), which are 
known by particular names. Firstly, if g(x) = 0 then the unknown function y(x) 
appears only under the integral sign, and (23.4) is called a linear integral equation 
of the first kind. Alternatively, if g(x) = 1, so that y(x) appears twice, once inside 
the integral and once outside, then (23.4) is called a linear integral equation of 
the second kind. In either case, if f(x) = 0 the equation is called homogeneous,
otherwise inhomogeneous.

We can distinguish further between different types of integral equation by the
form of the integration limits a and b. If these limits are fixed constants then the
equation is called a Fredholm equation. If, however, the upper limit b = x (i.e. it
is variable) then the equation is called a Volterra equation; such an equation is
analogous to one with fixed limits but for which the kernel K(x, z) = 0 for z > x.
Finally, we note that any equation for which either (or both) of the integration
limits is infinite, or for which K(x, z) becomes infinite in the range of integration,
is called a singular integral equation. 
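As an illustration of this classification, consider (23.3) with the particular choice f(x, y) = -y and the boundary conditions y(0) = a, y'(0) = b. This choice is an assumption made here for the sake of example and is not taken from the text. The equation becomes

```latex
y(x) = a + bx - \int_0^x (x - z)\,y(z)\,dz .
```

Comparing with (23.4), and noting that the f of (23.4) is not the f of (23.1), we read off g(x) = 1, inhomogeneous term a + bx, $\lambda = -1$ and kernel K(x, z) = x - z, with lower limit 0 and upper limit x. It is therefore an inhomogeneous Volterra equation of the second kind.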


23.3 Operator notation and the existence of solutions 

There is a close correspondence between linear integral equations and the matrix
equations discussed in chapter 8. However, the former involve linear, integral
relations between functions in an infinite-dimensional function space (see chapter 17),
whereas the latter specify linear relations among vectors in a finite-dimensional
vector space.

Since we are restricting our attention to linear integral equations, it will be
convenient to introduce the linear integral operator $\mathcal{K}$, whose action on an
arbitrary function y is given by

$$ \mathcal{K}y = \int_a^b K(x, z)\,y(z)\,dz. \tag{23.5} $$

This is analogous to the introduction in chapters 16 and 17 of the notation $\mathcal{L}$ to
describe a linear differential operator. Furthermore, we may define the Hermitian
conjugate $\mathcal{K}^\dagger$ by

$$ \mathcal{K}^\dagger y = \int_a^b K^*(z, x)\,y(z)\,dz, $$

where the asterisk denotes complex conjugation and we have reversed the order
of the arguments in the kernel.

It is clear from (23.5) that $\mathcal{K}$ is indeed linear. Moreover, since $\mathcal{K}$ operates on
the infinite-dimensional space of (reasonable) functions, we may make an obvious
analogy with matrix equations and consider the action of $\mathcal{K}$ on a function f as
that of a matrix on a column vector (both of infinite dimension).

When written in operator form, the integral equations discussed in the previous
section resemble equations familiar from linear algebra. For example, the
inhomogeneous Fredholm equation of the first kind may be written as

$$ 0 = f + \lambda\mathcal{K}y, $$

which has the unique solution $y = -\mathcal{K}^{-1}f/\lambda$, provided that $f \neq 0$ and the inverse
operator $\mathcal{K}^{-1}$ exists.

Similarly, we may write the corresponding Fredholm equation of the second
kind as

$$ y = f + \lambda\mathcal{K}y. \tag{23.6} $$

In the homogeneous case, where f = 0, this reduces to $y = \lambda\mathcal{K}y$, which is
reminiscent of an eigenvalue problem in linear algebra (except that $\lambda$ appears on
the other side of the equation) and, similarly, only has solutions for at most a
countably infinite set of eigenvalues $\lambda_i$. The corresponding solutions $y_i$ are called
the eigenfunctions.

In the inhomogeneous case ($f \neq 0$), the solution to (23.6) can be written
symbolically as

$$ y = (1 - \lambda\mathcal{K})^{-1} f, $$

again provided that the inverse operator exists. It may be shown that, in general,
(23.6) does possess a unique solution if $\lambda \neq \lambda_i$, i.e. when $\lambda$ does not equal one of
the eigenvalues of the corresponding homogeneous equation.

When $\lambda$ does equal one of these eigenvalues, (23.6) may have either many
solutions or no solution at all, depending on the form of f. If the function f is
orthogonal to every eigenfunction of the equation

$$ g = \lambda^* \mathcal{K}^\dagger g \tag{23.7} $$

that belongs to the eigenvalue $\lambda^*$, i.e.

$$ \langle g | f \rangle = \int_a^b g^*(x)\,f(x)\,dx = 0 $$

for every function g obeying (23.7), then it can be shown that (23.6) has many
solutions. Otherwise the equation has no solution. These statements are discussed
further in section 23.7, for the special case of integral equations with Hermitian
kernels, i.e. those for which $\mathcal{K} = \mathcal{K}^\dagger$.
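The analogy between $\mathcal{K}$ and a matrix can be made concrete by sampling the kernel on a grid, which turns (23.6) into the finite linear system (I - $\lambda$KW)y = f, with W a diagonal matrix of quadrature weights. The sketch below is a minimal illustration under stated assumptions: the function name `nystrom_solve`, the trapezoidal weights, the grid size and the test problem (kernel xz on [0, 1] with f(x) = x and $\lambda$ = 1, for which y = cx with c = 1 + c/3, i.e. y = 3x/2) are all choices made here, not part of the text.

```python
def nystrom_solve(kernel, f, lam, a, b, n=101):
    """Solve y = f + lam * K y approximately on n trapezoidal nodes in [a, b]."""
    h = (b - a) / (n - 1)
    xs = [a + i * h for i in range(n)]
    w = [h] * n
    w[0] = w[-1] = h / 2                      # trapezoidal weights
    # Augmented matrix [I - lam * K * W | f]: the operator K becomes a matrix.
    A = [[(1.0 if i == j else 0.0) - lam * kernel(xs[i], xs[j]) * w[j]
          for j in range(n)] + [f(xs[i])] for i in range(n)]
    for col in range(n):                      # Gaussian elimination, partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            m = A[r][col] / A[col][col]
            if m:
                for c in range(col, n + 1):
                    A[r][c] -= m * A[col][c]
    y = [0.0] * n
    for i in range(n - 1, -1, -1):            # back substitution
        s = A[i][n] - sum(A[i][j] * y[j] for j in range(i + 1, n))
        y[i] = s / A[i][i]
    return xs, y

# Assumed test problem: K(x, z) = x z on [0, 1], f(x) = x, lam = 1; exact y = 1.5 x.
xs, ys = nystrom_solve(lambda x, z: x * z, lambda x: x, 1.0, 0.0, 1.0)
max_err = max(abs(y - 1.5 * x) for x, y in zip(xs, ys))
print(max_err)
```

If $\lambda$ is chosen close to an eigenvalue of the homogeneous equation, the matrix I - $\lambda$KW becomes nearly singular, mirroring the existence statements above.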


23.4 Closed-form solutions 

In certain very special cases, it may be possible to obtain a closed-form solution 
of an integral equation. The reader should realise, however, when faced with an 
integral equation, that in general it will not be soluble by the simple methods 
presented in this section but must instead be solved using (numerical) iterative 
methods, such as those outlined in section 23.5.

23.4.1 Separable kernels

The most straightforward integral equations to solve are Fredholm equations
with separable (or degenerate) kernels. A kernel is separable if it has the form

$$ K(x, z) = \sum_{i=1}^{n} \phi_i(x)\,\psi_i(z), \tag{23.8} $$

where $\phi_i(x)$ and $\psi_i(z)$ are respectively functions of x only and of z only and the
number of terms in the sum, n, is finite.

Let us consider the solution of the (inhomogeneous) Fredholm equation of the
second kind,

$$ y(x) = f(x) + \lambda \int_a^b K(x, z)\,y(z)\,dz, \tag{23.9} $$

which has a separable kernel of the form (23.8). Writing the kernel in its separated
form, the functions $\phi_i(x)$ may be taken outside the integral over z to obtain

$$ y(x) = f(x) + \lambda \sum_{i=1}^{n} \phi_i(x) \int_a^b \psi_i(z)\,y(z)\,dz. $$

Since the integration limits a and b are constant for a Fredholm equation, the
integral over z in each term of the sum is just a constant. Denoting these constants
by

$$ c_i = \int_a^b \psi_i(z)\,y(z)\,dz, \tag{23.10} $$

the solution to (23.9) is found to be

$$ y(x) = f(x) + \lambda \sum_{i=1}^{n} c_i\,\phi_i(x), \tag{23.11} $$

where the constants $c_i$ can be evaluated by substituting (23.11) into (23.10).


► Solve the integral equation

$$ y(x) = x + \lambda \int_0^1 (xz + z^2)\,y(z)\,dz. \tag{23.12} $$

The kernel for this equation is $K(x, z) = xz + z^2$, which is clearly separable, and using the
notation in (23.8) we have $\phi_1(x) = x$, $\phi_2(x) = 1$, $\psi_1(z) = z$ and $\psi_2(z) = z^2$. From (23.11)
the solution to (23.12) has the form

$$ y(x) = x + \lambda(c_1 x + c_2), $$

where the constants $c_1$ and $c_2$ are given by (23.10) as

$$ c_1 = \int_0^1 z\,[z + \lambda(c_1 z + c_2)]\,dz = \tfrac{1}{3} + \tfrac{1}{3}\lambda c_1 + \tfrac{1}{2}\lambda c_2, $$

$$ c_2 = \int_0^1 z^2\,[z + \lambda(c_1 z + c_2)]\,dz = \tfrac{1}{4} + \tfrac{1}{4}\lambda c_1 + \tfrac{1}{3}\lambda c_2. $$

These two simultaneous linear equations may be straightforwardly solved for $c_1$ and $c_2$ to
give

$$ c_1 = \frac{24 + \lambda}{72 - 48\lambda - \lambda^2} \quad\text{and}\quad c_2 = \frac{18}{72 - 48\lambda - \lambda^2}, $$

so that the solution to (23.12) is

$$ y(x) = \frac{(72 - 24\lambda)x + 18\lambda}{72 - 48\lambda - \lambda^2}. $$ ◄

In the above example, we see that (23.12) has a (finite) unique solution provided
that $\lambda$ is not equal to either root of the quadratic in the denominator of y(x).
The roots of this quadratic are in fact the eigenvalues of the corresponding
homogeneous equation, as mentioned in the previous section. In general, if the
separable kernel contains n terms, as in (23.8), there will be n such eigenvalues,
although they may not all be different.
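The algebra in the worked example can be reproduced with exact rational arithmetic. The following sketch solves the two simultaneous equations for $c_1$ and $c_2$ by Cramer's rule and compares the result with the closed forms quoted above; the function name `solve_23_12` and the sample value $\lambda$ = 1 are illustrative choices, not part of the text.

```python
from fractions import Fraction as F

def solve_23_12(lam):
    """Return (c1, c2) such that y(x) = x + lam * (c1 * x + c2) solves (23.12)."""
    lam = F(lam)
    # Rearranged equations for the constants:
    #   (1 - lam/3) c1 - (lam/2) c2     = 1/3
    #  -(lam/4) c1     + (1 - lam/3) c2 = 1/4
    a11, a12, b1 = 1 - lam / 3, -lam / 2, F(1, 3)
    a21, a22, b2 = -lam / 4, 1 - lam / 3, F(1, 4)
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("lam is an eigenvalue of the homogeneous equation")
    c1 = (b1 * a22 - a12 * b2) / det      # Cramer's rule
    c2 = (a11 * b2 - a21 * b1) / det
    return c1, c2

lam = F(1)
c1, c2 = solve_23_12(lam)
denom = 72 - 48 * lam - lam**2
closed_c1, closed_c2 = (24 + lam) / denom, F(18) / denom
print(c1, c2)
```

The determinant of the system is $(72 - 48\lambda - \lambda^2)/72$, so the `ValueError` branch is reached exactly at the two roots of the quadratic in the denominator of y(x).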

Kernels consisting of trigonometric (or hyperbolic) functions of sums or differences
of x and z are also often separable.


► Find the eigenvalues and corresponding eigenfunctions of the homogeneous Fredholm
equation

$$ y(x) = \lambda \int_0^\pi \sin(x + z)\,y(z)\,dz. \tag{23.13} $$

The kernel of this integral equation can be written in separated form as

$$ K(x, z) = \sin(x + z) = \sin x \cos z + \cos x \sin z, $$

so, comparing with (23.8), we have $\phi_1(x) = \sin x$, $\phi_2(x) = \cos x$, $\psi_1(z) = \cos z$ and
$\psi_2(z) = \sin z$.

Thus, from (23.11), the solution to (23.13) has the form

$$ y(x) = \lambda(c_1 \sin x + c_2 \cos x), $$

where the constants $c_1$ and $c_2$ are given by

$$ c_1 = \lambda \int_0^\pi \cos z\,(c_1 \sin z + c_2 \cos z)\,dz = \frac{\lambda\pi}{2}\,c_2, \tag{23.14} $$

$$ c_2 = \lambda \int_0^\pi \sin z\,(c_1 \sin z + c_2 \cos z)\,dz = \frac{\lambda\pi}{2}\,c_1. \tag{23.15} $$

Combining these two equations we find $c_1 = (\lambda\pi/2)^2 c_1$, and, assuming that $c_1 \neq 0$, this
gives $\lambda = \pm 2/\pi$, the two eigenvalues of the integral equation (23.13).

By substituting each of the eigenvalues back into (23.14) and (23.15), we find that
the eigenfunctions corresponding to the eigenvalues $\lambda_1 = 2/\pi$ and $\lambda_2 = -2/\pi$ are given
respectively by

$$ y_1(x) = A(\sin x + \cos x) \quad\text{and}\quad y_2(x) = B(\sin x - \cos x), \tag{23.16} $$

where A and B are arbitrary constants. ◄
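The eigenpairs in (23.16) can be confirmed by evaluating the right-hand side of (23.13) numerically. This is a minimal sketch: the helper name `apply_kernel`, the trapezoidal rule with 2000 intervals, the sample points and the normalisation A = B = 1 are assumptions made for the check.

```python
import math

def apply_kernel(y, x, n=2000):
    """Trapezoidal estimate of int_0^pi sin(x + z) y(z) dz."""
    h = math.pi / n
    total = 0.5 * (math.sin(x) * y(0.0) + math.sin(x + math.pi) * y(math.pi))
    total += sum(math.sin(x + k * h) * y(k * h) for k in range(1, n))
    return h * total

y1 = lambda x: math.sin(x) + math.cos(x)   # eigenfunction for lam = +2/pi (A = 1)
y2 = lambda x: math.sin(x) - math.cos(x)   # eigenfunction for lam = -2/pi (B = 1)
lam1, lam2 = 2 / math.pi, -2 / math.pi

points = [0.0, 0.4, 1.3, 2.9]
res1 = max(abs(y1(x) - lam1 * apply_kernel(y1, x)) for x in points)
res2 = max(abs(y2(x) - lam2 * apply_kernel(y2, x)) for x in points)
print(res1, res2)
```

Both residuals should be at the level of rounding error, since each integrand is smooth and periodic over the integration range.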

23.4.2 Integral transform methods

If the kernel of an integral equation can be written as a function of the difference
x - z of its two arguments, then it is called a displacement kernel. An integral
equation having such a kernel, and which also has the integration limits $-\infty$ to
$\infty$, may be solved by the use of Fourier transforms (chapter 13).

If we consider the following integral equation with a displacement kernel,

$$ y(x) = f(x) + \lambda \int_{-\infty}^{\infty} K(x - z)\,y(z)\,dz, \tag{23.17} $$

the integral over z clearly takes the form of a convolution (see chapter 13).
Therefore, Fourier-transforming (23.17) and using the convolution theorem, we
obtain

$$ \tilde{y}(k) = \tilde{f}(k) + \sqrt{2\pi}\,\lambda\,\tilde{K}(k)\,\tilde{y}(k), $$

which may be rearranged to give

$$ \tilde{y}(k) = \frac{\tilde{f}(k)}{1 - \sqrt{2\pi}\,\lambda\,\tilde{K}(k)}. \tag{23.18} $$

Taking the inverse Fourier transform, the solution to (23.17) is given by

$$ y(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{\tilde{f}(k)\exp(ikx)}{1 - \sqrt{2\pi}\,\lambda\,\tilde{K}(k)}\,dk. $$

If we can perform this inverse Fourier transformation then the solution can be
found explicitly; otherwise it must be left in the form of an integral.


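► Find the Fourier transform of the function

$$ g(x) = \begin{cases} 1 & \text{if } |x| \le a, \\ 0 & \text{if } |x| > a. \end{cases} $$

Hence find an explicit expression for the solution of the integral equation

$$ y(x) = f(x) + \lambda \int_{-\infty}^{\infty} \frac{\sin(x - z)}{x - z}\,y(z)\,dz. \tag{23.19} $$

Find the solution for the special case f(x) = (sin x)/x.

The Fourier transform of g(x) is given directly by

$$ \tilde{g}(k) = \frac{1}{\sqrt{2\pi}} \int_{-a}^{a} \exp(-ikx)\,dx = \frac{1}{\sqrt{2\pi}} \left[ \frac{\exp(-ikx)}{-ik} \right]_{-a}^{a} = \sqrt{\frac{2}{\pi}}\,\frac{\sin ka}{k}. \tag{23.20} $$

The kernel of the integral equation (23.19) is K(x - z) = [sin(x - z)]/(x - z). Using
(23.20), it is straightforward to show that the Fourier transform of the kernel is

$$ \tilde{K}(k) = \begin{cases} \sqrt{\pi/2} & \text{if } |k| \le 1, \\ 0 & \text{if } |k| > 1. \end{cases} \tag{23.21} $$

Thus, using (23.18), we find the Fourier transform of the solution to be

$$ \tilde{y}(k) = \begin{cases} \tilde{f}(k)/(1 - \pi\lambda) & \text{if } |k| \le 1, \\ \tilde{f}(k) & \text{if } |k| > 1. \end{cases} \tag{23.22} $$

Inverse Fourier-transforming, and writing the result in a slightly more convenient form,
the solution to (23.19) is given by

$$ y(x) = f(x) + \left( \frac{1}{1 - \pi\lambda} - 1 \right) \frac{1}{\sqrt{2\pi}} \int_{-1}^{1} \tilde{f}(k)\exp(ikx)\,dk $$

$$ \phantom{y(x)} = f(x) + \frac{\pi\lambda}{1 - \pi\lambda}\,\frac{1}{\sqrt{2\pi}} \int_{-1}^{1} \tilde{f}(k)\exp(ikx)\,dk. \tag{23.23} $$

It is clear from (23.22) that when $\lambda = 1/\pi$, which is the only eigenvalue of the
corresponding homogeneous equation to (23.19), the solution becomes infinite, as we
would expect.

For the special case f(x) = (sin x)/x, the Fourier transform $\tilde{f}(k)$ is identical to that in
(23.21), and the solution (23.23) becomes

$$ y(x) = \frac{\sin x}{x} + \frac{\pi\lambda}{1 - \pi\lambda}\,\frac{1}{\sqrt{2\pi}} \int_{-1}^{1} \sqrt{\frac{\pi}{2}}\,\exp(ikx)\,dk $$

$$ \phantom{y(x)} = \frac{\sin x}{x} + \frac{\pi\lambda}{1 - \pi\lambda}\,\frac{1}{2} \left[ \frac{\exp(ikx)}{ix} \right]_{k=-1}^{k=1} $$

$$ \phantom{y(x)} = \frac{\sin x}{x} + \frac{\pi\lambda}{1 - \pi\lambda}\,\frac{\sin x}{x} = \frac{1}{1 - \pi\lambda}\,\frac{\sin x}{x}. $$ ◄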
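The step from (23.20) to (23.21) in the example above is left to the reader. One way to fill it in, sketched here under the symmetric transform convention used in (23.20), is to set a = 1 and apply the Fourier inversion theorem to the top-hat function g:

```latex
\begin{aligned}
\tilde g(k) &= \sqrt{\frac{2}{\pi}}\,\frac{\sin k}{k} \qquad (a = 1), \\
g(x) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \tilde g(k)\,e^{ikx}\,dk
      = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin k}{k}\,e^{ikx}\,dk, \\
\tilde K(k) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{\sin x}{x}\,e^{-ikx}\,dx
      = \frac{1}{\sqrt{2\pi}}\,\pi\,g(-k) = \sqrt{\frac{\pi}{2}}\;g(k).
\end{aligned}
```

The last line uses the second with the roles of x and k exchanged, together with the fact that g is even. Since g(k) = 1 for $|k| \le 1$ and vanishes otherwise, this is (23.21).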


If, instead, the integral equation (23.17) had integration limits 0 and x (so
making it a Volterra equation) then its solution could be found, in a similar way,
by using the convolution theorem for Laplace transforms (see chapter 13). We
would find

$$ \bar{y}(s) = \frac{\bar{f}(s)}{1 - \lambda\bar{K}(s)}, $$

where s is the Laplace transform variable. Often one may use the dictionary of
Laplace transforms given in table 13.1 to invert this equation and find the solution
y(x). In general, however, the evaluation of inverse Laplace transform integrals
is difficult, since (in principle) it requires a contour integration; see chapter 20.
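As a small check of this formula, consider the Volterra equation $y(x) = x - \int_0^x (x - z)\,y(z)\,dz$. This equation is an illustrative choice made here, not an example from the text; it has f(x) = x, displacement kernel K(x) = x and $\lambda = -1$. Using the standard Laplace transforms of x and sin x:

```latex
\begin{aligned}
\bar f(s) &= \bar K(s) = \frac{1}{s^2}, \\
\bar y(s) &= \frac{\bar f(s)}{1 - \lambda \bar K(s)}
           = \frac{1/s^2}{1 + 1/s^2} = \frac{1}{s^2 + 1}, \\
y(x) &= \sin x .
\end{aligned}
```

Substituting back confirms the result: $\int_0^x (x - z)\sin z\,dz = x - \sin x$, so the right-hand side is x - (x - sin x) = sin x.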

As a final example of the use of Fourier transforms in solving integral equations,
we mention equations that have integration limits $-\infty$ and $\infty$ and a kernel of the
form

$$ K(x, z) = \exp(-ixz). $$

Consider, for example, the inhomogeneous Fredholm equation

$$ y(x) = f(x) + \lambda \int_{-\infty}^{\infty} \exp(-ixz)\,y(z)\,dz. \tag{23.24} $$

The integral over z is clearly just (a multiple of) the Fourier transform of y(z),
so we can write

$$ y(x) = f(x) + \sqrt{2\pi}\,\lambda\,\tilde{y}(x). \tag{23.25} $$

If we now take the Fourier transform of (23.25) but continue to denote the
independent variable by x (i.e. rather than k, for example), we obtain

$$ \tilde{y}(x) = \tilde{f}(x) + \sqrt{2\pi}\,\lambda\,y(-x). \tag{23.26} $$

Substituting (23.26) into (23.25) we find

$$ y(x) = f(x) + \sqrt{2\pi}\,\lambda \left[ \tilde{f}(x) + \sqrt{2\pi}\,\lambda\,y(-x) \right], $$

but on making the change $x \to -x$ and substituting back in for y(-x), this gives

$$ y(x) = f(x) + \sqrt{2\pi}\,\lambda\tilde{f}(x) + 2\pi\lambda^2 \left[ f(-x) + \sqrt{2\pi}\,\lambda\tilde{f}(-x) + 2\pi\lambda^2 y(x) \right]. $$

Thus the solution to (23.24) is given by

$$ y(x) = \frac{1}{1 - (2\pi)^2\lambda^4} \left[ f(x) + (2\pi)^{1/2}\lambda\tilde{f}(x) + 2\pi\lambda^2 f(-x) + (2\pi)^{3/2}\lambda^3\tilde{f}(-x) \right]. \tag{23.27} $$

Clearly, (23.24) possesses a unique solution provided $\lambda \neq \pm 1/\sqrt{2\pi}$ or $\pm i/\sqrt{2\pi}$;
these are easily shown to be the eigenvalues of the corresponding homogeneous
equation (for which f(x) = 0).


► Solve the integral equation

$$ y(x) = \exp\left(-\frac{x^2}{2}\right) + \lambda \int_{-\infty}^{\infty} \exp(-ixz)\,y(z)\,dz, \tag{23.28} $$

where $\lambda$ is a real constant. Show that the solution is unique unless $\lambda$ has one of two particular
values. Does a solution exist for either of these two values of $\lambda$?

Following the argument given above, the solution to (23.28) is given by (23.27) with
$f(x) = \exp(-x^2/2)$. In order to write the solution explicitly, however, we must calculate
the Fourier transform of f(x). Using equation (13.7), we find $\tilde{f}(k) = \exp(-k^2/2)$, from
which we note that f(x) has the special property that its functional form is identical to
that of its Fourier transform. Thus, the solution to (23.28) is given by

$$ y(x) = \frac{1}{1 - (2\pi)^2\lambda^4} \left[ 1 + (2\pi)^{1/2}\lambda + 2\pi\lambda^2 + (2\pi)^{3/2}\lambda^3 \right] \exp\left(-\frac{x^2}{2}\right). \tag{23.29} $$

Since $\lambda$ is restricted to be real, the solution to (23.28) will be unique unless $\lambda = \pm 1/\sqrt{2\pi}$,
at which points (23.29) becomes infinite. In order to find whether solutions exist for either
of these values of $\lambda$ we must return to equations (23.25) and (23.26).

Let us first consider the case $\lambda = +1/\sqrt{2\pi}$. Putting this value into (23.25) and (23.26),
we obtain

$$ y(x) = f(x) + \tilde{y}(x), \tag{23.30} $$

$$ \tilde{y}(x) = \tilde{f}(x) + y(-x). \tag{23.31} $$

Substituting (23.31) into (23.30) we find

$$ y(x) = f(x) + \tilde{f}(x) + y(-x), $$

but on changing x to -x and substituting back in for y(-x), this gives

$$ y(x) = f(x) + \tilde{f}(x) + f(-x) + \tilde{f}(-x) + y(x). $$

Thus, in order for a solution to exist, we require that the function f(x) obeys

$$ f(x) + \tilde{f}(x) + f(-x) + \tilde{f}(-x) = 0. $$

This is satisfied if $\tilde{f}(x) = -f(x)$, i.e. if the functional form of f(x) is minus the form of its
Fourier transform. We may repeat this analysis for the case $\lambda = -1/\sqrt{2\pi}$, and, in a similar
way, we find that this time we require $\tilde{f}(x) = f(x)$.

In our case $f(x) = \exp(-x^2/2)$, for which, as we mentioned above, $\tilde{f}(x) = f(x)$.
Therefore, (23.28) possesses no solution when $\lambda = +1/\sqrt{2\pi}$ but has many solutions when
$\lambda = -1/\sqrt{2\pi}$. ◄
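The explicit solution (23.29) can be tested by direct substitution into (23.28). The sketch below is illustrative only: the value $\lambda$ = 0.1, the truncation of the infinite range to [-12, 12], the trapezoidal rule and the helper name `fourier_integral` are assumptions made for this check. Because the trial solution is real and even, only the cosine part of exp(-ixz) contributes to the integral.

```python
import math

def fourier_integral(y, x, half_width=12.0, n=4000):
    """Trapezoidal estimate of int exp(-i x z) y(z) dz for a real, even y.

    For even y the sine part cancels, so only cos(x z) y(z) is integrated.
    """
    h = 2 * half_width / n
    total = 0.0
    for k in range(n + 1):
        z = -half_width + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.cos(x * z) * y(z)
    return h * total

lam = 0.1                                     # assumed sample value of lambda
q = math.sqrt(2 * math.pi) * lam
amp = (1 + q + q**2 + q**3) / (1 - q**4)      # prefactor appearing in (23.29)
f = lambda x: math.exp(-x * x / 2)
y = lambda x: amp * f(x)

residual = max(abs(y(x) - f(x) - lam * fourier_integral(y, x))
               for x in (0.0, 0.7, 1.5, 3.0))
print(amp, 1 / (1 - q), residual)
```

The printout also illustrates that the prefactor in (23.29) is a finite geometric sum in $\sqrt{2\pi}\,\lambda$, so that it equals $1/(1 - \sqrt{2\pi}\,\lambda)$ whenever the denominator of (23.29) is non-zero.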

A similar approach to the above may be taken to solve equations with kernels
of the form K(x, y) = cos xy or sin xy, either by considering the integral over y in
each case as the real or imaginary part of the corresponding Fourier transform
or by using Fourier cosine or sine transforms directly.


23.4.3 Differentiation 


A closed-form solution to a Volterra equation may sometimes be obtained by 
differentiating the equation to obtain the corresponding differential equation, 
which may be easier to solve. 


► Solve the integral equation

$$ y(x) = x - \int_0^x x z^2\,y(z)\,dz. \tag{23.32} $$

Dividing through by x, we obtain

$$ \frac{y(x)}{x} = 1 - \int_0^x z^2\,y(z)\,dz, $$

which may be differentiated with respect to x to give

$$ \frac{d}{dx}\left[\frac{y(x)}{x}\right] = -x^2 y(x) = -x^3 \left[\frac{y(x)}{x}\right]. $$

This equation may be integrated straightforwardly, and we find

$$ \ln\left[\frac{y(x)}{x}\right] = -\frac{x^4}{4} + c, $$

where c is a constant of integration. Thus the solution to (23.32) has the form

$$ y(x) = Ax \exp\left(-\frac{x^4}{4}\right), \tag{23.33} $$

where A is an arbitrary constant.

Since the original integral equation (23.32) contains no arbitrary constants, neither
should its solution. We may calculate the value of the constant, A, by substituting the
solution (23.33) back into (23.32), from which we find A = 1. ◄
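The final substitution step can also be carried out numerically. The sketch below evaluates both sides of (23.32) for the candidate solution (23.33) with A = 1; the helper names, the trapezoidal rule with 2000 intervals and the sample points are assumptions made for this check.

```python
import math

def y(x):
    """Candidate solution (23.33) with A = 1."""
    return x * math.exp(-x**4 / 4)

def rhs(x, n=2000):
    """Right-hand side of (23.32): x - int_0^x x z^2 y(z) dz (trapezoidal rule)."""
    if x == 0.0:
        return 0.0
    h = x / n
    g = lambda z: x * z * z * y(z)
    integral = h * (0.5 * (g(0.0) + g(x)) + sum(g(k * h) for k in range(1, n)))
    return x - integral

residual = max(abs(y(x) - rhs(x)) for x in (0.0, 0.5, 1.0, 1.5, 2.0))
print(residual)
```

Repeating the check with any other value of A leaves a residual proportional to (A - 1), which is how the substitution fixes the constant.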

23.5 Neumann series

As mentioned above, most integral equations met in practice will not be of the
simple forms discussed in the last section and so, in general, it is not possible to
find closed-form solutions. In such cases, we might try to obtain a solution in the
form of an infinite series, as we did for differential equations (see chapter 16).
Let us consider the equation

$$ y(x) = f(x) + \lambda \int_a^b K(x, z)\,y(z)\,dz, \tag{23.34} $$

where either both integration limits are constants (for a Fredholm equation) or
the upper limit is variable (for a Volterra equation). Clearly, if $\lambda$ were small then
a crude (but reasonable) approximation to the solution would be

$$ y(x) \approx y_0(x) = f(x), $$

where $y_0(x)$ stands for our ‘zeroth-order’ approximation to the solution (and is
not to be confused with an eigenfunction).

Substituting this crude guess under the integral sign in the original equation,
we obtain what should be a better approximation:

$$ y_1(x) = f(x) + \lambda \int_a^b K(x, z)\,y_0(z)\,dz = f(x) + \lambda \int_a^b K(x, z)\,f(z)\,dz, $$

which is first order in $\lambda$. Repeating the procedure once more results in the
second-order approximation

$$ y_2(x) = f(x) + \lambda \int_a^b K(x, z)\,y_1(z)\,dz $$

$$ \phantom{y_2(x)} = f(x) + \lambda \int_a^b K(x, z_1)\,f(z_1)\,dz_1 + \lambda^2 \int_a^b dz_1 \int_a^b K(x, z_1)\,K(z_1, z_2)\,f(z_2)\,dz_2. $$

It is clear that we may continue this process to obtain progressively higher-order
approximations to the solution. Introducing the functions

$$ K_1(x, z) = K(x, z), $$

$$ K_2(x, z) = \int_a^b K(x, z_1)\,K(z_1, z)\,dz_1, $$

$$ K_3(x, z) = \int_a^b dz_1 \int_a^b K(x, z_1)\,K(z_1, z_2)\,K(z_2, z)\,dz_2, $$

and so on, which obey the recurrence relation

$$ K_n(x, z) = \int_a^b K(x, z_1)\,K_{n-1}(z_1, z)\,dz_1, $$

we may write the nth-order approximation as

$$ y_n(x) = f(x) + \sum_{m=1}^{n} \lambda^m \int_a^b K_m(x, z)\,f(z)\,dz. \tag{23.35} $$

The solution to the original integral equation is then given by
$y(x) = \lim_{n\to\infty} y_n(x)$, provided the infinite series converges. Using (23.35), this solution
may be written as

$$ y(x) = f(x) + \lambda \int_a^b R(x, z; \lambda)\,f(z)\,dz, \tag{23.36} $$

where the resolvent kernel $R(x, z; \lambda)$ is given by

$$ R(x, z; \lambda) = \sum_{m=0}^{\infty} \lambda^m K_{m+1}(x, z). \tag{23.37} $$

Clearly, the resolvent kernel, and hence the series solution, will converge
provided $\lambda$ is sufficiently small. In fact, it may be shown that the series converges
in some domain of $|\lambda|$ provided the original kernel K(x, z) is bounded in such a
way that

$$ |\lambda|^2 \int_a^b dx \int_a^b |K(x, z)|^2\,dz < 1. \tag{23.38} $$


► Use the Neumann series method to solve the integral equation

$$ y(x) = x + \lambda \int_0^1 xz\,y(z)\,dz. \tag{23.39} $$

Following the method outlined above, we begin with the crude approximation
$y(x) \approx y_0(x) = x$. Substituting this under the integral sign in (23.39), we obtain the next
approximation

$$ y_1(x) = x + \lambda \int_0^1 xz\,y_0(z)\,dz = x + \lambda \int_0^1 xz^2\,dz = x + \frac{\lambda x}{3}. $$

Repeating the procedure once more, we obtain

$$ y_2(x) = x + \lambda \int_0^1 xz\,y_1(z)\,dz = x + \lambda \int_0^1 xz \left( z + \frac{\lambda z}{3} \right) dz = x + \left( \frac{\lambda}{3} + \frac{\lambda^2}{9} \right) x. $$

For this simple example, it is easy to see that by continuing this process the solution to
(23.39) is obtained as

$$ y(x) = x + \left[ \frac{\lambda}{3} + \left(\frac{\lambda}{3}\right)^2 + \left(\frac{\lambda}{3}\right)^3 + \cdots \right] x. $$

Clearly the expression in brackets is an infinite geometric series with first term $\lambda/3$ and
common ratio $\lambda/3$. Thus, provided $|\lambda| < 3$, this infinite series converges to the value
$\lambda/(3 - \lambda)$, and the solution to (23.39) is

$$ y(x) = x + \frac{\lambda x}{3 - \lambda} = \frac{3x}{3 - \lambda}. \tag{23.40} $$

Finally, we note that the requirement that $|\lambda| < 3$ may also be derived very easily from
the condition (23.38). ◄
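The successive approximations in this example can be generated mechanically. In the sketch below a polynomial y(z) is stored as its list of coefficients, and the integral operator of (23.39) is applied exactly using $\int_0^1 z \cdot z^k\,dz = 1/(k + 2)$. The function names and the sample value $\lambda$ = 1/2 are illustrative assumptions.

```python
from fractions import Fraction as F

def apply_K(poly):
    """Apply (Ky)(x) = int_0^1 x z y(z) dz to y(z) = sum_k poly[k] z^k.

    The result is always a multiple of x, returned as the coefficient list [0, c].
    """
    c = sum(a / (k + 2) for k, a in enumerate(poly))
    return [F(0), c]

def neumann(lam, order):
    """Return coefficient lists of y_0, ..., y_order for (23.39), where f(x) = x."""
    f = [F(0), F(1)]
    y = f
    out = [y]
    for _ in range(order):
        Ky = apply_K(y)
        y = [fk + lam * kk for fk, kk in zip(f, Ky)]   # y_{n+1} = f + lam * K y_n
        out.append(y)
    return out

lam = F(1, 2)
approximations = neumann(lam, 12)
slopes = [y[1] for y in approximations]      # y_n(x) = slopes[n] * x
exact_slope = 3 / (3 - lam)                  # from (23.40)
print(slopes[:3], exact_slope)
```

The slopes are the partial sums 1, 1 + $\lambda$/3, 1 + $\lambda$/3 + $(\lambda/3)^2$, and so on; for $|\lambda| \ge 3$ the same iteration runs but the slopes no longer settle to a limit.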


23.6 Fredholm theory 


In the previous section, we found that a solution to the integral equation (23.34)
can be obtained as a Neumann series of the form (23.36), where the resolvent
kernel $R(x, z; \lambda)$ is written as an infinite power series in $\lambda$. This solution is valid
provided the infinite series converges.

A related, but more elegant, approach to the solution of integral equations
using infinite series was found by Fredholm. We will not reproduce Fredholm’s
analysis here, but merely state the results we need. Essentially, Fredholm theory
provides a formula for the resolvent kernel $R(x, z; \lambda)$ in (23.36) in terms of the
ratio of two infinite series:

$$ R(x, z; \lambda) = \frac{D(x, z; \lambda)}{d(\lambda)}. \tag{23.41} $$

The numerator and denominator in (23.41) are given by

$$ D(x, z; \lambda) = \sum_{n=0}^{\infty} \frac{(-1)^n}{n!}\,D_n(x, z)\,\lambda^n, \tag{23.42} $$

$$ d(\lambda) = \sum_{n=0}^{\infty} \frac{(-1)^n}{n!}\,d_n\,\lambda^n, \tag{23.43} $$

where the functions $D_n(x, z)$ and the constants $d_n$ are found from recurrence
relations as follows. We start with

$$ D_0(x, z) = K(x, z) \quad\text{and}\quad d_0 = 1, \tag{23.44} $$

where K(x, z) is the kernel of the original integral equation (23.34). The higher-order
coefficients of $\lambda$ in (23.43) and (23.42) are then obtained from the two
recurrence relations

$$ d_n = \int_a^b D_{n-1}(x, x)\,dx, \tag{23.45} $$

$$ D_n(x, z) = K(x, z)\,d_n - n \int_a^b K(x, z_1)\,D_{n-1}(z_1, z)\,dz_1. \tag{23.46} $$

Although the formulae for the resolvent kernel appear complicated, they are
often simple to apply. Moreover, for the Fredholm solution the power series
(23.42) and (23.43) are both guaranteed to converge for all values of $\lambda$, unlike
Neumann series, which converge only if the condition (23.38) is satisfied. Thus the
Fredholm method leads to a unique, non-singular solution, provided that $d(\lambda) \neq 0$.
In fact, as we might suspect, the solutions of $d(\lambda) = 0$ give the eigenvalues of the
homogeneous equation corresponding to (23.34), i.e. with f(x) = 0.


► Use Fredholm theory to solve the integral equation (23.39).

Using (23.36) and (23.41), the solution to (23.39) can be written in the form

$$ y(x) = x + \lambda \int_0^1 R(x, z; \lambda)\,z\,dz = x + \lambda \int_0^1 \frac{D(x, z; \lambda)}{d(\lambda)}\,z\,dz. \tag{23.47} $$

In order to find the form of the resolvent kernel $R(x, z; \lambda)$, we begin by setting

$$ D_0(x, z) = K(x, z) = xz \quad\text{and}\quad d_0 = 1 $$

and use the recurrence relations (23.45) and (23.46) to obtain

$$ d_1 = \int_0^1 D_0(x, x)\,dx = \int_0^1 x^2\,dx = \frac{1}{3}, $$

$$ D_1(x, z) = \frac{xz}{3} - \int_0^1 x z_1^2 z\,dz_1 = \frac{xz}{3} - xz \left[ \frac{z_1^3}{3} \right]_0^1 = 0. $$

Applying the recurrence relations again we find that $d_n = 0$ and $D_n(x, z) = 0$ for n > 1.
Thus, from (23.42) and (23.43), the numerator and denominator of the resolvent respectively
are given by

$$ D(x, z; \lambda) = xz \quad\text{and}\quad d(\lambda) = 1 - \frac{\lambda}{3}. $$

Substituting these expressions into (23.47), we find that the solution to (23.39) is given
by

$$ y(x) = x + \lambda \int_0^1 \frac{xz^2}{1 - \lambda/3}\,dz = x + \lambda\,\frac{x}{1 - \lambda/3} \left[ \frac{z^3}{3} \right]_0^1 = x + \frac{\lambda x}{3 - \lambda} = \frac{3x}{3 - \lambda}, $$

which, as expected, is the same as the solution (23.40) found by constructing a Neumann
series. ◄
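The recurrence relations (23.44)-(23.46) lend themselves to exact evaluation when the kernel is a polynomial. The sketch below is one possible implementation on the interval [0, 1]; the dictionary representation {(i, j): c} for a term $c\,x^i z^j$ and all function names are assumptions made for this illustration.

```python
from fractions import Fraction as F
from math import factorial

# A bivariate polynomial sum of c * x^i * z^j is stored as {(i, j): c}.

def diag_integral(P):
    """int_0^1 P(x, x) dx."""
    return sum(c / (i + j + 1) for (i, j), c in P.items())

def compose(K, P):
    """int_0^1 K(x, t) P(t, z) dt, returned as a polynomial in (x, z)."""
    out = {}
    for (i, j), a in K.items():
        for (k, l), b in P.items():
            out[(i, l)] = out.get((i, l), F(0)) + a * b / (j + k + 1)
    return out

def fredholm_coefficients(K, n_max):
    """Return ([d_0..d_n_max], [D_0..D_n_max]) from (23.44)-(23.46)."""
    d, D = [F(1)], [dict(K)]
    for n in range(1, n_max + 1):
        dn = diag_integral(D[-1])                       # (23.45)
        KD = compose(K, D[-1])
        Dn = {key: dn * K.get(key, F(0)) - n * KD.get(key, F(0))
              for key in set(K) | set(KD)}              # (23.46)
        Dn = {key: c for key, c in Dn.items() if c != 0}
        d.append(dn)
        D.append(Dn)
    return d, D

def d_of(lam, d):
    """Evaluate the truncated series (23.43) at lam."""
    return sum((-1) ** n * dn * lam ** n / factorial(n) for n, dn in enumerate(d))

K = {(1, 1): F(1)}                    # K(x, z) = x z, as in (23.39)
d, D = fredholm_coefficients(K, 3)
print(d, D[1], d_of(F(1), d))
```

For this kernel the series terminate after the first-order term, reproducing $d(\lambda) = 1 - \lambda/3$; for a polynomial kernel with more terms the same code produces the higher coefficients without change.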


23.7 Schmidt-Hilbert theory 

The Schmidt-Hilbert (SH) theory of integral equations may be considered as 
analogous to the Sturm-Liouville (SL) theory of differential equations, discussed 
in chapter 17, and is concerned with the properties of integral equations with 
Hermitian kernels. An Hermitian kernel enjoys the property 

K(x,z) = K*(z,x), (23.48) 

and it is clear that a special case of (23.48) occurs for a real kernel that is also 
symmetric with respect to its two arguments.

Let us begin by considering the homogeneous integral equation

$$ y = \lambda\mathcal{K}y, $$

where the integral operator $\mathcal{K}$ has an Hermitian kernel. As discussed in
section 23.3, in general, this equation will have solutions only for $\lambda = \lambda_i$, where the $\lambda_i$
are the eigenvalues of the integral equation, the corresponding solutions $y_i$ being
the eigenfunctions of the equation.

By following similar arguments to those presented in chapter 17 for SL theory,
it may be shown that the eigenvalues $\lambda_i$ of an Hermitian kernel are real and
that the corresponding eigenfunctions $y_i$ belonging to different eigenvalues are
orthogonal and form a complete set. If the eigenfunctions are suitably normalised,
we have

$$ \langle y_i | y_j \rangle = \int_a^b y_i^*(x)\,y_j(x)\,dx = \delta_{ij}. \tag{23.49} $$

If an eigenvalue is degenerate then the eigenfunctions corresponding to that
eigenvalue can be made orthogonal by the Gram-Schmidt procedure, in a similar
way to that discussed in chapter 17 in the context of SL theory.

Like SL theory, SH theory does not provide a method of obtaining the eigenvalues
and eigenfunctions of any particular homogeneous integral equation with
an Hermitian kernel; for this we have to turn to the methods discussed in the
previous sections of this chapter. Rather, SH theory is concerned with the general
properties of the solutions to such equations. Where SH theory becomes
applicable, however, is in the solution of inhomogeneous integral equations with
Hermitian kernels for which the eigenvalues and eigenfunctions of the corresponding
homogeneous equation are already known.

Let us consider the inhomogeneous equation

$$y = f + \lambda \mathcal{K} y, \tag{23.50}$$

where $\mathcal{K} = \mathcal{K}^\dagger$ and for which we know the eigenvalues $\lambda_i$ and normalised
eigenfunctions $y_i$ of the corresponding homogeneous problem. The function $f$
may or may not be expressible solely in terms of the eigenfunctions $y_i$, and to
accommodate this situation we write the unknown solution $y$ as $y = f + \sum_i a_i y_i$,
where the $a_i$ are expansion coefficients to be determined.

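Substituting this into (23.50), we obtain

$$f + \sum_i a_i y_i = f + \lambda \sum_i \frac{a_i y_i}{\lambda_i} + \lambda \mathcal{K} f, \tag{23.51}$$

where we have used the fact that $y_i = \lambda_i \mathcal{K} y_i$. Forming the inner product of both
sides of (23.51) with $y_j$, we find

$$\sum_i a_i \langle y_j | y_i \rangle = \lambda \sum_i \frac{a_i}{\lambda_i} \langle y_j | y_i \rangle + \lambda \langle y_j | \mathcal{K} f \rangle. \tag{23.52}$$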

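Since the eigenfunctions are orthonormal and $\mathcal{K}$ is an Hermitian operator,
we have that both $\langle y_j | y_i \rangle = \delta_{ij}$ and $\langle y_j | \mathcal{K} f \rangle = \langle \mathcal{K} y_j | f \rangle = \lambda_j^{-1} \langle y_j | f \rangle$. Thus the
coefficients $a_j$ are given by

$$a_j = \frac{\lambda \lambda_j^{-1} \langle y_j | f \rangle}{1 - \lambda \lambda_j^{-1}} = \frac{\lambda \langle y_j | f \rangle}{\lambda_j - \lambda}, \tag{23.53}$$

and the solution is

$$y = f + \sum_i a_i y_i = f + \lambda \sum_i \frac{\langle y_i | f \rangle}{\lambda_i - \lambda}\, y_i. \tag{23.54}$$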


This also shows, incidentally, that a formal representation for the resolvent kernel
is

$$R(x, z; \lambda) = \sum_i \frac{y_i(x)\, y_i^*(z)}{\lambda_i - \lambda}. \tag{23.55}$$


If $f$ can be expressed as a linear superposition of the $y_i$, i.e. $f = \sum_i b_i y_i$, then
$b_i = \langle y_i | f \rangle$ and the solution can be written more briefly as

$$y = \sum_i \frac{b_i}{1 - \lambda \lambda_i^{-1}}\, y_i. \tag{23.56}$$

We see from (23.54) that the inhomogeneous equation (23.50) has a unique
solution provided $\lambda \neq \lambda_i$, i.e. when $\lambda$ is not equal to one of the eigenvalues of
the corresponding homogeneous equation. However, if $\lambda$ does equal one of the
eigenvalues $\lambda_j$ then, in general, the coefficients $a_j$ become singular and no (finite)
solution exists.

Returning to (23.53) we notice that even if $\lambda = \lambda_j$ a non-singular solution to
the integral equation is still possible provided that the function $f$ is orthogonal
to every eigenfunction corresponding to the eigenvalue $\lambda_j$, i.e.

$$\langle y_j | f \rangle = \int_a^b y_j^*(x)\, f(x)\, dx = 0.$$

The following worked example illustrates the case in which $f$ can be expressed in
terms of the $y_i$. One in which it cannot is considered in exercise 23.14.
► Use Schmidt-Hilbert theory to solve the integral equation

$$y(x) = \sin(x + \alpha) + \lambda \int_0^\pi \sin(x + z)\, y(z)\, dz. \tag{23.57}$$

It is clear that the kernel $K(x, z) = \sin(x + z)$ is real and symmetric in $x$ and $z$ and is
thus Hermitian. In order to solve this inhomogeneous equation using SH theory, however,
we must first find the eigenvalues and eigenfunctions of the corresponding homogeneous
equation.

In fact, we have considered the solution of the corresponding homogeneous equation
(23.13) already in subsection 23.4.1, where we found that it has two eigenvalues $\lambda_1 = 2/\pi$
and $\lambda_2 = -2/\pi$, with eigenfunctions given by (23.16). The normalised eigenfunctions are

$$y_1(x) = \frac{1}{\sqrt{\pi}} (\sin x + \cos x) \quad\text{and}\quad y_2(x) = \frac{1}{\sqrt{\pi}} (\sin x - \cos x) \tag{23.58}$$

and are easily shown to obey the orthonormality condition (23.49).

Since $f(x) = \sin(x + \alpha)$ is a linear combination of $y_1$ and $y_2$, we may use (23.56), and
the solution to the inhomogeneous equation (23.57) has the form

$$y(x) = a_1 y_1(x) + a_2 y_2(x), \tag{23.59}$$

where the coefficients $a_1$ and $a_2$ are the coefficients $b_i/(1 - \lambda \lambda_i^{-1})$ of (23.56) with
$f(x) = \sin(x + \alpha)$. Therefore, using (23.58),

$$a_1 = \frac{1}{1 - \pi\lambda/2} \int_0^\pi \frac{1}{\sqrt{\pi}} (\sin z + \cos z) \sin(z + \alpha)\, dz = \frac{\sqrt{\pi}}{2 - \pi\lambda} (\cos\alpha + \sin\alpha),$$

$$a_2 = \frac{1}{1 + \pi\lambda/2} \int_0^\pi \frac{1}{\sqrt{\pi}} (\sin z - \cos z) \sin(z + \alpha)\, dz = \frac{\sqrt{\pi}}{2 + \pi\lambda} (\cos\alpha - \sin\alpha).$$

Substituting these expressions for $a_1$ and $a_2$ into (23.59) and simplifying, we find that
the solution to (23.57) is given by

$$y(x) = \frac{1}{1 - (\pi\lambda/2)^2} \left[ \sin(x + \alpha) + (\pi\lambda/2) \cos(x - \alpha) \right]. \;◄$$
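The closed-form result of the worked example can be checked numerically. The following is a minimal sketch, not part of the original text: it substitutes the quoted solution back into (23.57) and evaluates the integral with a composite Simpson rule. The function names (`simpson`, `y_closed`, `residual`) and the sample values of $\lambda$ and $\alpha$ are illustrative assumptions.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson rule for the integral of g over [a, b] (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def y_closed(x, lam, alpha):
    """Closed-form solution quoted at the end of the worked example."""
    mu = math.pi * lam / 2
    return (math.sin(x + alpha) + mu * math.cos(x - alpha)) / (1 - mu ** 2)

def residual(x, lam, alpha):
    """y(x) minus the right-hand side of (23.57), evaluated at the point x."""
    integral = simpson(lambda z: math.sin(x + z) * y_closed(z, lam, alpha),
                       0.0, math.pi)
    return y_closed(x, lam, alpha) - (math.sin(x + alpha) + lam * integral)

# Illustrative (assumed) values; lam must differ from the eigenvalues +2/pi, -2/pi.
lam, alpha = 0.3, 0.7
worst = max(abs(residual(x, lam, alpha)) for x in (0.0, 0.5, 1.0, 2.0, 3.0))
print(f"largest residual: {worst:.2e}")
```

A residual at the level of rounding error for several values of $x$ is consistent with the closed form; choosing $\lambda$ close to $\pm 2/\pi$ makes the solution large, as expected from the singular denominators.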


23.8 Exercises

23.1 Solve the integral equation

$$\int_0^\infty \cos(xv)\, y(v)\, dv = \exp(-x^2/2)$$

for the function $y = y(x)$ for $x > 0$. Note that for $x < 0$, $y(x)$ can be chosen as
is most convenient.

23.2 Solve

$$\int_0^\infty f(t) \exp(-st)\, dt = \frac{a}{a^2 + s^2}.$$

23.3 Use the fact that its kernel is separable to solve for $y(x)$ the integral equation

$$y(x) = A \cos(x + a) + \lambda \int_0^\pi \sin(x + z)\, y(z)\, dz.$$

(This equation is an inhomogeneous extension of the homogeneous Fredholm
equation (23.13), and is similar to equation (23.57).)
23.4 Convert

$$f(x) = \exp x + \int_0^x (x - y) f(y)\, dy$$

into a differential equation, and hence show that its solution is

$$(\alpha + \beta x) \exp x + \gamma \exp(-x),$$

where $\alpha$, $\beta$, $\gamma$ are constants that should be determined.

23.5 Solve for $\phi(x)$ the integral equation

$$\phi(x) = f(x) + \lambda \int_0^1 \left[ \left( \frac{x}{y} \right)^n + \left( \frac{y}{x} \right)^n \right] \phi(y)\, dy,$$

where $f(x)$ is bounded for $0 < x < 1$ and $-\tfrac{1}{2} < n < \tfrac{1}{2}$, expressing your answer
in terms of the quantities $F_m = \int_0^1 f(y)\, y^m\, dy$.

(a) Give the explicit solution when $\lambda = 1$.

(b) For what values of $\lambda$ are there no solutions unless $F_{\pm n}$ take particular values?
What are these values?

23.6 (a) Consider the inhomogeneous integral equation

$$f(x) = g(x) + \lambda \int_a^b K(x, y) f(y)\, dy;$$

its kernel $K(x, y)$ is real, symmetric and continuous in $a \le x \le b$, $a \le y \le b$.
If $\lambda$ is one of the eigenvalues $\lambda_i$ of the homogeneous equation

$$f_i(x) = \lambda_i \int_a^b K(x, y) f_i(y)\, dy,$$

prove that the inhomogeneous equation can only have a non-trivial solution
if $g(x)$ is orthogonal to the corresponding eigenfunction $f_i(x)$.

(b) Show that the only values of $\lambda$ for which

$$f(x) = \lambda \int_0^1 xy(x + y) f(y)\, dy$$

has a non-trivial solution are the roots of the equation

$$\lambda^2 + 120\lambda - 240 = 0.$$

(c) Solve

$$f(x) = \mu x^2 + \int_0^1 2xy(x + y) f(y)\, dy.$$

23.7 (a) If the kernel of the integral equation

$$\psi(x) = \lambda \int_a^b K(x, y)\, \psi(y)\, dy$$

has the form

$$K(x, y) = \sum_{n=0}^\infty h_n(x)\, g_n(y),$$

where the $h_n(x)$ form a complete orthonormal set of functions over the
interval $[a, b]$, show that the eigenvalues $\lambda_i$ are given by

$$|\mathsf{M} - \lambda^{-1} \mathsf{I}| = 0,$$

where $\mathsf{M}$ is the matrix with elements

$$M_{kj} = \int_a^b g_k(u)\, h_j(u)\, du.$$

If the corresponding solutions are $\psi^{(i)}(x) = \sum_{n=0}^\infty a_n^{(i)} h_n(x)$, find an expression
for $a_n^{(i)}$.

(b) Obtain the eigenvalues and eigenfunctions over the interval $[0, 2\pi]$ if

$$K(x, y) = \sum_{n=1}^\infty \frac{1}{n} \cos nx \cos ny.$$

23.8 By taking its Laplace transform, and that of $x^n e^{-ax}$, obtain the explicit solution
of

$$f(x) = e^{-x} \left[ x + \int_0^x (x - u)\, e^u f(u)\, du \right].$$

Verify your answer by substitution.

23.9 For $f(t) = \exp(-t^2/2)$, use the relationships of the Fourier transforms of $f'(t)$ and
$t f(t)$ to that of $f(t)$ itself to find a simple differential equation satisfied by $\tilde{f}(\omega)$,
the Fourier transform of $f(t)$, and hence determine $\tilde{f}(\omega)$ to within a constant. Use
this result to solve the integral equation



for $h(t)$.

23.10 Show that the equation

$$f(x) = x^{-1/3} + \lambda \int_0^\infty f(y) \exp(-xy)\, dy$$

has a solution of the form $A x^\alpha + B x^\beta$. Determine the values of $\alpha$ and $\beta$ and show
that those of $A$ and $B$ are

$$\frac{1}{1 - \lambda^2 \Gamma(\tfrac{1}{3}) \Gamma(\tfrac{2}{3})} \quad\text{and}\quad \frac{\lambda \Gamma(\tfrac{2}{3})}{1 - \lambda^2 \Gamma(\tfrac{1}{3}) \Gamma(\tfrac{2}{3})},$$

where $\Gamma(s)$ is the gamma function, discussed in the appendix.

23.11 At an international ‘peace’ conference a large number of delegates are seated
around a circular table with each delegation sitting near its allies and diametrically
opposite the delegation most bitterly opposed to it. The position of a delegate is
denoted by $\theta$, with $0 \le \theta \le 2\pi$. The fury $f(\theta)$ felt by the delegate at $\theta$ is the sum
of his own natural hostility $h(\theta)$ and the influences on him of each of the other
delegates; a delegate at position $\phi$ contributes an amount $K(\theta - \phi) f(\phi)$. Thus

$$f(\theta) = h(\theta) + \int_0^{2\pi} K(\theta - \phi) f(\phi)\, d\phi.$$

Show that if $K(\psi)$ takes the form $K(\psi) = k_0 + k_1 \cos\psi$ then

$$f(\theta) = h(\theta) + p + q \cos\theta + r \sin\theta$$

and evaluate $p$, $q$ and $r$. A positive value for $k_1$ implies that delegates tend to
placate their opponents but upset their allies, whilst negative values imply that
they calm their allies but infuriate their opponents. A walkout will occur if $f(\theta)$
exceeds a certain threshold value for some $\theta$. Is this more likely to happen for
positive or for negative values of $k_1$?
23.12 By considering functions of the form $h(x) = \int_0^x (x - y) f(y)\, dy$, show that the
solution $f(x)$ of the integral equation

$$f(x) = x + \tfrac{1}{2} \int_0^1 |x - y|\, f(y)\, dy$$

satisfies the equation $f''(x) = f(x)$.

By examining the special cases $x = 0$ and $x = 1$, show that

$$f(x) = \frac{2}{(e + 3)(e + 1)} \left[ (e + 2) e^x - e\, e^{-x} \right].$$

23.13 The operator $\mathcal{M}$ is defined by

$$\mathcal{M} f(x) \equiv \int_{-\infty}^\infty K(x, y) f(y)\, dy,$$

where $K(x, y) = 1$ inside the square $|x| < a$, $|y| < a$, and is equal to 0 elsewhere.
Consider the possible eigenvalues of $\mathcal{M}$ and the eigenfunctions that correspond
to them; show that the only possible eigenvalues are 0 and $2a$ and determine the
corresponding eigenfunctions. Hence find the general solution of

$$f(x) = g(x) + \lambda \int_{-\infty}^\infty K(x, y) f(y)\, dy.$$

23.14 For the integral equation

$$y(x) = x^{-3} + \lambda \int_a^b x^2 z^2\, y(z)\, dz,$$

show that the resolvent kernel is $5x^2 z^2 / [5 - \lambda(b^5 - a^5)]$ and hence solve the
equation. For what range of $\lambda$ is the solution valid?

23.15 Use Fredholm theory to show that, for the kernel

$$K(x, z) = (x + z) \exp(x - z)$$

over the interval $[0, 1]$, the resolvent kernel is

$$R(x, z; \lambda) = \frac{\exp(x - z) \left[ (x + z) - \lambda \left( \tfrac{1}{2} x + \tfrac{1}{2} z - xz - \tfrac{1}{3} \right) \right]}{1 - \lambda - \tfrac{1}{12} \lambda^2},$$

and hence solve

$$y(x) = x^2 + 2 \int_0^1 (x + z) \exp(x - z)\, y(z)\, dz,$$

expressing your answer in terms of $I_n$, where $I_n = \int_0^1 u^n \exp(-u)\, du$.

23.16 (a) Determine the eigenvalues $\lambda_\pm$ of the kernel $K(x, z) = (xz)^{1/2} (x^{1/2} + z^{1/2})$ and
show that the corresponding eigenfunctions have the forms

$$y_\pm(x) = A_\pm \left( \sqrt{2}\, x^{1/2} \pm \sqrt{3}\, x \right),$$

where $A_\pm^2 = 5/(10 \pm 4\sqrt{6})$.

(b) Use Schmidt-Hilbert theory to solve

$$y(x) = 1 + \tfrac{5}{2} \int_0^1 K(x, z)\, y(z)\, dz.$$

(c) As may be apparent, the algebra involved in the formal method used in (b)
is long and error-prone, and it is in fact much more straightforward to use
a trial function $1 + \alpha x^{1/2} + \beta x$. Check your answer by doing so.
23.9 Hints and answers

23.1 Define $y(-x) = y(x)$ and use the cosine Fourier transform inversion theorem;
$y(x) = (2/\pi)^{1/2} \exp(-x^2/2)$.

23.2 Use the Laplace transform; $f(t) = \sin at$.

23.3 Set $y(x) = c_1 \sin x + c_2 \cos x$; $y(x) = A[\cos(x + a) + (\lambda\pi/2) \sin(x - a)] / [1 - (\lambda^2 \pi^2/4)]$.

23.4 $f''(x) - f(x) = \exp x$; $\alpha = 3/4$, $\beta = 1/2$, $\gamma = 1/4$.

23.5 (a) $\phi(x) = f(x) - (1 + 2n) F_n x^n - (1 - 2n) F_{-n} x^{-n}$. (b) There are no solutions for
$\lambda = [1 \pm (1 - 4n^2)^{-1/2}]^{-1}$ unless $F_{\pm n} = 0$ or $F_{-n}/F_n = \mp [(1 - 2n)/(1 + 2n)]^{1/2}$.

23.6 (b) Set $f(x) = a_1 x^2 + a_2 x$ and obtain $a_1 = (\lambda/4) a_1 + (\lambda/3) a_2$, $a_2 = (\lambda/5) a_1 + (\lambda/4) a_2$;
(c) set $f(x) = (\mu + a_1) x^2 + a_2 x$; $f(x) = -6\mu x(5x + 4)$.

23.7 (a) $a_n^{(i)} = \int_a^b h_n(x)\, \psi^{(i)}(x)\, dx$; (b) use $(1/\sqrt{\pi}) \cos nx$ and $(1/\sqrt{\pi}) \sin nx$; $\mathsf{M}$ is diagonal;
eigenvalues $\lambda_k = k/\pi$ with $\psi^{(k)}(x) = (1/\sqrt{\pi}) \cos kx$.

23.8 Writing $p(x) = e^x f(x)$ and $q(x) = x$, the integrand can be expressed as a
convolution. Show that $\bar{p}(s) = \bar{q}(s)/[1 - \bar{q}(s)]$, leading to $f(x) = (1 - e^{-2x})/2$.

23.9 $d\tilde{f}/d\omega = -\omega \tilde{f}$, leading to $\tilde{f}(\omega) = A e^{-\omega^2/2}$. Rearrange the integral as a convolution
and deduce that $\tilde{h}(\omega) = B e^{-3\omega^2/2}$; $h(t) = C e^{-t^2/6}$, where re-substitution and
Gaussian normalisation show that $C = \sqrt{6\pi}/2$.

23.10 Recall or prove that the Laplace transform of $x^{-n}$, where $n < 1$ but is not
necessarily an integer, is $\Gamma(1 - n) s^{n-1}$. For a possible solution $\alpha = -1/3$ and
$\beta = -2/3$, or vice versa.

23.11 $p = k_0 H/(1 - 2\pi k_0)$, $q = k_1 H_c/(1 - \pi k_1)$, and $r = k_1 H_s/(1 - \pi k_1)$,
where $H = \int_0^{2\pi} h(z)\, dz$, $H_c = \int_0^{2\pi} h(z) \cos z\, dz$, and $H_s = \int_0^{2\pi} h(z) \sin z\, dz$. Positive
values of $k_1$ ($\approx \pi^{-1}$) are most likely to cause a conference breakdown.

23.12 Write $\int_0^1 |x - y| f(y)\, dy$ as $\int_0^x (x - y) f(y)\, dy + \int_x^1 (y - x) f(y)\, dy$.

23.13 For eigenvalue 0: $f(x) = 0$ for $|x| < a$ or $f(x)$ is such that $\int_{-a}^{a} f(y)\, dy = 0$. For
eigenvalue $2a$: $f(x) = \mu S(x, a)$ with $\mu$ a constant and $S(x, a) \equiv [H(a + x) - H(x - a)]$,
where $H(z)$ is the Heaviside step function. Take $f(x) = g(x) + cGS(x, a)$,
where $G = \int_{-a}^{a} g(z)\, dz$. Show that $c = \lambda/(1 - 2a\lambda)$.

23.14 $y(x) = x^{-3} + [5x^2 \lambda \ln(b/a)] / [5 - \lambda(b^5 - a^5)]$; $|\lambda| < 5/|b^5 - a^5|$.

23.15 $y(x) = x^2 - (3 I_3 x + I_2) \exp x$.

23.16 (a) $\lambda_\pm = 5\sqrt{6}/(2\sqrt{6} \pm 5)$. (b) $\langle y_\pm | \mathcal{K} 1 \rangle = A_\pm [(31/60)\sqrt{2} \pm (19/45)\sqrt{3}]$, $[1 - (5/2)/\lambda_\pm]^{-1} = \mp 2\sqrt{6}/5$.
For (b) and (c) $y(x) = 1 - \tfrac{4}{3} x^{1/2} - \tfrac{3}{2} x$.
24


Group theory 


For systems that have some degree of symmetry, full exploitation of that symmetry 
is desirable. Significant physical results can sometimes be deduced simply by a 
study of the symmetry properties of the system under investigation. Consequently 
it becomes important, for such a system, to identify all those operations (rotations, 
reflections, inversions) that carry the system into a physically indistinguishable 
copy of itself. 

The study of the properties of the complete set of such operations forms 
one application of group theory. Though this is the aspect of most interest to 
the physical scientist, group theory itself is a much larger subject and of great 
importance in its own right. Consequently we leave until the next chapter any 
direct applications of group theoretical results and concentrate on building up 
the general mathematical properties of groups. 


24.1 Groups 

As an example of symmetry properties, let us consider the sets of operations, 
such as rotations, reflections, and inversions, that transform physical objects, for 
example molecules, into physically indistinguishable copies of themselves, so that 
only the labelling of identical components of the system (the atoms) changes in 
the process. For differently shaped molecules there are different sets of operations, 
but in each case it is a well-defined set, and with a little practice all members of 
each set can be identified. 

As simple examples, consider (a) the hydrogen molecule, and (b) the ammonia
molecule illustrated in figure 24.1. The hydrogen molecule consists of two atoms
H of hydrogen and is carried into itself by any of the following operations:

(i) any rotation about its long axis;

(ii) rotation through $\pi$ about an axis perpendicular to the long axis and
passing through the point M that lies midway between the atoms;

(iii) inversion through the point M;

(iv) reflection in the plane that passes through M and has its normal parallel
to the long axis.

These operations collectively form the set of symmetry operations for the
hydrogen molecule.

Figure 24.1 (a) The hydrogen molecule, and (b) the ammonia molecule.

The somewhat more complex ammonia molecule consists of a tetrahedron with
an equilateral triangular base at the three corners of which lie hydrogen atoms
H, whilst a nitrogen atom N is sited at the fourth vertex of the tetrahedron. The
set of symmetry operations on this molecule is limited to rotations of $2\pi/3$ and
$4\pi/3$ about the axis joining the centroid of the equilateral triangle to the nitrogen
atom, and reflections in the three planes containing that axis and each of the
hydrogen atoms in turn. However, if the nitrogen atom could be replaced by a
fourth hydrogen atom, and all interatomic distances equalised in the process, the
number of symmetry operations would be greatly increased.

Once all the possible operations in any particular set have been identified, it 
must follow that the result of applying two such operations in succession will be 
identical to that obtained by the sole application of some third (usually different) 
operation in the set - for if it were not, a new member of the set would have 
been found, contradicting the assumption that all members have been identified. 

Such observations introduce two of the main considerations relevant to decid- 
ing whether a set of objects, here the rotation, reflection and inversion operations, 
qualifies as a group in the mathematically tightly defined sense. These two consid- 
erations are (i) whether there is some law for combining two members of the set, 
and (ii) whether the result of the combination is also a member of the set. The 
obvious rule of combination has to be that the second operation is carried out 
on the system that results from application of the first operation, and we have 
already seen that the second requirement is satisfied by the inclusion of all such 
operations in the set. However, for a set to qualify as a group, more than these 
two conditions have to be satisfied, as will now be made clear. 
24.1.1 Definition of a group

A group $\mathcal{G}$ is a set of elements {X, Y, ...}, together with a rule for combining
them that associates with each ordered pair X, Y a ‘product’ or combination law
X • Y for which the following conditions must be satisfied.

(i) For every pair of elements X, Y that belongs to $\mathcal{G}$, the product X • Y also
belongs to $\mathcal{G}$. (This is known as the closure property of the group.)

(ii) For all triples X, Y, Z the associative law holds; in symbols,

$$X \bullet (Y \bullet Z) = (X \bullet Y) \bullet Z. \tag{24.1}$$

(iii) There exists a unique element I, belonging to $\mathcal{G}$, with the property that

$$I \bullet X = X = X \bullet I \tag{24.2}$$

for all X belonging to $\mathcal{G}$. This element I is known as the identity element
of the group.

(iv) For every element X of $\mathcal{G}$, there exists an element $X^{-1}$, also belonging to
$\mathcal{G}$, such that

$$X^{-1} \bullet X = I = X \bullet X^{-1}. \tag{24.3}$$

$X^{-1}$ is called the inverse of X.

An alternative notation in common use is to write the elements of a group $\mathcal{G}$
as the set $\{G_1, G_2, \ldots\}$ or, more briefly, as $\{G_i\}$, a typical element being denoted
by $G_i$.

It should be noticed that, as given, the nature of the operation • is not stated. It 
should also be noticed that the more general term element , rather than operation, 
has been used in this definition. We will see that the general definition of a 
group allows as elements not only sets of operations on an object but also sets of 
numbers, of functions and of other objects, provided that the interpretation of • 
is appropriately defined. 

In one of the simplest examples of a group, namely the group of all integers 
under addition, the operation • is taken to be ordinary addition. In this group the 
role of the identity I is played by the integer 0, and the inverse of an integer X is 
—X. That requirements (i) and (ii) are satisfied by the integers under addition is 
trivially obvious. A second simple group, under ordinary multiplication, is formed 
by the two numbers 1 and —1; in this group, closure is obvious, 1 is the identity 
element, and each element is its own inverse. 
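For a finite set, the four requirements (i)-(iv) can be tested by brute force. The sketch below is illustrative only; the function name `is_group` and the failing example set {1, −1, 2} are assumptions introduced here, not taken from the text.

```python
from itertools import product

def is_group(elements, op):
    """Return True if the finite set `elements` with the binary rule `op`
    satisfies requirements (i)-(iv) of the group definition."""
    elems = list(elements)
    # (i) closure: every product must lie in the set
    if any(op(x, y) not in elems for x, y in product(elems, repeat=2)):
        return False
    # (ii) associativity: X.(Y.Z) == (X.Y).Z for all triples
    if any(op(x, op(y, z)) != op(op(x, y), z)
           for x, y, z in product(elems, repeat=3)):
        return False
    # (iii) identity: exactly one I with I.X == X == X.I for all X
    identities = [e for e in elems
                  if all(op(e, x) == x == op(x, e) for x in elems)]
    if len(identities) != 1:
        return False
    identity = identities[0]
    # (iv) inverses: every X has some Y with X.Y == I == Y.X
    return all(any(op(x, y) == identity == op(y, x) for y in elems)
               for x in elems)

multiply = lambda x, y: x * y
print(is_group([1, -1], multiply))     # the two-element group from the text: True
print(is_group([1, -1, 2], multiply))  # 2 x 2 = 4 is not in the set: False
```

The check is exhaustive, so its cost grows as the cube of the number of elements; it is intended only for small sets such as those met in this chapter.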

It will be apparent from these two examples that the number of elements in a
group can be either finite or infinite. In the former case the group is called a finite
group and the number of elements it contains is called the order of the group,
which we will denote by g; an alternative notation is $|\mathcal{G}|$ but has obvious dangers
if matrices are involved. In the notation in which $\mathcal{G} = \{G_1, G_2, \ldots, G_n\}$ the order
of the group is clearly n.

As we have noted, for the integers under addition zero is the identity. For 
the group of rotations and reflections, the operation of doing nothing, i.e. the 
null operation, plays this role. This latter identification may seem artificial, but 
it is an operation, albeit trivial, which does leave the system in a physically 
indistinguishable state, and needs to be included. One might add that without it 
the set of operations would not form a group and none of the powerful results 
we will derive later in this and the next chapter could be justifiably applied to 
give deductions of physical significance. 

In the examples of rotations and reflections mentioned earlier, • has been taken 
to mean that the left-hand operation is carried out on the system that results 
from application of the right-hand operation. Thus 

Z = X • Y (24.4) 

means that the effect on the system of carrying out Z is the same as would 
be obtained by first carrying out Y and then carrying out X. The order of the 
operations should be noted; it is arbitrary in the first instance but, once chosen, 
must be adhered to. The choice we have made is dictated by the fact that most 
of our applications involve the effect of rotations and reflections on functions of 
space coordinates, and it is usual, and our practice in the rest of this book, to 
write operators acting on functions to the left of the functions. 

It will be apparent that for the above-mentioned group, integers under ordinary
addition, it is true that

$$Y \bullet X = X \bullet Y \tag{24.5}$$

for all pairs of integers X, Y . If any two particular elements of a group satisfy 
(24.5), they are said to commute under the operation •; if all pairs of elements in 
a group satisfy (24.5), then the group is said to be Abelian. The set of all integers 
forms an infinite Abelian group under (ordinary) addition. 

As we show below, requirements (iii) and (iv) of the definition of a group 
are over-demanding (but self-consistent), since in each of equations (24.2) and 
(24.3) the second equality can be deduced from the first by using the associativity 
required by (24.1). The mathematical steps in the following arguments are all 
very simple, but care has to be taken to make sure that nothing that has not 
yet been proved is used to justify a step. For this reason, and to act as a model 
in logical deduction, a reference in Roman numerals to the previous result, 
or to the group definition used, is given over each equality sign. Such explicit 
detailed referencing soon becomes tiresome, but it should always be available if 
needed. 
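► Using only the first equalities in (24.2) and (24.3), deduce the second ones.

Consider the expression $X^{-1} \bullet (X \bullet X^{-1})$;

$$X^{-1} \bullet (X \bullet X^{-1}) \stackrel{\rm (ii)}{=} (X^{-1} \bullet X) \bullet X^{-1} \stackrel{\rm (iv)}{=} I \bullet X^{-1} \stackrel{\rm (iii)}{=} X^{-1}. \tag{24.6}$$

But $X^{-1}$ belongs to $\mathcal{G}$, and so from (iv) there is an element U in $\mathcal{G}$ such that

$$U \bullet X^{-1} = I. \qquad \text{(v)}$$

Form the product of U with the first and last expressions in (24.6) to give

$$U \bullet (X^{-1} \bullet (X \bullet X^{-1})) = U \bullet X^{-1} \stackrel{\rm (v)}{=} I. \tag{24.7}$$

Transforming the left-hand side of this equation gives

$$U \bullet (X^{-1} \bullet (X \bullet X^{-1})) \stackrel{\rm (ii)}{=} (U \bullet X^{-1}) \bullet (X \bullet X^{-1}) \stackrel{\rm (v)}{=} I \bullet (X \bullet X^{-1}) \stackrel{\rm (iii)}{=} X \bullet X^{-1}. \tag{24.8}$$

Comparing (24.7), (24.8) shows that

$$X \bullet X^{-1} = I, \qquad \text{(iv)}'$$

i.e. the second equality in group definition (iv). Similarly

$$X \bullet I \stackrel{\rm (iv)}{=} X \bullet (X^{-1} \bullet X) \stackrel{\rm (ii)}{=} (X \bullet X^{-1}) \bullet X \stackrel{\rm (iv)'}{=} I \bullet X \stackrel{\rm (iii)}{=} X, \qquad \text{(iii}'\text{)}$$

i.e. the second equality in group definition (iii). ◄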


The uniqueness of the identity element I can also be demonstrated rather than
assumed. Suppose that I′, belonging to $\mathcal{G}$, also has the property

$$I' \bullet X = X = X \bullet I' \quad\text{for all } X \text{ belonging to } \mathcal{G}.$$

Take X as I, then

$$I' \bullet I = I. \tag{24.9}$$

Further, from (iii′),

$$X = X \bullet I \quad\text{for all } X \text{ belonging to } \mathcal{G},$$

and setting X = I′ gives

$$I' = I' \bullet I. \tag{24.10}$$

It then follows from (24.9), (24.10) that I = I′, showing that in any particular
group the identity element is unique.

In a similar way it can be shown that the inverse of any particular element
is unique. If U and V are two postulated inverses of an element X of $\mathcal{G}$, by
considering the product

U • (X • V) = (U • X) • V, 

it can be shown that U = V. The proof is left to the reader. 

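Given the uniqueness of the inverse of any particular group element, it follows
that

$$(U \bullet V \bullet \cdots \bullet Y \bullet Z) \bullet (Z^{-1} \bullet Y^{-1} \bullet \cdots \bullet V^{-1} \bullet U^{-1})$$
$$= (U \bullet V \bullet \cdots \bullet Y) \bullet (Z \bullet Z^{-1}) \bullet (Y^{-1} \bullet \cdots \bullet V^{-1} \bullet U^{-1})$$
$$= (U \bullet V \bullet \cdots \bullet Y) \bullet (Y^{-1} \bullet \cdots \bullet V^{-1} \bullet U^{-1})$$
$$\vdots$$
$$= I,$$

where use has been made of the associativity and of the two equations $Z \bullet Z^{-1} = I$
and $I \bullet X = X$. Thus the inverse of a product is the product of the inverses in
reverse order, i.e.

$$(U \bullet V \bullet \cdots \bullet Y \bullet Z)^{-1} = (Z^{-1} \bullet Y^{-1} \bullet \cdots \bullet V^{-1} \bullet U^{-1}). \tag{24.11}$$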
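The reverse-order rule (24.11) is easy to test on a concrete non-Abelian example. The sketch below uses permutations of four objects, stored as tuples, as the group elements; this choice, the helper names `compose` and `inverse`, and the particular permutations U, V, Z are all illustrative assumptions.

```python
def compose(p, q):
    """Product p . q of two permutations: q is applied first, then p."""
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    """Inverse permutation: if p sends i to p[i], the inverse sends p[i] to i."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

# Three arbitrary permutations of (0, 1, 2, 3); p[i] is the image of i.
U, V, Z = (1, 2, 0, 3), (0, 2, 1, 3), (3, 1, 2, 0)

lhs = inverse(compose(U, compose(V, Z)))                       # (U.V.Z)^-1
rhs = compose(inverse(Z), compose(inverse(V), inverse(U)))     # Z^-1.V^-1.U^-1
wrong = compose(inverse(U), compose(inverse(V), inverse(Z)))   # same order: fails
print(lhs == rhs, lhs == wrong)  # True False
```

Because these permutations do not commute, taking the inverses in the original order gives a different element, which is why the reversal in (24.11) matters.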

Further elementary results that can be obtained by arguments similar to those
above are as follows.

(i) Given any pair of elements X, Y belonging to $\mathcal{G}$, there exist unique
elements U, V, also belonging to $\mathcal{G}$, such that

$$X \bullet U = Y \quad\text{and}\quad V \bullet X = Y.$$

Clearly $U = X^{-1} \bullet Y$, and $V = Y \bullet X^{-1}$, and they can be shown to be
unique. This result is sometimes called the division axiom.

(ii) The cancellation law can be stated as follows. If

$$X \bullet Y = X \bullet Z$$

for some X belonging to $\mathcal{G}$, then Y = Z. Similarly,

$$Y \bullet X = Z \bullet X$$

implies the same conclusion.
(iii) Forming the product of each element of $\mathcal{G}$ with a fixed element X of $\mathcal{G}$
simply permutes the elements of $\mathcal{G}$; this is often written symbolically as
$\mathcal{G} \bullet X = \mathcal{G}$. If this were not so, and X • Y and X • Z were not different
even though Y and Z were, application of the cancellation law would lead
to a contradiction. This result is called the permutation law.

Figure 24.2 Reflections in the three perpendicular bisectors of the sides of
an equilateral triangle take the triangle into itself.

In any finite group of order g, any element X when combined with itself to
form successively $X^2 = X \bullet X$, $X^3 = X \bullet X^2$, ... will, after at most g − 1 such
combinations, produce the group identity I. Of course $X^2$, $X^3$, ... are some of
the original elements of the group, and not new ones. If the actual number of
combinations needed is m − 1, i.e. $X^m = I$, then m is called the order of the element
X in $\mathcal{G}$. The order of the identity of a group is always 1, and that of any other
element of a group that is its own inverse is always 2.


► Determine the order of the group of (two-dimensional) rotations and reflections that take
a plane equilateral triangle into itself and the order of each of the elements. The group is
usually known as 3m (to physicists and crystallographers) or $C_{3v}$ (to chemists).

There are two (clockwise) rotations, by $2\pi/3$ and $4\pi/3$, about an axis perpendicular to
the plane of the triangle. In addition, reflections in the perpendicular bisectors of the three
sides (see figure 24.2) have the defining property. To these must be added the identity
operation. Thus in total there are six distinct operations and so g = 6 for this group.
To reproduce the identity operation either of the rotations has to be applied three times,
whilst any of the reflections has to be applied just twice in order to recover the original
situation. Thus each rotation element of the group has order 3, and each reflection element
has order 2. ◄
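The counting in this example can be reproduced by representing each symmetry operation as a permutation of the three vertex labels 0, 1, 2 (an assumed representation, chosen here for illustration): the six permutations of three labels are exactly the identity, the two rotations and the three reflections. The helper names `compose` and `order` are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """Product p . q of two permutations: q is applied first, then p."""
    return tuple(p[q[i]] for i in range(len(p)))

def order(p):
    """Smallest m >= 1 such that p combined with itself m times is the identity."""
    identity = tuple(range(len(p)))
    power, m = p, 1
    while power != identity:
        power = compose(p, power)
        m += 1
    return m

ops = list(permutations(range(3)))       # all relabellings of the three vertices
orders = sorted(order(p) for p in ops)
print(len(ops), orders)                  # 6 [1, 2, 2, 2, 3, 3]
```

The output shows one element of order 1 (the identity), three of order 2 (the reflections, each exchanging two vertices) and two of order 3 (the rotations), in agreement with the result above.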

A so-called cyclic group is one for which all members of the group can be
generated from just one element X (say). Thus a cyclic group of order g can be
written as

$$\mathcal{G} = \{I, X, X^2, X^3, \ldots, X^{g-1}\}.$$

It is clear that cyclic groups are always Abelian and that the generating element
X has order g, the order of the group itself; if g is prime, the same is true of
every element apart from the identity.


24.1.2 Further examples of groups 

In this section we consider some sets of objects, each set together with a law of 
combination, and investigate whether they qualify as groups and, if not, why not. 

We have already seen that the integers form a group under ordinary addition, 
but it is immediately apparent that (even if zero is excluded) they do not do 
so under ordinary multiplication. Unity must be the identity of the set, but the 
requisite inverse of any integer n, namely 1 /n, does not belong to the set of 
integers for any n other than unity. 

Other infinite sets of quantities that do form groups are the sets of all real 
numbers, or of all complex numbers, under addition, and of the same two sets 
excluding 0 under multiplication. All these groups are Abelian. 

Although subtraction and division are normally considered the obvious coun- 
terparts of the operations of (ordinary) addition and multiplication, they are not 
acceptable operations for use within groups since the associative law, (24.1), does 
not hold. Explicitly, 


$$X - (Y - Z) \neq (X - Y) - Z,$$
$$X \div (Y \div Z) \neq (X \div Y) \div Z.$$


From within the field of all non-zero complex numbers we can select just those
that have unit modulus, i.e. are of the form $e^{i\theta}$ where $0 \le \theta < 2\pi$, to form a
group under multiplication, as can easily be verified:

$$e^{i\theta_1} \times e^{i\theta_2} = e^{i(\theta_1 + \theta_2)} \quad\text{(closure)},$$
$$e^{i0} = 1 \quad\text{(identity)},$$
$$e^{i(2\pi - \theta)} \times e^{i\theta} = e^{i2\pi} \equiv e^{i0} = 1 \quad\text{(inverse)}.$$


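Closely related to the above group is the set of 2 × 2 rotation matrices that take
the form

$$\mathsf{M}(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},$$

where, as before, $0 \le \theta < 2\pi$. These form a group when the law of combination
is that of matrix multiplication. The reader can easily verify that

$$\mathsf{M}(\theta)\, \mathsf{M}(\phi) = \mathsf{M}(\theta + \phi) \quad\text{(closure)},$$
$$\mathsf{M}(0) = \mathsf{I}_2 \quad\text{(identity)},$$
$$\mathsf{M}(2\pi - \theta) = \mathsf{M}^{-1}(\theta) \quad\text{(inverse)}.$$

Here $\mathsf{I}_2$ is the unit 2 × 2 matrix.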
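The three matrix relations can be confirmed numerically for sample angles. The following minimal sketch uses plain nested tuples for the 2 × 2 matrices; the helper names `M`, `matmul` and `close`, the tolerance, and the sample angles are assumptions made for illustration.

```python
import math

def M(theta):
    """The 2 x 2 rotation matrix M(theta), stored as nested tuples."""
    return ((math.cos(theta), -math.sin(theta)),
            (math.sin(theta),  math.cos(theta)))

def matmul(A, B):
    """Ordinary matrix product of two 2 x 2 matrices."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def close(A, B, tol=1e-12):
    """Element-wise comparison allowing for floating-point rounding."""
    return all(abs(A[i][j] - B[i][j]) < tol
               for i in range(2) for j in range(2))

theta, phi = 0.9, 2.3                      # arbitrary sample angles
identity = ((1.0, 0.0), (0.0, 1.0))

print(close(matmul(M(theta), M(phi)), M(theta + phi)))            # closure
print(close(M(0.0), identity))                                    # identity
print(close(matmul(M(2 * math.pi - theta), M(theta)), identity))  # inverse
```

Each line should print True. The sum θ + φ may exceed 2π, but the periodicity of sine and cosine means that M(θ + φ) is the same matrix as that for the angle reduced to the range 0 ≤ θ < 2π.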
24.2 Finite groups

Whilst many properties of physical systems (e.g. angular momentum) are related 
to the properties of infinite, and, in particular, continuous groups, the symmetry 
properties of crystals and molecules are more intimately connected with those of 
finite groups. We therefore concentrate in this section on finite sets of objects that 
can be combined in a way satisfying the group postulates. 

Although it is clear that the set of all integers does not form a group under 
ordinary multiplication, restricted sets can do so if the operation involved is 
multiplication (mod N) for suitable values of N; this operation will be explained 
below. 

As a simple example of a group with only four members, consider the set S 
defined as follows: 

S = {1,3, 5, 7} under multiplication (mod 8). 

To find the product (mod 8) of any two elements, we multiply them together in
the ordinary way, and then divide the answer by 8, treating the remainder after
doing so as the product of the two elements. For example, 5 × 7 = 35, which on
dividing by 8 gives a remainder of 3. Clearly, since Y × Z = Z × Y, the full set
of different products is

1 × 1 = 1, 1 × 3 = 3, 1 × 5 = 5, 1 × 7 = 7,
3 × 3 = 1, 3 × 5 = 7, 3 × 7 = 5,
5 × 5 = 1, 5 × 7 = 3,
7 × 7 = 1.

The first thing to notice is that each multiplication produces a member of the
original set, i.e. the set is closed. Obviously the element 1 takes the role of the
identity, i.e. 1 × Y = Y for all members Y of the set. Further, for each element Y
of the set there is an element Z (equal to Y, as it happens, in this case) such that
Y × Z = 1, i.e. each element has an inverse. These observations, together with the
associativity of multiplication (mod 8), show that the set S is an Abelian group
of order 4.

It is convenient to present the results of combining any two elements of a 
group in the form of multiplication tables - akin to those which used to appear in 
elementary arithmetic books before electronic calculators were invented ! Written 
in this much more compact form the above example is expressed by table 24.1. 
Although the order of the two elements being combined does not matter here 
because the group is Abelian, we adopt the convention that if the product in a 
general multiplication table is written X • Y then X is taken from the left-hand 
column and Y is taken from the top row. Thus the bold ‘7’ in the table is the 
result of 3 x 5, rather than of 5 x 3. 

Whilst it would make no difference to the basic information content in a table
to present the rows and columns with their headings in random orders, it is
usual to list the elements in the same order in both the vertical and horizontal
headings in any one table. The actual order of the elements in the common list,
whilst arbitrary, is normally chosen to make the table have as much symmetry as
possible. This is initially a matter of convenience, but, as we shall see later, some
of the more subtle properties of groups are revealed by putting next to each other
elements of the group that are alike in certain ways.

        1    3    5    7

   1    1    3    5    7
   3    3    1  **7**  5
   5    5    7    1    3
   7    7    5    3    1

Table 24.1 The table of products for the elements of the group S = {1, 3, 5, 7}
under multiplication (mod 8).

Some simple general properties of group multiplication tables can be deduced 
immediately from the fact that each row or column constitutes the elements of 
the group. 

(i) Each element appears once and only once in each row or column of the
table; this must be so since $\mathcal{G} \bullet X = \mathcal{G}$ (the permutation law) holds.

(ii) The inverse of any element Y can be found by looking along the row 
in which Y appears in the left-hand column (the Y th row), and noting 
the element Z at the head of the column (the Zth column) in which 
the identity appears as the table entry. An immediate corollary is that 
whenever the identity appears on the leading diagonal, it indicates that 
the corresponding header element is of order 2 (unless it happens to be 
the identity itself). 

(iii) For any Abelian group the multiplication table is symmetric about the 
leading diagonal. 
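The three properties above can be checked mechanically for the group S = {1, 3, 5, 7} under multiplication (mod 8). The following Python sketch is an illustration only; the helper names `mod_table`, `rows_and_columns_are_permutations`, `inverses` and `is_symmetric` are our own and do not come from the text.

```python
def mod_table(elements, n):
    """Return table[x][y] = (x * y) mod n, with x taken from the
    left-hand column and y from the top row."""
    return {x: {y: (x * y) % n for y in elements} for x in elements}


def rows_and_columns_are_permutations(table, elements):
    """Property (i): each element appears once and only once in every
    row and in every column."""
    target = sorted(elements)
    rows_ok = all(sorted(table[x].values()) == target for x in elements)
    cols_ok = all(sorted(table[x][y] for x in elements) == target
                  for y in elements)
    return rows_ok and cols_ok


def inverses(table, elements, identity):
    """Property (ii): for each Y, look along the Y-th row for the column
    header Z at which the identity appears."""
    return {y: next(z for z in elements if table[y][z] == identity)
            for y in elements}


def is_symmetric(table, elements):
    """Property (iii): the table of an Abelian group is symmetric about
    the leading diagonal."""
    return all(table[x][y] == table[y][x]
               for x in elements for y in elements)


S = [1, 3, 5, 7]
table = mod_table(S, 8)
print(table[3][5])                                  # row 3, column 5
print(rows_and_columns_are_permutations(table, S))
print(inverses(table, S, 1))
print(is_symmetric(table, S))
```

Every element of S turns out to be its own inverse, which is the same statement as the identity filling the leading diagonal of table 24.1.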

To get used to the ideas involved in using group multiplication tables, we now
consider two more sets of integers under multiplication (mod N):

    S' = {1, 5, 7, 11} under multiplication (mod 24), and
    S" = {1, 2, 3, 4} under multiplication (mod 5).

These have group multiplication tables 24.2(a) and (b) respectively, as the reader
should verify.

If tables 24.1 and 24.2(a) for the groups S and S' are compared, it will be seen
that they have essentially the same structure, i.e. if the elements are written as
{I, A, B, C} in both cases, then the two tables are each equivalent to table 24.3.

    (a)    |  1   5   7  11          (b)    |  1   2   3   4
        ---+----------------             ---+----------------
         1 |  1   5   7  11               1 |  1   2   3   4
         5 |  5   1  11   7               2 |  2   4   1   3
         7 |  7  11   1   5               3 |  3   1   4   2
        11 | 11   7   5   1               4 |  4   3   2   1

Table 24.2 (a) The multiplication table for the group S' = {1, 5, 7, 11} under
multiplication (mod 24). (b) The multiplication table for the group S" =
{1, 2, 3, 4} under multiplication (mod 5).

           |  I   A   B   C
        ---+----------------
         I |  I   A   B   C
         A |  A   I   C   B
         B |  B   C   I   A
         C |  C   B   A   I

Table 24.3 The common structure exemplified by tables 24.1 and 24.2(a).

For S, I = 1, A = 3, B = 5, C = 7 and the law of combination is multiplication
(mod 8), whilst for S', I = 1, A = 5, B = 7, C = 11 and the law of combination
is multiplication (mod 24). However, the really important point is that the two
groups S and S' have equivalent group multiplication tables - they are said to
be isomorphic, a matter to which we will return more formally in section 24.5.


► Determine the behaviour of the set of four elements

    {1, i, −1, −i}

under the ordinary multiplication of complex numbers. Show that they form a group and
determine whether the group is isomorphic to either of the groups S (itself isomorphic to
S') and S" defined above.


That the elements form a group under the associative operation of complex multiplication
is immediate; there is an identity (1), each possible product generates a member of the set
and each element has an inverse (1, −i, −1, i, respectively). The group table has the form
shown in table 24.4.

           |   1    i   −1   −i
        ---+--------------------
         1 |   1    i   −1   −i
         i |   i   −1   −i    1
        −1 |  −1   −i    1    i
        −i |  −i    1    i   −1

Table 24.4 The group table for the set {1, i, −1, −i} under ordinary
multiplication of complex numbers.

We now ask whether this table can be made to look like table 24.3, which is the
standardised form of the tables for S and S'. Since the identity element of the group (1)
will have to be represented by I, and '1' only appears on the leading diagonal twice whereas
I appears on the leading diagonal four times in table 24.3, it is clear that no amount of
relabelling (or, equivalently, no allocation of the symbols A, B, C, amongst i, −1, −i) can
bring table 24.4 into the form of table 24.3. We conclude that the group {1, i, −1, −i} is
not isomorphic to S or S'. An alternative way of stating the observation is to say that
the group contains only one element of order 2 whilst a group corresponding to table 24.3
contains three such elements. However, if the rows and columns of table 24.2(b) - in which
the identity does appear twice on the diagonal and which therefore has the potential to be
equivalent to table 24.4 - are rearranged by making the heading order 1, 2, 4, 3 then the
two tables can be compared in the forms shown in table 24.5. They can thus be seen to
have the same structure, namely that shown in table 24.6.

           |   1    i   −1   −i               |  1   2   4   3
        ---+--------------------           ---+----------------
         1 |   1    i   −1   −i             1 |  1   2   4   3
         i |   i   −1   −i    1             2 |  2   4   3   1
        −1 |  −1   −i    1    i             4 |  4   3   1   2
        −i |  −i    1    i   −1             3 |  3   1   2   4

Table 24.5 A comparison between tables 24.4 and 24.2(b), the latter with its
columns reordered.

           |  I   A   B   C
        ---+----------------
         I |  I   A   B   C
         A |  A   B   C   I
         B |  B   C   I   A
         C |  C   I   A   B

Table 24.6 The common structure exemplified by tables 24.4 and 24.2(b), the
latter with its columns reordered.

We therefore conclude that the group of four elements {1, i, −1, −i} under ordinary
multiplication of complex numbers is isomorphic to the group {1, 2, 3, 4} under multiplication
(mod 5). ◄

What we have done does not prove it, but the two tables 24.3 and 24.6 are in 
fact the only possible tables for a group of order 4, i.e. a group containing exactly 
four elements. 
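The comparisons made in the worked example can also be carried out by brute force: try every one-to-one relabelling of one group by the other and test whether products are preserved. The sketch below is illustrative only; the function names `is_isomorphic` and `count_order_two`, and the names given to the four multiplication rules, are our own.

```python
from itertools import permutations


def is_isomorphic(elems1, op1, elems2, op2):
    """Search for a one-to-one relabelling f of group 1 by group 2 with
    f(x op1 y) = f(x) op2 f(y) for every pair x, y."""
    for image in permutations(elems2):
        f = dict(zip(elems1, image))
        if all(f[op1(x, y)] == op2(f[x], f[y])
               for x in elems1 for y in elems1):
            return True
    return False


def count_order_two(elems, op, identity):
    """Number of elements, other than the identity, whose square is the
    identity (i.e. the elements of order 2)."""
    return sum(1 for x in elems if x != identity and op(x, x) == identity)


def mul8(x, y):
    return (x * y) % 8


def mul24(x, y):
    return (x * y) % 24


def mul5(x, y):
    return (x * y) % 5


def cmul(x, y):
    return x * y


S = [1, 3, 5, 7]          # multiplication (mod 8)
S1 = [1, 5, 7, 11]        # S'  : multiplication (mod 24)
S2 = [1, 2, 3, 4]         # S'' : multiplication (mod 5)
C4 = [1, 1j, -1, -1j]     # ordinary multiplication of complex numbers

print(is_isomorphic(S, mul8, S1, mul24))   # S and S'
print(is_isomorphic(C4, cmul, S2, mul5))   # {1, i, -1, -i} and S''
print(is_isomorphic(C4, cmul, S, mul8))    # {1, i, -1, -i} and S
print(count_order_two(S, mul8, 1), count_order_two(C4, cmul, 1))
```

Trying all relabellings costs n! tests for a group of order n, so this is practical only for very small groups; counting elements of each order, as in the text, is the quicker way to rule an isomorphism out.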


24.3 Non-Abelian groups 

So far, all the groups for which we have constructed multiplication tables have 
been based on some form of arithmetic multiplication, a commutative operation, 
with the result that the groups have been Abelian and the tables symmetric 
about the leading diagonal. We now turn to examples of groups in which some 
non-commutation occurs. It should be noted, in passing, that non-commutation 
cannot occur throughout a group, as the identity always commutes with any 
element in its group. 
As the first example we consider again as elements of a group the
two-dimensional operations which transform an equilateral triangle into itself (see
the end of subsection 24.1.1). It has already been shown that there are six
such operations: the null operation, two rotations (by 2π/3 and 4π/3 about
an axis perpendicular to the plane of the triangle) and three reflections in the
perpendicular bisectors of the three sides. To abbreviate we will denote these
operations by symbols as follows.

(i) I is the null operation.

(ii) R and R' are (clockwise) rotations by 2π/3 and 4π/3 respectively.

(iii) K, L, M are reflections in the three lines indicated in figure 24.2.


Some products of the operations of the form X • Y (where it will be recalled
that the symbol • means that the second operation X is carried out on the system
resulting from the application of the first operation Y) are easily calculated:

    R • R = R',    R' • R' = R,    R • R' = I = R' • R,
    K • K = L • L = M • M = I.                                    (24.12)


Others, such as K • M, are more difficult, but can be found by a little thought,
or by making a model triangle or drawing a sequence of diagrams such as those
following.

[Diagrams: three sequences of labelled triangles, each showing the triangle
after the first operation, after the second operation, and the single
operation that produces the same result.]

The first sequence shows that K • M = R'. In the same way, the second shows
that M • K = R, and the third shows that R • L = K.

Proceeding in this way we can build up the complete multiplication table
(table 24.7). In fact, it is not necessary to draw any more diagrams, as all
remaining products can be deduced algebraically from the three found above and
the more self-evident results given in (24.12).

           |  I   R   R'  K   L   M
        ---+------------------------
         I |  I   R   R'  K   L   M
         R |  R   R'  I   M   K   L
         R'|  R'  I   R   L   M   K
         K |  K   L   M   I   R   R'
         L |  L   M   K   R'  I   R
         M |  M   K   L   R   R'  I

Table 24.7 The group table for the two-dimensional symmetry operations on
an equilateral triangle.

A number of things may be noticed about this table.

(i) It is not symmetric about the leading diagonal, indicating that some pairs 
of elements in the group do not commute. 

(ii) There is some symmetry within the 3 × 3 blocks that form the four quarters
of the table. This occurs because we have elected to put similar operations
close to each other when choosing the order of table headings - the two
rotations (or three if I is viewed as a rotation by 0π/3) are next to each
other, and the three reflections also occupy adjacent columns and rows.
We will return to this later.

That two groups of the same order may be isomorphic carries over to non- 
Abelian groups. The next two examples are each concerned with sets of six 
objects ; they will be shown to form groups that, although very different in nature 
from the rotation-reflection group just considered, are isomorphic to it. 

We consider first the set M of six orthogonal 2 × 2 matrices given by



the combination law being that of ordinary matrix multiplication. Here we use 
italic, rather than the sans serif used for matrices elsewhere, to emphasise that 
the matrices are group elements. 

Although it is tedious to do so, it can be checked that the product of any 
two of these matrices, in either order, is also in the set. However, the result is 
generally different in the two cases, as matrix multiplication is non-commutative. 
The matrix I clearly acts as the identity element of the set, and during the checking
for closure it is found that the inverse of each matrix is contained in the set, I,
C, D and E being their own inverses. The group table is shown in table 24.8.
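The tedious closure check can be delegated to a few lines of Python. The six matrices used below are an assumption made for this sketch, not a quotation of the matrices of (24.13): they are the rotations through multiples of 2π/3 together with three reflections, labelled so that their products agree with table 24.8. The names `MATRICES`, `matmul`, `identify` and `group_table` are likewise our own.

```python
import math

S3 = math.sqrt(3) / 2   # sin(2*pi/3)

# Assumed set of six orthogonal 2 x 2 matrices (an illustrative choice).
MATRICES = {
    "I": ((1.0, 0.0), (0.0, 1.0)),
    "A": ((-0.5, S3), (-S3, -0.5)),
    "B": ((-0.5, -S3), (S3, -0.5)),
    "C": ((-1.0, 0.0), (0.0, 1.0)),
    "D": ((0.5, -S3), (-S3, -0.5)),
    "E": ((0.5, S3), (S3, -0.5)),
}
HEAD = "IABCDE"

# Rows of table 24.8, written in the heading order I, A, B, C, D, E.
TABLE_24_8 = {"I": "IABCDE", "A": "ABIECD", "B": "BIADEC",
              "C": "CDEIAB", "D": "DECBIA", "E": "ECDABI"}


def matmul(p, q):
    """Ordinary product of two 2 x 2 matrices held as nested tuples."""
    return tuple(tuple(sum(p[i][k] * q[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))


def identify(m, tol=1e-12):
    """Return the label of the matrix in the set equal to m (to within
    rounding error), or None if m is not in the set."""
    for name, ref in MATRICES.items():
        if all(abs(m[i][j] - ref[i][j]) < tol
               for i in range(2) for j in range(2)):
            return name
    return None


def is_orthogonal(m, tol=1e-12):
    """Check that m times its transpose is the unit matrix."""
    mt = tuple(zip(*m))
    return identify(matmul(m, mt), tol) == "I"


def group_table():
    """Row X, column Y holds the label of the matrix product X Y."""
    return {x: "".join(identify(matmul(MATRICES[x], MATRICES[y]))
                       for y in HEAD) for x in HEAD}


print(all(is_orthogonal(m) for m in MATRICES.values()))
print(group_table() == TABLE_24_8)
```

Because `identify` never returns None for any of the 36 products, the set is closed; the comparison with `TABLE_24_8` then confirms that, for this choice of matrices, the products follow table 24.8 entry by entry.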
           |  I   A   B   C   D   E
        ---+------------------------
         I |  I   A   B   C   D   E
         A |  A   B   I   E   C   D
         B |  B   I   A   D   E   C
         C |  C   D   E   I   A   B
         D |  D   E   C   B   I   A
         E |  E   C   D   A   B   I

Table 24.8 The group table, under matrix multiplication, for the set M of
six orthogonal 2 × 2 matrices given by (24.13).


The similarity to table 24.7 is striking. If {R,R',K,L,M} of that table are 
replaced by {A, B,C,D,E} respectively, the two tables are identical, without even 
the need to reshuffle the rows and columns. The two groups, one of reflections 
and rotations of an equilateral triangle, the other of matrices, are isomorphic. 

Our second example of a group isomorphic to the same rotation-reflection 
group is provided by a set of functions of an undetermined variable x. The 
functions are as follows: 

    $f_1(x) = x$,      $f_2(x) = 1/(1 - x)$,    $f_3(x) = (x - 1)/x$,

    $f_4(x) = 1/x$,    $f_5(x) = 1 - x$,        $f_6(x) = x/(x - 1)$,

and the law of combination is

    $f_i(x) \bullet f_j(x) = f_i(f_j(x))$,

i.e. the function on the right acts as the argument of the function on the left to
produce a new function of x. It should be emphasised that it is the functions
that are the elements of the group. The variable x is the ‘system’ on which they
act, and plays much the same role as the triangle does in our first example of a
non-Abelian group.

To show an explicit example, we calculate the product $f_6 \bullet f_3$. The product
will be the function of x obtained by evaluating y/(y − 1), when y is set equal to
(x − 1)/x. Explicitly

    $f_6(f_3) = \dfrac{(x - 1)/x}{(x - 1)/x - 1} = 1 - x = f_5(x)$.

Thus $f_6 \bullet f_3 = f_5$. Further examples are

    $f_2 \bullet f_2 = \dfrac{1}{1 - 1/(1 - x)} = \dfrac{x - 1}{x} = f_3$,

and

    $f_6 \bullet f_6 = \dfrac{x/(x - 1)}{x/(x - 1) - 1} = x = f_1$.        (24.14)
The multiplication table for this set of six functions has all the necessary
properties to show that they form a group. Further, if the symbols
$f_1, f_2, f_3, f_4, f_5, f_6$ are replaced by I, A, B, C, D, E respectively the
table becomes identical to table 24.8. This justifies our earlier claim that this
group of functions, with argument substitution as the law of combination, is
isomorphic to the group of reflections and rotations of an equilateral triangle.
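The full table for the six functions can be generated without any algebra. Each composite $f_i(f_j(x))$ is a ratio of two linear functions of x, and such a ratio is fixed by its values at three points, so a composite can be identified by evaluating it exactly at three sample values. The sketch below does this with rational arithmetic; the sample points 2, 3 and 5 and the names `FUNCS`, `signature` and `product` are our own choices for the illustration.

```python
from fractions import Fraction

FUNCS = {
    1: lambda x: x,
    2: lambda x: 1 / (1 - x),
    3: lambda x: (x - 1) / x,
    4: lambda x: 1 / x,
    5: lambda x: 1 - x,
    6: lambda x: x / (x - 1),
}

# Sample points chosen so that no function or composite hits a pole.
SAMPLES = [Fraction(2), Fraction(3), Fraction(5)]


def signature(func):
    """Exact values of func at the three sample points."""
    return tuple(func(x) for x in SAMPLES)


SIGNATURES = {signature(f): i for i, f in FUNCS.items()}


def product(i, j):
    """Return k such that f_i(f_j(x)) = f_k(x)."""
    return SIGNATURES[signature(lambda x: FUNCS[i](FUNCS[j](x)))]


def function_table():
    """Rows of the multiplication table, with f_1..f_6 written as
    I, A, B, C, D, E."""
    names = "IABCDE"
    return {names[i - 1]: "".join(names[product(i, j) - 1]
                                  for j in range(1, 7))
            for i in range(1, 7)}


print(product(6, 3), product(2, 2), product(6, 6))
for name, row in function_table().items():
    print(name, row)
```

The three printed products reproduce the worked results $f_6 \bullet f_3 = f_5$, $f_2 \bullet f_2 = f_3$ and $f_6 \bullet f_6 = f_1$, and the rows printed afterwards can be compared directly with table 24.8.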

24.4 Permutation groups 

The operation of rearranging n distinct objects amongst themselves is called a 
permutation of degree n, and since many symmetry operations on physical systems 
can be viewed in that light, the properties of permutations are of interest. For 
example, the symmetry operations on an equilateral triangle, to which we have 
already given much attention, can be considered as the six possible rearrangements 
of the marked corners of the triangle amongst three fixed points in space, much 
as in the diagrams used to compute table 24.7. In the same way, the symmetry 
operations on a cube can be viewed as a rearrangement of its corners amongst 
eight points in space, albeit with many constraints, or, with fewer complications, 
as a rearrangement of its body diagonals in space. The details will be left until 
we review the possible finite groups more systematically. 

The notations and conventions used in the literature to describe permutations 
are very varied and can easily lead to confusion. We will try to avoid this by using 
letters a, b, c,... (rather than numbers) for the objects that are rearranged by a 
permutation and by adopting, before long, a ‘cycle notation’ for the permutations 
themselves. It is worth emphasising that it is the permutations, i.e. the acts of 
rearranging, and not the objects themselves (represented by letters) that form 
the elements of permutation groups. The complete group of all permutations of
degree n is usually denoted by $S_n$ or $\Sigma_n$. The number of possible permutations of
degree n is n!, and so this is the order of $S_n$.

Suppose the ordered set of six distinct objects {a b c d e f} is rearranged by
some process into {b e f a d c}; then we can represent this mathematically as

    θ{a b c d e f} = {b e f a d c},

where θ is a permutation of degree 6. The permutation θ can be denoted by
[2 5 6 1 4 3], since the first object, a, is replaced by the second, b, the second
object, b, is replaced by the fifth, e, the third by the sixth, f, etc. The equation
can then be written more explicitly as

    θ{a b c d e f} = [2 5 6 1 4 3]{a b c d e f} = {b e f a d c}.

If φ is a second permutation, also of degree 6, then the obvious interpretation of
the product φ • θ of the two permutations is

    φ • θ{a b c d e f} = φ(θ{a b c d e f}).

Suppose that φ is the permutation [4 5 3 6 2 1]; then

    φ • θ{a b c d e f} = [4 5 3 6 2 1][2 5 6 1 4 3]{a b c d e f}
                       = [4 5 3 6 2 1]{b e f a d c}
                       = {a d f c e b}
                       = [1 4 6 3 5 2]{a b c d e f}.

Written in terms of the permutation notation this result is

    [4 5 3 6 2 1][2 5 6 1 4 3] = [1 4 6 3 5 2].
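The bracket notation translates directly into code: the k-th entry of the bracket says which position supplies the object that ends up in position k. The following sketch reproduces the product just calculated; the function names `apply` and `compose` are ours, and positions are counted from 1 as in the text.

```python
def apply(perm, objects):
    """Rearrange objects so that position k receives the object that
    was previously in position perm[k] (positions counted from 1)."""
    return [objects[p - 1] for p in perm]


def compose(phi, theta):
    """Bracket form of the product phi . theta, in which theta is
    carried out first and phi second."""
    return [theta[p - 1] for p in phi]


theta = [2, 5, 6, 1, 4, 3]
phi = [4, 5, 3, 6, 2, 1]
objects = list("abcdef")

print("".join(apply(theta, objects)))              # theta alone
print("".join(apply(phi, apply(theta, objects))))  # theta, then phi
print(compose(phi, theta))                         # single bracket
```

Applying `compose(phi, theta)` to the original ordering gives the same arrangement as applying `theta` and then `phi`, which is exactly the content of the final displayed equation.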


A concept that is very useful for working with permutations is that of
decomposition into cycles. The cycle notation is most easily explained by example. For
the permutation θ given above:

    the 1st object, a, has been replaced by the 2nd, b;
    the 2nd object, b, has been replaced by the 5th, e;
    the 5th object, e, has been replaced by the 4th, d;
    the 4th object, d, has been replaced by the 1st, a.

This brings us back to the beginning of a closed cycle, which is conveniently
represented by the notation (1 2 5 4), in which the successive replacement
positions are enclosed, in sequence, in parentheses. Thus (1 2 5 4) means 2nd
→ 1st, 5th → 2nd, 4th → 5th, 1st → 4th. It should be noted that the object
initially in the first listed position replaces that in the final position indicated in
the bracket - here ‘a’ is put into the fourth position by the permutation. Clearly
the cycle (5 4 1 2), or any other which involved the same numbers in the same
relative order, would have exactly the same meaning and effect. The remaining
two objects, c and f, are interchanged by θ or, more formally, are rearranged
according to a cycle of length 2, a transposition, represented by (3 6). Thus the
complete representation (specification) of θ is

    θ = (1 2 5 4)(3 6).

The positions of objects that are unaltered by a permutation are either placed by 
themselves in a pair of parentheses or omitted altogether. The former is recom- 
mended as it helps to indicate how many objects are involved - important when 
the object in the last position is unchanged, or the permutation is the identity, 
which leaves all objects unaltered in position! Thus the identity permutation of 
degree 6 is 

I = (1)(2)(3)(4)(5)(6), 

though in practice it is often shortened to (1). 

It will be clear that the cycle representation is unique, to within the internal
absolute ordering of the numbers in each bracket as already noted, and that
each number appears once and only once in the representation of any particular
permutation.

The order of any permutation of degree n within the group $S_n$ can be read off
from the cyclic representation and is given by the lowest common multiple (LCM)
of the lengths of the cycles. Thus I has order 1, as it must, and the permutation
θ discussed above has order 4 (the LCM of 4 and 2).

Expressed in cycle notation our second permutation φ is (3)(1 4 6)(2 5), and
the product φ • θ is calculated as

    (3)(1 4 6)(2 5) • (1 2 5 4)(3 6){a b c d e f} = (3)(1 4 6)(2 5){b e f a d c}
                                                   = {a d f c e b}
                                                   = (1)(5)(2 4 3 6){a b c d e f},

i.e. expressed as a relationship amongst elements of the group of permutations of
degree 6 (not yet proved as a group, but reasonably anticipated), this result reads

    (3)(1 4 6)(2 5) • (1 2 5 4)(3 6) = (1)(5)(2 4 3 6).

We note, for practice, that φ has order 6 (the LCM of 1, 3, and 2) and that the
product φ • θ has order 4.
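Both the cycle decomposition and the LCM rule for the order are easy to mechanise. In the sketch below a permutation is given in the bracket form used earlier, and a cycle is traced by following position k to the position named in the k-th entry; the function names `cycles` and `order` are our own.

```python
from math import gcd


def cycles(perm):
    """Cycle decomposition of a permutation in bracket form, with
    unaltered positions kept as cycles of length 1."""
    seen = set()
    result = []
    for start in range(1, len(perm) + 1):
        if start in seen:
            continue
        cycle = []
        k = start
        while k not in seen:
            seen.add(k)
            cycle.append(k)
            k = perm[k - 1]   # position k is filled from position perm[k]
        result.append(tuple(cycle))
    return result


def order(perm):
    """Lowest common multiple of the cycle lengths."""
    result = 1
    for c in cycles(perm):
        result = result * len(c) // gcd(result, len(c))
    return result


theta = [2, 5, 6, 1, 4, 3]
phi = [4, 5, 3, 6, 2, 1]
phi_theta = [1, 4, 6, 3, 5, 2]

for p in (theta, phi, phi_theta):
    print(cycles(p), order(p))
```

The cycles are listed in order of their smallest member, so φ appears as (1 4 6)(2 5)(3) rather than (3)(1 4 6)(2 5); as noted in the text, the two forms describe the same permutation.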

The number of elements in the group $S_n$ of all permutations of degree n is
n! and clearly increases very rapidly as n increases. Fortunately, to illustrate the
essential features of permutation groups it is sufficient to consider the case n = 3,
which involves only six elements. They are as follows (with labelling which the
reader will by now recognise as anticipatory):

    I = (1)(2)(3)      A = (1 2 3)        B = (1 3 2)

    C = (1)(2 3)       D = (3)(1 2)       E = (2)(1 3)

It will be noted that A and B have order 3, whilst C, D and E have order 2. As
perhaps anticipated, their combination products are exactly those corresponding
to table 24.8, I, C, D and E being their own inverses. For example, putting in all
steps explicitly,

    D • C{a b c} = (3)(1 2) • (1)(2 3){a b c}
                 = (3)(1 2){a c b}
                 = {c a b}
                 = (3 2 1){a b c}
                 = (1 3 2){a b c}
                 = B{a b c}.

In brief, the six permutations belonging to $S_3$ form yet another non-Abelian group
isomorphic to the rotation-reflection symmetry group of an equilateral triangle.
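As a check on the claim that the six permutations combine according to table 24.8, the whole table can be generated by composing them in pairs. In this sketch each cycle form has been converted by hand to the bracket form (so that A = (1 2 3) becomes [2 3 1], for instance); the names `PERMS`, `compose` and `s3_table` are ours.

```python
# Bracket forms of the six permutations of degree 3.
PERMS = {
    "I": (1, 2, 3),
    "A": (2, 3, 1),   # (1 2 3)
    "B": (3, 1, 2),   # (1 3 2)
    "C": (1, 3, 2),   # (1)(2 3)
    "D": (2, 1, 3),   # (3)(1 2)
    "E": (3, 2, 1),   # (2)(1 3)
}
HEAD = "IABCDE"
NAMES = {perm: name for name, perm in PERMS.items()}


def compose(x, y):
    """Bracket form of x . y, with y carried out first and x second."""
    return tuple(y[p - 1] for p in x)


def s3_table():
    """Row X, column Y holds the label of the product X . Y."""
    return {x: "".join(NAMES[compose(PERMS[x], PERMS[y])] for y in HEAD)
            for x in HEAD}


print(NAMES[compose(PERMS["D"], PERMS["C"])])   # the worked example D . C
for name, row in s3_table().items():
    print(name, row)
```

The rows printed are those of table 24.8, and the lack of symmetry about the leading diagonal (for example A • C = E but C • A = D) shows directly that the group is non-Abelian.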
24.5 Mappings between groups

Now that we have available a range of groups that can be used as examples, 
we return to the study of more general group properties. From here on, when 
there is no ambiguity we will write the product of two elements, X • Y, simply 
as XY , omitting the explicit combination symbol. We will also continue to use 
‘multiplication’ as a loose generic name for the combination process between 
elements of a group. 

If Q and Q' are two groups, we can study the effect of a mapping

    Φ : Q → Q'

of Q onto Q'. If X is an element of Q we denote its image in Q' under the mapping
Φ by X' = Φ(X).

A technical term that we have already used is isomorphic. We will now define
it formally. Two groups Q = {X, Y, ...} and Q' = {X', Y', ...} are said to be
isomorphic if there is a one-to-one correspondence

    X ↔ X',  Y ↔ Y',  ...

between their elements such that

    XY = Z implies X'Y' = Z'

and vice versa.

In other words, isomorphic groups have the same (multiplication) structure, 
although they may differ in the nature of their elements, combination law and 
notation. Clearly if groups Q and Q' are isomorphic, and Q and Q" are isomorphic, 
then it follows that Q' and Q" are isomorphic. We have already seen an example of 
four groups (of functions of x, of orthogonal matrices, of permutations and of the 
symmetries of an equilateral triangle) that are isomorphic, all having table 24.8 
as their multiplication table. 

Although our main interest is in isomorphic relationships between groups, the 
wider question of mappings of one set of elements onto another is of some 
importance, and we start with the more general notion of a homomorphism. 

Let Q and Q' be two groups and Φ a mapping of Q → Q'. If for every pair of
elements X and Y in Q

    (XY)' = X'Y'

then Φ is called a homomorphism, and Q' is said to be a homomorphic image of Q.

The essential defining relationship, expressed by (XY)' = X'Y', is that the 
same result is obtained whether the product of two elements is formed first and 
the image then taken or the images are taken first and the product then formed. 
Three immediate consequences of the above definition are proved as follows. 
(i) If I is the identity of Q then IX = X for all X in Q. Consequently

    X' = (IX)' = I'X',

for all X' in Q'. Thus I' is the identity in Q'. In words, the identity element
of Q maps into the identity element of Q'.

(ii) Further,

    $I' = (XX^{-1})' = X'(X^{-1})'$.

That is, $(X^{-1})' = (X')^{-1}$. In words, the image of an inverse is the same
element in Q' as the inverse of the image.

(iii) If element X in Q is of order m, i.e. $I = X^m$, then

    $I' = (X^m)' = (XX^{m-1})' = X'(X^{m-1})' = \cdots = \underbrace{X'X' \cdots X'}_{m\ \text{factors}}$.

In words, the m-th power of the image of X is the identity of Q', so that the
order of the image is a divisor of the order of the element; the two orders
are equal when the mapping is one-to-one.


What distinguishes an isomorphism from the more general homomorphism are 
the requirements that in an isomorphism: 


(I) different elements in Q must map into different elements in Q' (whereas in
a homomorphism several elements in Q may have the same image in Q'),
that is, X' = Y' must imply X = Y;

(II) any element in Q' must be the image of some element in Q. 


An immediate consequence of (I) and result (iii) for homomorphisms is that 
groups that are isomorphic each have the same number of elements of any given 
order. 

For a general homomorphism, the set of elements of Q whose image in Q'
is I' is called the kernel of the homomorphism; this is discussed further in the
next section. In an isomorphism the kernel consists of the identity I alone. To
illustrate both this point and the general notion of a homomorphism, consider
a mapping between the additive group of real numbers ℛ and the multiplicative
group of complex numbers with unit modulus, U(1). Suppose that the mapping
Φ : ℛ → U(1) is

    $\Phi : x \to e^{ix}$;

then this is a homomorphism since

    $(x + y)' = e^{i(x+y)} = e^{ix} e^{iy} = x'y'$.

However, it is not an isomorphism because many (an infinite number) of the
elements of ℛ have the same image in U(1). For example, π, 3π, 5π, ... in ℛ all
have the image −1 in U(1) and, furthermore, all elements of ℛ of the form 2πn,
where n is an integer, map onto the identity element in U(1). The latter set forms
the kernel of the homomorphism.
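A numerical spot check of this example is straightforward. The sketch below tests the defining property on a few arbitrarily chosen pairs of real numbers and evaluates the mapping at the points mentioned in the text; because floating-point arithmetic is used, equality is tested only to within a small tolerance. The names `image` and `close`, the sample values and the tolerance are assumptions of the sketch.

```python
import cmath
import math


def image(x):
    """The mapping x -> exp(ix) from the real numbers into U(1)."""
    return cmath.exp(1j * x)


def close(a, b, tol=1e-9):
    """Equality of two complex numbers to within rounding error."""
    return abs(a - b) < tol


pairs = [(0.3, 1.7), (-2.5, 4.0), (10.0, -0.25)]

# Homomorphism property: the image of a sum is the product of the images.
print(all(close(image(x + y), image(x) * image(y)) for x, y in pairs))

# pi, 3*pi and 5*pi all share the image -1.
print(all(close(image(k * math.pi), -1) for k in (1, 3, 5)))

# Every multiple of 2*pi maps onto the identity of U(1).
print(all(close(image(2 * math.pi * n), 1) for n in range(-3, 4)))
```

A finite set of samples cannot prove the homomorphism property, which rests on the law of exponents quoted above; the check merely illustrates it and makes the many-to-one character of the mapping concrete.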
Table 24.9 Reproduction of (a) table 24.8 and (b) table 24.3 with the relevant
subgroups shown in bold.


For the sake of completeness, we add that a homomorphism for which (I) above 
holds is said to be a monomorphism (or an isomorphism into), whilst a homomor- 
phism for which (II) holds is called an epimorphism (or an isomorphism onto). If, 
in either case, the other requirement is met as well then the monomorphism or 
epimorphism is also an isomorphism. 

Finally, if the initial and final groups are the same, Q = Q', then the isomorphism
Q → Q' is termed an automorphism.


24.6 Subgroups 

More detailed inspection of tables 24.8 and 24.3 shows that not only do the 
complete tables have the properties associated with a group multiplication table 
(see section 24.2) but so do the upper left corners of each table taken on their 
own. The relevant parts are shown in bold in the tables 24.9(a) and (b).

This observation immediately prompts the notion of a subgroup. A subgroup
of a group Q can be formally defined as any non-empty subset $H = \{H_i\}$ of
Q, the elements of which themselves behave as a group under the same rule of
combination as applies in Q itself. As for all groups, the order of the subgroup is
equal to the number of elements it contains; we will denote it by h or |H|.

All groups Q contain two trivial subgroups:

(i) Q itself;

(ii) the set ℐ consisting of the identity element alone.

All other subgroups are termed proper subgroups. In a group with multiplication
table 24.8 the elements {I, A, B} form a proper subgroup, as do {I, A} in a group
with table 24.3 as its group table.

Some groups have no proper subgroups. For example, the cyclic groups of prime
order (cyclic groups were mentioned at the end of subsection 24.1.1) have no
subgroups other than the whole group or the identity alone. Tables 24.10(a) and
(b) show the multiplication tables for two of these groups. Table 24.6 is also the
group table for a cyclic group, that of order 4; since 4 is not prime, however, it
does contain a proper subgroup, namely {I, B}.
Table 24.10 The group tables of two cyclic groups, of orders 3 and 5. They
have no proper subgroups.


It will be clear that for a cyclic group Q of prime order repeated combination of
any element other than the identity with itself generates all other elements of Q,
before finally reproducing itself. So, for example, in table 24.10(b), starting with
(say) D, repeated combination with itself produces, in turn, C, B, A, I and finally
D again. In any such group every element, apart from the identity, is of order g,
the order of the group itself.

The two tables shown are for groups of orders 3 and 5. It will be proved in
subsection 24.7.2 that the order of any group is a multiple of the order of any of
its subgroups (Lagrange’s theorem), i.e. in our general notation, g is a multiple
of h. It thus follows that a group of order p, where p is any prime, must be cyclic
and cannot have any proper subgroups. The groups for which tables 24.10(a) and
(b) are the group tables are two such examples. Groups of non-prime order, on the
other hand, do have proper subgroups, for example {I, A} in table 24.3 and {I, B}
in table 24.6.

As we have seen, repeated multiplication of an element X (not the identity)
by itself will generate a subgroup $\{X, X^2, X^3, \ldots\}$. The subgroup will clearly be
Abelian, and if X is of order m, i.e. $X^m = I$, the subgroup will have m distinct
members. If m is less than g - though, in view of Lagrange’s theorem, m must
be a factor of g - the subgroup will be a proper subgroup. We can deduce, in
passing, that the order of any element of a group is an exact divisor of the order
of the group.
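For a group given by its multiplication table, the subgroup generated by an element is found simply by forming successive powers until the identity appears. The sketch below does this for the group of table 24.8, whose rows are entered as strings in the heading order I, A, B, C, D, E; the names `TABLE_24_8`, `mul` and `generated_subgroup` are ours.

```python
HEAD = "IABCDE"

# Rows of table 24.8: TABLE_24_8[X] lists X . Y for Y = I, A, B, C, D, E.
TABLE_24_8 = {"I": "IABCDE", "A": "ABIECD", "B": "BIADEC",
              "C": "CDEIAB", "D": "DECBIA", "E": "ECDABI"}


def mul(x, y):
    """Look up the product X . Y (X from the left-hand column, Y from
    the top row)."""
    return TABLE_24_8[x][HEAD.index(y)]


def generated_subgroup(x):
    """The elements X, X^2, X^3, ... up to and including the identity."""
    members = [x]
    power = x
    while power != "I":
        power = mul(power, x)
        members.append(power)
    return members


g = len(HEAD)
for x in HEAD:
    sub = generated_subgroup(x)
    print(x, sub, "order", len(sub), "divides g:", g % len(sub) == 0)
```

The orders found are 1, 3, 3, 2, 2 and 2, each an exact divisor of g = 6, and none of the generated subgroups is the whole group, so this group is not cyclic.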

Some obvious properties of the subgroups of a group Q, which can be listed 
without formal proof, are as follows. 

(i) The identity element of Q belongs to every subgroup H.

(ii) If element X belongs to a subgroup H, so does $X^{-1}$.

(iii) The set of elements in Q that belong to every subgroup of Q themselves 
form a subgroup, though it may consist of the identity alone. 

Properties of subgroups that need more explicit proof are given in the follow- 
ing sections, though some need the development of new concepts before they 
can be established. However, we can begin with a theorem, applicable to all 
homomorphisms, not just isomorphisms, that requires no new concepts. 

Let Φ : Q → Q' be a homomorphism of Q into Q'; then

(i) the set of elements H' in Q' that are images of the elements of Q forms a
subgroup of Q';

(ii) the set of elements K in Q that are mapped onto the identity I' in Q' forms
a subgroup of Q.

As indicated in the previous section, the subgroup K is called the kernel of the
homomorphism.

To prove (i), suppose Z and W belong to H', with Z = X' and W = Y', where
X and Y belong to Q. Then

    ZW = X'Y' = (XY)'

and therefore belongs to H', and

    $Z^{-1} = (X')^{-1} = (X^{-1})'$

and therefore belongs to H'. These two results, together with the fact that I'
belongs to H', are enough to establish result (i).

To prove (ii), suppose X and Y belong to K; then

    (XY)' = X'Y' = I'I' = I'    (closure),

    $I' = (XX^{-1})' = X'(X^{-1})' = I'(X^{-1})' = (X^{-1})'$

and therefore $X^{-1}$ belongs to K. These two results, together with the fact that I
belongs to K, are enough to establish (ii). An illustration of this result is provided
by the mapping Φ of ℛ → U(1) considered in the previous section. Its kernel
consists of the set of real numbers of the form 2πn where n is an integer; they
form a subgroup of ℛ, the additive group of real numbers.

In fact the kernel K of a homomorphism is a normal subgroup of Q. The
defining property of such a subgroup is that for every element X in Q and every
element Y in the subgroup, $XYX^{-1}$ belongs to the subgroup. This property is
easily verified for the kernel K, since

    $(XYX^{-1})' = X'Y'(X^{-1})' = X'I'(X^{-1})' = X'(X^{-1})' = I'$.

Anticipating the discussion of subsection 24.7.2, the cosets of a normal subgroup
themselves form a group (see exercise 24.16).
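A small finite example may help to fix these ideas. The mapping used below is our own choice and does not appear in the text: it sends the elements I, A, B of the group of table 24.8 to +1 and the elements C, D, E to −1, the image group being {1, −1} under ordinary multiplication. The sketch confirms that the mapping is a homomorphism, finds its kernel, and tests the subgroup and normality conditions; all function names are ours.

```python
HEAD = "IABCDE"
TABLE_24_8 = {"I": "IABCDE", "A": "ABIECD", "B": "BIADEC",
              "C": "CDEIAB", "D": "DECBIA", "E": "ECDABI"}

# Assumed mapping onto the group {1, -1} under ordinary multiplication.
IMAGE = {"I": 1, "A": 1, "B": 1, "C": -1, "D": -1, "E": -1}


def mul(x, y):
    """Product X . Y read from table 24.8."""
    return TABLE_24_8[x][HEAD.index(y)]


def inverse(x):
    """The element Y with X . Y = I."""
    return next(y for y in HEAD if mul(x, y) == "I")


def is_homomorphism():
    """Check (XY)' = X'Y' for every pair of elements."""
    return all(IMAGE[mul(x, y)] == IMAGE[x] * IMAGE[y]
               for x in HEAD for y in HEAD)


def kernel():
    """Elements mapped onto the identity, 1, of the image group."""
    return [x for x in HEAD if IMAGE[x] == 1]


def is_subgroup(subset):
    """Identity present, closed under products and under inverses."""
    return ("I" in subset
            and all(mul(x, y) in subset for x in subset for y in subset)
            and all(inverse(x) in subset for x in subset))


def is_normal(subset):
    """X Y X^-1 lies in the subset for every X in the group and every
    Y in the subset."""
    return all(mul(mul(x, y), inverse(x)) in subset
               for x in HEAD for y in subset)


print(is_homomorphism())
print(kernel(), is_subgroup(kernel()), is_normal(kernel()))
print(is_subgroup(["I", "C"]), is_normal(["I", "C"]))
```

The last line shows that normality is a genuine restriction: {I, C} is a subgroup but is not normal, so it cannot be the kernel of any homomorphism of this group.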


24.7 Subdividing a group 

We have already noted, when looking at the (arbitrary) order of headings in a 
group table, that some choices appear to make the table more orderly than do 
others. In the following subsections we will identify ways in which the elements 
of a group can be divided up into sets with the property that the members of any 
one set are more like the other members of the set, in some particular regard, 
than they are like any element that does not belong to the set. We will find that
these divisions will be such that the group is partitioned, i.e. the elements will be 
divided into sets in such a way that each element of the group belongs to one, 
and only one, such set. 

We note in passing that the subgroups of a group do not form such a partition, 
not least because the identity element is in every subgroup, rather than being in 
precisely one. In other words, despite the nomenclature, a group is not simply the 
aggregate of its proper subgroups. 


24.7.1 Equivalence relations and classes 

We now specify in a more mathematical manner what it means for two elements 
of a group to be ‘more like’ one another than like a third element, as mentioned 
in section 24.2. Our introduction will apply to any set, whether a group or not, 
but our main interest will ultimately be in two particular applications to groups. 
We start with the formal definition of an equivalence relation. 

An equivalence relation on a set S is a relationship X ~ Y, between two 
elements X and Y belonging to S, in which the definition of the symbol ~ must 
satisfy the requirements of 

(i) reflexivity, X ~ X;

(ii) symmetry, X ~ Y implies Y ~ X ; 

(iii) transitivity, X ~ Y and Y ~ Z imply X ~ Z. 

Any particular two elements either satisfy or do not satisfy the relationship. 

The general notion of an equivalence relation is very straightforward, and the 
requirements on ~ seem undemanding; but not all relationships qualify. As an 
example within the topic of groups, if ~ meant ‘has the same order as’ then 
clearly all the requirements would be satisfied. However, if ~ meant ‘commutes 
with’ then it would not be an equivalence relation, since although A commutes
with I, and I commutes with C, this does not necessarily imply that A commutes
with C, as is obvious from table 24.8.
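The two candidate relations can be tested exhaustively on the group of table 24.8. The sketch below checks reflexivity, symmetry and transitivity over all elements; the function names are ours, and the table rows are entered in the heading order I, A, B, C, D, E.

```python
HEAD = "IABCDE"
TABLE_24_8 = {"I": "IABCDE", "A": "ABIECD", "B": "BIADEC",
              "C": "CDEIAB", "D": "DECBIA", "E": "ECDABI"}


def mul(x, y):
    """Product X . Y read from table 24.8."""
    return TABLE_24_8[x][HEAD.index(y)]


def order(x):
    """Smallest m with X^m = I."""
    m, power = 1, x
    while power != "I":
        power = mul(power, x)
        m += 1
    return m


def same_order(x, y):
    return order(x) == order(y)


def commutes(x, y):
    return mul(x, y) == mul(y, x)


def is_equivalence(related, elements):
    """Test reflexivity, symmetry and transitivity by exhaustion."""
    reflexive = all(related(x, x) for x in elements)
    symmetric = all(related(y, x)
                    for x in elements for y in elements if related(x, y))
    transitive = all(related(x, z)
                     for x in elements for y in elements for z in elements
                     if related(x, y) and related(y, z))
    return reflexive and symmetric and transitive


print(is_equivalence(same_order, HEAD))
print(is_equivalence(commutes, HEAD))
print(commutes("A", "I"), commutes("I", "C"), commutes("A", "C"))
```

‘Commutes with’ is reflexive and symmetric; it is transitivity alone that fails, through exactly the chain A, I, C described in the text.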

It may be shown that an equivalence relation on S divides up S into classes $C_i$
such that:

(i) X and Y belong to the same class if, and only if, X ~ Y ; 

(ii) every element W of S belongs to exactly one class. 

This may be shown as follows. Let X belong to S, and define the subset $S_X$ of
S to be the set of all elements U of S such that X ~ U. Clearly by reflexivity
X belongs to $S_X$. Suppose first that X ~ Y, and let Z be any element of $S_Y$.
Then Y ~ Z, and hence by transitivity X ~ Z, which means that Z belongs to
$S_X$. Conversely, since the symmetry law gives Y ~ X, if Z belongs to $S_X$ then
this implies that Z belongs to $S_Y$. These two results together mean that the two
subsets $S_X$ and $S_Y$ have the same members and hence are equal.

Now suppose that $S_X$ equals $S_Y$. Since Y belongs to $S_Y$ it also belongs to $S_X$
and hence X ~ Y. This completes the proof of (i), once the distinct subsets of
type $S_X$ are identified as the classes $C_i$. Statement (ii) is an immediate corollary,
the class in question being identified as $S_W$.

The most important property of an equivalence relation is as follows. 

Two different subsets S_X and S_Y can have no element in common, and the collection
of all the classes C_i is a ‘partition’ of S, i.e. every element in S belongs to one, and
only one, of the classes.

To prove this, suppose S_X and S_Y have an element Z in common; then X ~ Z
and Y ~ Z and so by the symmetry and transitivity laws X ~ Y. By the above
theorem this implies S_X equals S_Y. But this contradicts the fact that S_X and S_Y
are different subsets. Hence S_X and S_Y can have no element in common.

Finally, if the elements of S are used in turn to define subsets and hence classes
in S, every element U is in the subset S_U that is either a class already found or
constitutes a new one. It follows that the classes exhaust S, i.e. every element is
in some class.
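As an informal check of the construction used in this proof, the sketch below forms
the subset of all U with X ~ U for each X in turn and keeps the distinct subsets
as the classes. The set (the integers 0 to 9) and the relation (X ~ Y when X − Y is
divisible by 3) are assumptions chosen purely for illustration, and the function
name classes is hypothetical.

```python
def classes(elements, rel):
    """Collect the distinct subsets S_X = {U : X ~ U} for X in elements."""
    found = []
    for x in elements:
        s_x = frozenset(u for u in elements if rel(x, u))
        if s_x not in found:          # either a class already found, or a new one
            found.append(s_x)
    return found


S = list(range(10))
parts = classes(S, lambda x, y: (x - y) % 3 == 0)

for part in parts:
    print(sorted(part))
# [0, 3, 6, 9]
# [1, 4, 7]
# [2, 5, 8]

# The classes are pairwise disjoint and together exhaust S.
print(sum(len(p) for p in parts) == len(S))      # True
print(set().union(*parts) == set(S))             # True
```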

Having established the general properties of equivalence relations, we now turn 
to two specific examples of such relationships, in which the general set S has the 
more specialised properties of a group Q and the equivalence relation ~ is chosen 
in such a way that the relatively transparent general results for equivalence 
relations can be used to derive powerful, but less obvious, results about the 
properties of groups. 


24.7.2 Congruence and cosets 

As the first application of equivalence relations we now prove Lagrange’s theorem
which is stated as follows.

If Q is a finite group of order g and H is a subgroup of Q of order h
then g is a multiple of h.

We take as the definition of ~ that, given X and Y belonging to Q, X ~ Y if
X^{-1}Y belongs to H. This is the same as saying that Y = XH_i for some element
H_i belonging to H; technically X and Y are said to be left-congruent with respect
to H.

This defines an equivalence relation, since it has the following properties. 

(i) Reflexivity: X ~ X, since X^{-1}X = I and I belongs to any subgroup.

(ii) Symmetry: X ~ Y implies that X^{-1}Y belongs to H and so, therefore, does
its inverse, since H is a group. But (X^{-1}Y)^{-1} = Y^{-1}X and, as this belongs
to H, it follows that Y ~ X.

(iii) Transitivity: X ~ Y and Y ~ Z imply that X^{-1}Y and Y^{-1}Z belong to H
and so, therefore, does their product (X^{-1}Y)(Y^{-1}Z) = X^{-1}Z, from which
it follows that X ~ Z.

With ~ proved as an equivalence relation, we can immediately deduce that it
divides Q into disjoint (non-overlapping) classes. For this particular equivalence
relation the classes are called the left cosets of H. Thus each element of Q is in
one and only one left coset of H. The left coset containing any particular X is
usually written XH, and denotes the set of elements of the form XH_j (one of
which is X itself since H contains the identity element); it must contain h different
elements, since if it did not, and two elements were equal,

    XH_i = XH_j,

we could deduce that H_i = H_j and that H contained fewer than h elements.

From our general results about equivalence relations it now follows that the
left cosets of H are a ‘partition’ of Q into a number of sets each containing h
members. Since there are g members of Q and each must be in just one of the
sets, it follows that g is a multiple of h. This concludes the proof of Lagrange’s
theorem.

The number of left cosets of H in Q is known as the index of H in Q and is
written [Q : H]; numerically the index = g/h. For the record we note that, for
the trivial subgroup {I}, which contains only the identity element, [Q : {I}] = g and
that, for a subgroup J of subgroup H, [Q : H][H : J] = [Q : J].

The validity of Lagrange’s theorem was established above using the far-reaching 
properties of equivalence relations. However, for this specific purpose there is a 
more direct and self-contained proof, which we now give. 

Let X be some particular element of a finite group Q of order g, and H be a
subgroup of Q of order h, with typical element Y_i. Consider the set of elements

    XH = {XY_1, XY_2, ..., XY_h}.

This set contains h distinct elements, since if any two were equal, i.e. XY_i = XY_j
with i ≠ j, this would contradict the cancellation law. As we have already seen,
the set is called a left coset of H.

We now prove three simple results. 

• Two cosets are either disjoint or identical. Suppose cosets X_1H and X_2H have
an element in common, i.e. X_1Y_1 = X_2Y_2 for some Y_1, Y_2 in H. Then X_1 =
X_2Y_2Y_1^{-1}, and since Y_1 and Y_2 both belong to H so does Y_2Y_1^{-1}; thus X_1
belongs to the left coset X_2H. Similarly X_2 belongs to the left coset X_1H.
Consequently, either the two cosets are identical or it was wrong to assume
that they have an element in common.

• Two cosets X_1H and X_2H are identical if, and only if, X_2^{-1}X_1 belongs to H. If
X_2^{-1}X_1 belongs to H then X_1 = X_2Y_i for some i, and

    X_1H = X_2Y_iH = X_2H,

since by the permutation law Y_iH = H. Thus the two cosets are identical.

Conversely, suppose X_1H = X_2H. Then X_2^{-1}X_1H = H. But one element of
H (on the left of the equation) is I; thus X_2^{-1}X_1 must also be an element of H
(on the right). This proves the stated result.

• Every element of Q is in some left coset XH. This follows trivially since H
contains I, and so the element X_i is in the coset X_iH.

The final step in establishing Lagrange’s theorem is, as previously, to note that 
each coset contains h elements, that the cosets are disjoint and that every one of 
the g elements in Q appears in one and only one distinct coset. It follows that 
g = kh for some integer k. 

As noted earlier, Lagrange’s theorem justifies our statement that any group of
order p, where p is prime, must be cyclic and cannot have any proper subgroups:
since any subgroup must have an order that divides p, this can only be 1 or p,
corresponding to the two trivial subgroups {I} and the whole group.

It may be helpful to see an example worked through explicitly, and we again 
use the same six-element group. 

► Find the left cosets of the proper subgroup H of the group Q that has table 24.8 as its
multiplication table.

The subgroup consists of the set of elements H = {I, A, B}. We note in passing that it has
order 3, which, as required by Lagrange’s theorem, is a divisor of 6, the order of Q. As in
all cases, H itself provides the first (left) coset, formally the coset

    IH = {II, IA, IB} = {I, A, B}.

We continue by choosing an element not already selected, C say, and form

    CH = {CI, CA, CB} = {C, D, E}.

These two cosets of H exhaust Q, and are therefore the only cosets, the index of H in Q
being equal to 2.

This completes the example, but it is useful to demonstrate that it would not have
mattered if we had taken D, say, instead of I to form a first coset

    DH = {DI, DA, DB} = {D, E, C},

and then, from previously unselected elements, picked B, say:

    BH = {BI, BA, BB} = {B, I, A}.

The same two cosets would have resulted. ◄
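The coset construction in this example can be reproduced in a few lines. The Python
sketch below is illustrative only: its multiplication table is an assumed
reconstruction of table 24.8 that agrees with the products quoted in the example
(CA = D, CB = E, DA = E, DB = C, BA = I, BB = A), and the names mul and left_cosets
are hypothetical.

```python
# Assumed reconstruction of table 24.8: ROWS[x][k] is the product x * ELEMENTS[k].
ELEMENTS = "IABCDE"
ROWS = {"I": "IABCDE", "A": "ABIECD", "B": "BIADEC",
        "C": "CDEIAB", "D": "DECBIA", "E": "ECDABI"}


def mul(x, y):
    """Return the group product xy read from the table."""
    return ROWS[x][ELEMENTS.index(y)]


def left_cosets(subgroup):
    """Form xH for every x in the group and keep the distinct cosets."""
    cosets = []
    for x in ELEMENTS:
        coset = "".join(sorted(mul(x, h) for h in subgroup))
        if coset not in cosets:
            cosets.append(coset)
    return cosets


cosets = left_cosets("IAB")
print(cosets)                                        # ['ABI', 'CDE']
# Lagrange: (number of cosets) x (order of subgroup) = order of group.
print(len(cosets) * len("IAB") == len(ELEMENTS))     # True
```

Each coset is stored as a sorted string, so that starting from D or from B yields
the same two entries as starting from I or C.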

It will be noticed that the cosets are the same groupings of the elements
of Q which we earlier noted as being the choice of adjacent column and row
headings that give the multiplication table its ‘neatest’ appearance. Furthermore,
if H is a normal subgroup of Q then its (left) cosets themselves form a group (see
exercise 24.16).


24.7.3 Conjugates and classes 

Our second example of an equivalence relation is concerned with those elements
X and Y of a group Q that can be connected by a transformation of the form
Y = G_i^{-1}XG_i, where G_i is an (appropriate) element of Q. Thus X ~ Y if there
exists an element G_i of Q such that Y = G_i^{-1}XG_i. Different pairs of elements X
and Y will, in general, require different group elements G_i. Elements connected
in this way are said to be conjugates.

We first need to establish that this does indeed define an equivalence relation, 
as follows. 

(i) Reflexivity: X ~ X, since X = I^{-1}XI and I belongs to the group.

(ii) Symmetry: X ~ Y implies Y = G_i^{-1}XG_i and therefore X = (G_i^{-1})^{-1}YG_i^{-1}.
Since G_i belongs to Q so does G_i^{-1}, and it follows that Y ~ X.

(iii) Transitivity: X ~ Y and Y ~ Z imply Y = G_i^{-1}XG_i and Z = G_j^{-1}YG_j
and therefore Z = G_j^{-1}G_i^{-1}XG_iG_j = (G_iG_j)^{-1}X(G_iG_j). Since G_i and G_j
belong to Q so does G_iG_j, from which it follows that X ~ Z.

These results establish conjugacy as an equivalence relation and hence show 
that it divides Q into classes, two elements being in the same class if, and only if, 
they are conjugate. 

Immediate corollaries are: 

(i) If Z is in the class containing I then

    Z = G_i^{-1}IG_i = G_i^{-1}G_i = I.

Thus, since any conjugate of I can be shown to be I, the identity must be
in a class by itself.

(ii) If X is in a class by itself then

    Y = G_i^{-1}XG_i

must imply that Y = X. But

    X = G_iG_i^{-1}XG_iG_i^{-1}

for any G_i, and so

    X = G_i(G_i^{-1}XG_i)G_i^{-1} = G_iYG_i^{-1} = G_iXG_i^{-1},

i.e. XG_i = G_iX for all G_i.

Thus commutation with all elements of the group is a necessary (and
sufficient) condition for any particular group element to be in a class by
itself. In an Abelian group each element is in a class by itself.

(iii) In any group Q the set S of elements in classes by themselves is an Abelian
subgroup (known as the centre of Q). We have shown that I belongs to S,
and so if, further, XG_i = G_iX and YG_i = G_iY for all G_i belonging to Q
then:

(a) (XY)G_i = XG_iY = G_i(XY), i.e. the closure of S, and

(b) XG_i = G_iX implies X^{-1}G_i = G_iX^{-1}, i.e. the inverse of X belongs
to S.

Hence S is a group, and clearly Abelian.

Yet again for illustration purposes, we use the six-element group that has
table 24.8 as its group table.

► Find the conjugacy classes of the group Q having table 24.8 as its multiplication table.

As always, I is in a class by itself, and we need consider it no further.

Consider next the results of forming X^{-1}AX, as X runs through the elements of Q.

    I^{-1}AI   A^{-1}AA   B^{-1}AB   C^{-1}AC   D^{-1}AD   E^{-1}AE
     = IA       = IA       = AI       = CE       = DC       = ED
     = A        = A        = A        = B        = B        = B

Only A and B are generated. It is clear that {A, B} is one of the conjugacy classes of Q.
This can be verified by forming all elements X^{-1}BX; again only A and B appear.

We now need to pick an element not in the two classes already found. Suppose we
pick C. Just as for A, we compute X^{-1}CX, as X runs through the elements of Q. The
calculations can be done directly using the table and give the following:

    X        :  I  A  B  C  D  E
    X^{-1}CX :  C  E  D  C  E  D

Thus C, D and E belong to the same class. The group is now exhausted, and so the three
conjugacy classes are

    {I}, {A, B}, {C, D, E}. ◄

In the case of this small and simple, but non-Abelian, group, only the identity
is in a class by itself (i.e. only I commutes with all other elements). It is also the
only member of the centre of the group.
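The class structure and the centre found in this example can be confirmed by brute
force. The following Python sketch is illustrative only: the multiplication table is
an assumed reconstruction of table 24.8 that agrees with the products used in the
worked example (for instance CE = DC = ED = B), and the names mul, inverse and
conjugacy_classes are hypothetical.

```python
# Assumed reconstruction of table 24.8: ROWS[x][k] is the product x * ELEMENTS[k].
ELEMENTS = "IABCDE"
ROWS = {"I": "IABCDE", "A": "ABIECD", "B": "BIADEC",
        "C": "CDEIAB", "D": "DECBIA", "E": "ECDABI"}


def mul(x, y):
    """Return the group product xy read from the table."""
    return ROWS[x][ELEMENTS.index(y)]


def inverse(x):
    """Return the element y with xy = I."""
    return next(y for y in ELEMENTS if mul(x, y) == "I")


def conjugacy_classes():
    """For each x collect all g^{-1} x g and keep the distinct classes."""
    classes = []
    for x in ELEMENTS:
        cls = "".join(sorted({mul(mul(inverse(g), x), g) for g in ELEMENTS}))
        if cls not in classes:
            classes.append(cls)
    return classes


# The centre: elements that commute with every element of the group.
centre = [x for x in ELEMENTS
          if all(mul(x, g) == mul(g, x) for g in ELEMENTS)]

print(conjugacy_classes())   # ['I', 'AB', 'CDE']
print(centre)                # ['I']
```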

Other areas from which examples of conjugacy classes can be taken include
permutations and rotations. Two permutations are in the same class if their
cycle specifications have the same structure. For example, in S_5 the permutations
(1 3 5)(2)(4) and (2 5 3)(1)(4) are in the same class as each other but in a different
class from that which contains (1 5)(2 4)(3).
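The cycle-structure criterion is easy to test numerically. The sketch below is an
illustration under stated assumptions: permutations of the objects 1 to 5 are stored
as Python dictionaries, and the helper names from_cycles and cycle_type are
hypothetical. It compares the three permutations quoted above.

```python
def from_cycles(cycles, n=5):
    """Build a permutation of 1..n (as a dict) from a list of cycles."""
    perm = {i: i for i in range(1, n + 1)}   # unlisted objects are fixed
    for cycle in cycles:
        for a, b in zip(cycle, cycle[1:] + cycle[:1]):
            perm[a] = b                      # each entry maps to the next one
    return perm


def cycle_type(perm):
    """Return the cycle lengths of perm, longest first."""
    seen, lengths = set(), []
    for start in perm:
        if start in seen:
            continue
        length, x = 0, start
        while x not in seen:                 # follow the cycle through start
            seen.add(x)
            x = perm[x]
            length += 1
        lengths.append(length)
    return sorted(lengths, reverse=True)


p = from_cycles([(1, 3, 5)])          # (1 3 5)(2)(4)
q = from_cycles([(2, 5, 3)])          # (2 5 3)(1)(4)
r = from_cycles([(1, 5), (2, 4)])     # (1 5)(2 4)(3)

print(cycle_type(p))   # [3, 1, 1]
print(cycle_type(q))   # [3, 1, 1]
print(cycle_type(r))   # [2, 2, 1]
```

Equal cycle types for p and q, and a different one for r, match the class
assignments stated in the text.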

In the case of the continuous rotation group, rotations by the same angle θ
about any two axes labelled i and j are in the same class, because the group
contains a rotation that takes the first axis into the second. Without going into
mathematical details, a rotation about axis i can be represented by the operator
R_i(θ), and the two rotations are connected by a relationship of the form

    R_j(θ) = φ_ij^{-1} R_i(θ) φ_ij,

in which φ_ij is the member of the full continuous rotation group that takes axis
i into axis j.


24.8 Exercises 


24.1 For each of the following sets, determine whether they form a group under the
operation indicated (where it is relevant you may assume that matrix multiplication
is associative):

(a) the integers (mod 10) under addition;

(b) the integers (mod 10) under multiplication;

(c) the integers 1, 2, 3, 4, 5, 6 under multiplication (mod 7);

(d) the integers 1, 2, 3, 4, 5 under multiplication (mod 6);

(e) all matrices of the form

    ( a   a − b )
    ( 0     b   )

where a and b are integers (mod 5), and a ≠ 0 ≠ b, under matrix multiplication;

(f) those elements of the set in (e) that are of order 1 or 2 (taken together);

(g) all matrices of the form

    [matrix missing in source]

where a, b, c are integers, under matrix multiplication.

24.2 Which of the following relationships between X and Y are equivalence relations?
Give a proof of your conclusions in each case:

(a) X and Y are integers and X − Y is odd;

(b) X and Y are integers and X − Y is even;

(c) X and Y are people and have the same postcode;

(d) X and Y are people and have a parent in common;

(e) X and Y are people and have the same mother;

(f) X and Y are n × n matrices satisfying Y = PXQ, where P and Q are elements
of a group Q of n × n matrices.

24.3 Define a binary operation • on the set of real numbers by

    x • y = x + y + rxy,

where r is a non-zero real number. Show that the operation • is associative.

Prove that x • y = −r^{-1} if, and only if, x = −r^{-1} or y = −r^{-1}. Hence prove that
the set of all real numbers excluding −r^{-1} forms a group under the operation •.

24.4 Prove that the relationship X ~ Y, defined by X ~ Y if Y can be expressed in
the form

    Y = (aX + b)/(cX + d),

with a, b, c and d as integers, is an equivalence relation on the set of real numbers
ℜ. Identify the class that contains the real number 1.

24.5 The following is a ‘proof’ that reflexivity is an unnecessary axiom for an
equivalence relation.

Because of symmetry X ~ Y implies Y ~ X. Then by transitivity X ~ Y and
Y ~ X imply X ~ X. Thus symmetry and transitivity imply reflexivity, which
therefore need not be separately required.

Demonstrate the flaw in this proof using the set consisting of all real numbers plus
the number i. Show by investigating the following specific cases that, whether or
not reflexivity actually holds, it cannot be deduced from symmetry and transitivity
alone.


(a) X ~ Y if X + Y is real.

(b) X ~ Y if XY is real.

24.6 Prove that the set M of matrices

    ( a  b )
    ( 0  c ),

where a, b, c are integers (mod 5) and a ≠ 0 ≠ c, forms a non-Abelian group
under matrix multiplication.

Show that the subset containing elements of M that are of order 1 or 2 does
not form a proper subgroup of M

(a) using Lagrange’s theorem,

(b) by direct demonstration that the set is not closed.

24.7 S is the set of all 2 × 2 matrices of the form

    A = ( w  x )
        ( y  z ),    where wz − xy = 1.

Show that S is a group under matrix multiplication. Which element(s) have order
2? Prove that an element A has order 3 if w + z + 1 = 0.

24.8 Show that, under matrix multiplication, matrices of the form

    M(a_0, a) = ( a_0 + a_1 i    −a_2 + a_3 i )
                ( a_2 + a_3 i     a_0 − a_1 i ),

where a_0 and the components of column matrix a = (a_1 a_2 a_3)^T are real numbers
satisfying a_0^2 + |a|^2 = 1, form a group. Deduce that, under the transformation
z → Mz, where z is any column matrix, |z|^2 is invariant.

24.9 If A is a group in which every element other than the identity, I, has order 2,
prove that A is Abelian. Hence show that if X and Y are distinct elements of A,
neither being equal to the identity, then the set {I, X, Y, XY} forms a subgroup
of A.

Deduce that if B is a group of order 2p, with p a prime greater than 2, then B
must contain an element of order p.

24.10 The group of rotations (excluding reflections and inversions) in three dimensions
that take a cube into itself is known as the group 432 (or O in the usual chemical
notation). Show by each of the following methods that this group has 24 elements.

(a) Identify the distinct relevant axes and count the number of qualifying rotations
about each.

(b) The orientation of the cube is determined if the directions of two of its body
diagonals are given. Consider the number of distinct ways in which one body
diagonal can be chosen to be ‘vertical’ and a second diagonal made to lie
along a particular direction.

24.11 Identify the eight symmetry operations on a square. Show that they form a group
(known to crystallographers as 4mm or to chemists as C_4v) having one element
of order 1, five of order 2 and two of order 4. Find its proper subgroups and the
corresponding cosets.

24.12 If A and B are two groups then their direct product, A × B, is defined to be
the set of ordered pairs (X, Y), with X an element of A, Y an element of B
and multiplication given by (X, Y)(X', Y') = (XX', YY'). Prove that A × B is a
group.

Denote the cyclic group of order n by C_n and the symmetry group of a regular
n-sided figure (an n-gon) by D_n; thus D_3 is the symmetry group of an equilateral
triangle, as discussed in the text.

(a) By considering the orders of each of their elements, show (i) that C_2 × C_3 is
isomorphic to C_6, and (ii) that C_2 × D_3 is isomorphic to D_6.

(b) Are any of D_4, C_8, C_2 × C_4, C_2 × C_2 × C_2 isomorphic?

24.13 Find the group Q generated under matrix multiplication by the matrices

    A = [matrix illegible in source],    B = [matrix illegible in source].

Determine its proper subgroups, and verify for each of them that its cosets
exhaust Q.

24.14 Show that if p is prime then the set of rational number pairs (a, b), excluding
(0, 0), with multiplication defined by

    (a, b) • (c, d) = (e, f), where (a + b√p)(c + d√p) = e + f√p,

forms an Abelian group. Show further that the mapping (a, b) → (a, −b) is an
automorphism.

24.15 (a) Denote by A_n the subset of the permutation group S_n that contains all the
even permutations. Show that A_n is a subgroup of S_n.

(b) List the elements of S_3 in cycle notation and identify the subgroup A_3.

(c) For each element X of S_3, let p(X) = 1 if X belongs to A_3 and p(X) = −1 if it
does not. Denote by C_2 the multiplicative cyclic group of order 2. Determine
the images of each of the elements of S_3 for the following four mappings:

    Φ_1 : S_3 → C_2,    X → p(X);
    Φ_2 : S_3 → C_2,    X → −p(X);
    Φ_3 : S_3 → A_3,    X → X^2;
    Φ_4 : S_3 → S_3,    X → X^3.

(d) For each mapping, determine whether the kernel K is a subgroup of S_3 and,
if so, whether the mapping is a homomorphism.

24.16 For the group Q with multiplication table 24.8 and proper subgroup H = {I, A, B},
denote the coset {I, A, B} by C_1 and the coset {C, D, E} by C_2. Form the set of
all possible products of a member of C_1 with itself, and denote this by C_1C_1.
Similarly compute C_2C_2, C_1C_2 and C_2C_1. Show that each product coset is equal to
C_1 or to C_2 and that a 2 × 2 multiplication table can be formed demonstrating
that C_1 and C_2 are themselves the elements of a group of order 2. A subgroup
like H whose cosets themselves form a group is a normal subgroup.

24.17 The group of all non-singular n × n matrices is known as the general linear
group GL(n) and that with only real elements as GL(n, R). If R* denotes the
multiplicative group of non-zero real numbers, prove that the mapping Φ :
GL(n, R) → R*, defined by Φ(M) = det M, is a homomorphism.

Show that the kernel K of Φ is a subgroup of GL(n, R). Determine its cosets
and show that they themselves form a group.

24.18 The group of reflection-rotation symmetries of a square is known as D_4; let
X be one of its elements. Consider a mapping Φ : D_4 → S_4, the permutation
group on four objects, defined by Φ(X) = the permutation induced by X on
the set {x, y, d, d'}, where x and y are the two principal axes and d and d'
the two principal diagonals, of the square. For example, if R is a rotation by
π/2, Φ(R) = (12)(34). Show that D_4 is mapped onto a subgroup of S_4 and, by
constructing the multiplication tables for D_4 and the subgroup, prove that the
mapping is a homomorphism.

24.19 Given that matrix M is a member of the multiplicative group GL(3, R), determine,
for each of the following additional constraints on M (applied separately), whether
the subset satisfying the constraint is a subgroup of GL(3, R):

(a) M^T = M;

(b) M^T M = I;

(c) |M| = 1;

(d) M_ij = 0 for j > i and M_ii ≠ 0.

24.20 In the quaternion group Q the elements form the set

    {1, −1, i, −i, j, −j, k, −k},

with i^2 = j^2 = k^2 = −1, ij = k and its cyclic permutations, and ji = −k and
its cyclic permutations. Find the proper subgroups of Q and the corresponding
cosets. Show that the subgroup of order 2 is a normal subgroup, but that the
other subgroups are not. Show that Q cannot be isomorphic to the group 4mm
(C_4v) considered in exercise 24.11.

24.21 Show that D_4, the group of symmetries of a square, has two isomorphic subgroups
of order 4. Show further that there exists a two-to-one homomorphism from the
quaternion group Q of exercise 24.20 onto one (and hence either) of these two
subgroups, and determine its kernel.

24.22 Show that the matrices

                  ( cos θ   −sin θ   x )
    M(θ, x, y) =  ( sin θ    cos θ   y ),
                  (   0        0     1 )

where 0 ≤ θ < 2π, −∞ < x < ∞, −∞ < y < ∞, form a group under matrix
multiplication.

Show that those M for which θ = 0 form a subgroup and identify its cosets.
Show that the cosets themselves form a group.

24.23 Find (a) all the proper subgroups and (b) all the conjugacy classes of the symmetry 
group of a regular pentagon. 


24.9 Hints and answers 

24.1† (a) Yes. (b) No, there is no inverse for 2. (c) Yes. (d) No, 2 × 3 is not in the set.
(e) Yes. (f) Yes, they form a subgroup of order 4: [1,0;0,1], [4,0;0,4], [1,2;0,4],
[4,3;0,1]. (g) Yes.

24.2 (a) No, not reflexive; (b) yes, partition of integers into odd and even; (c) yes; (d)
no, not transitive, X → Y → Z if Y’s parents both re-marry and X and Z are
children of the two second marriages; (e) yes; (f) yes.

24.3 x • (y • z) = x + y + z + r(xy + xz + yz) + r^2 xyz = (x • y) • z. Show that assuming
x • y = −r^{-1} leads to (rx + 1)(ry + 1) = 0. The inverse of x is x^{-1} = −x/(1 + rx);
show that this is not equal to −r^{-1}.

† Where matrix elements are given as a list, the convention used is [row 1; row 2; ...], individual
entries in each row being separated by commas.

24.4 The relevant sets of values for [a, b, c, d] are [1, 0, 0, 1], [−d, b, c, −a] and [a'a +
b'c, a'b + b'd, c'a + d'c, c'b + d'd] for reflexivity, symmetry and transitivity
respectively; the rational numbers.

24.5 (a) Consider both X = i and X ≠ i. Here, i ≁ i. (b) In this case i ~ i, but the
conclusion cannot be deduced from the other axioms. In both cases i is in a class
by itself and no Y, as used in the false proof, can be found.

24.6† Matrices [1,3;0,1] and [2,3;0,1] do not commute, so the group is non-Abelian.
(a) 12 elements in the set {[1,0;0,1], [4,0;0,4], [1,b;0,4], [4,b;0,1] with
b arbitrary}. The full group has order 4 × 4 × 5 = 80, which is not divisible by
12. (b) [1,0;0,4][1,3;0,4] = [1,3;0,1], which has order > 2.

24.7† Use |AB| = |A||B| = 1 × 1 = 1 to prove closure. The inverse has w ↔ z,
x ↔ −x, y ↔ −y, giving |A^{-1}| = 1, i.e. it is in the set. The only element of order
2 is −I; A^2 can be simplified to [−(w + 1), −x; −y, −(z + 1)].

24.8 Note that if each matrix is written in the form N = (n_1, −n_2*; n_2, n_1*) with |n_1|^2 +
|n_2|^2 = 1 then NQ = P, where p_1 = n_1q_1 − n_2*q_2 and p_2 = n_2q_1 + n_1*q_2 with
|p_1|^2 + |p_2|^2 = 1. The inverse of M(a_0, a) is M(a_0, −a). Show that M^†M = I.

24.9 If XY = Z, show that Y = XZ and X = ZY, then form YX. Note that the
elements of B can only have orders 1, 2 or p. Suppose they all have order 1 or
2; then using the earlier result, whilst noting that 4 does not divide 2p, leads to
a contradiction.

24.10 (a) Identity = 1, three rotations of π about face normals, six rotations of ±π/2
about face normals, six rotations of π about edge diagonals, eight rotations of
±2π/3 about body diagonals. (b) The ‘vertical’ diagonal can be chosen in 4 × 2
ways (either end of each diagonal can be ‘up’). There are then three equivalent
rotational positions about the vertical and thus 4 × 2 × 3 possibilities altogether.

24.11 Using the notation indicated in figure 24.3, R being a rotation of π/2 about an
axis perpendicular to the square, we have: I has order 1; R^2, m_1, m_2, m_3, m_4 have
order 2; R, R^3 have order 4.


[Figure 24.3 labels: m_1(π), m_2(π), m_3(π), m_4(π).]


Figure 24.3 The notation for exercise 24.11. 

Subgroup {I, R, R^2, R^3} has cosets {I, R, R^2, R^3}, {m_1, m_2, m_3, m_4};
subgroup {I, R^2} has cosets {I, R^2}, {R, R^3}, {m_1, m_2}, {m_3, m_4};
subgroup {I, m_1} has cosets {I, m_1}, {R, m_3}, {R^2, m_2}, {R^3, m_4};
subgroup {I, m_2} has cosets {I, m_2}, {R, m_4}, {R^2, m_1}, {R^3, m_3};
subgroup {I, m_3} has cosets {I, m_3}, {R, m_2}, {R^2, m_4}, {R^3, m_1};
subgroup {I, m_4} has cosets {I, m_4}, {R, m_1}, {R^2, m_3}, {R^3, m_2}.

24.12 (a) (i) Each has one element of order 1, one element of order 2, two elements of
order 3 and two elements of order 6.

(ii) Each has one element of order 1, seven of order 2, two of order 3 and
two elements of order 6.

(b) No. C_8 contains elements of order 8; none of the others could. Every element
of C_2 × C_2 × C_2 is of order 1 or 2; the remaining two groups must each contain
an element of order 4. D_4 has one, five and two elements of order 1, 2 and
4 respectively; C_2 × C_4 has correspondingly one, three and four elements.

24.13 Q = {I, A, B, B^2, B^3, AB, AB^2, AB^3}. The proper subgroups are as follows:

{I, A}, {I, B^2}, {I, AB^2}, {I, B, B^2, B^3}, {I, B^2, AB, AB^3}.

24.14 (a, b)^{-1} = (a^2 − pb^2)^{-1}(a, −b), which has rational entries with a^2 ≠ pb^2 since p is
prime.

24.15 (b) A_3 = {(1), (123), (132)}.

(d) For Φ_1, K = {(1), (123), (132)} is a subgroup.

For Φ_2, K = {(23), (13), (12)} is not a subgroup because it has no identity element.
For Φ_3, K = {(1), (23), (13), (12)} is not a subgroup because it is not closed.

For Φ_4, K = {(1), (123), (132)} is a subgroup.

Only Φ_1 is a homomorphism; Φ_4 fails because, for example, [(23)(13)]^3 ≠
(23)^3(13)^3.

24.16 C_1C_1 = C_2C_2 = C_1, C_1C_2 = C_2C_1 = C_2.

24.17 Recall that, for any pair of matrices P and Q, |PQ| = |P||Q|. K is the set of all
matrices with unit determinant. The cosets of K are the sets of matrices whose
determinants are equal; K itself is the identity in the group of cosets.

24.18 I, R^2 → (1); R, R^3 → (12)(34); m_x, m_y → (34); the two reflections in the
diagonals → (12). The multiplication table for the subgroup is that given in
table 24.3.

24.19 (a) No, because the set is not closed, (b) yes. (c) yes. (d) yes. 

24.20 The subgroup {1, −1} has cosets C_1 = {1, −1}, C_i = {i, −i}, C_j = {j, −j},
C_k = {k, −k}. The subgroup {1, i, −1, −i} has cosets D_i = {1, i, −1, −i}, D_i' =
{j, −j, k, −k}; corresponding pairs of cosets D_j, D_j' and D_k, D_k' are obtained
from subgroups {1, j, −1, −j} and {1, k, −1, −k} respectively. They can be written
down by cyclically permuting i, j, k in D_i, D_i'. The cosets of {1, −1} form a group
with C_1 as the identity and C_iC_j = C_k etc. The cosets of {1, i, −1, −i} do not form
a group since, for example, the product involves all elements of Q. It is
sufficient to notice that 4mm has six elements of order 2, whilst Q has only two.

24.21 Each subgroup contains the identity, a rotation by π, and two reflections. The
homomorphism is ±1 → I, ±i → R^2, ±j → m_x, ±k → m_y, with kernel {1, −1}.

24.22 Closure is shown by M(θ, x, y)M(φ, x', y') = M(θ + φ, X, Y),
where X = x + x' cos θ − y' sin θ and Y = y + y' cos θ + x' sin θ.

The inverse is given by M(θ, x, y)^{-1} = M(−θ, −x cos θ − y sin θ, x sin θ − y cos θ).
All members of any coset C_θ have the same value for θ.

C_θ1 × C_θ2 = C_{θ1 + θ2 (mod 2π)}. The inverse coset is C_θ^{-1} = C_{2π − θ}.

24.23 There are 10 elements: I, rotations R^i (i = 1, 4) and reflections m_j (j = 1, 5).

(a) Five proper subgroups of order 2, {I, m_j}, and one of order 5, {I, R, R^2, R^3, R^4}.

(b) Four conjugacy classes, {I}, {R, R^4}, {R^2, R^3}, {m_1, m_2, m_3, m_4, m_5}.


Representation theory


As indicated at the start of the previous chapter, significant conclusions can 
often be drawn about a physical system simply from the study of its symmetry 
properties. That chapter was devoted to setting up a formal mathematical basis, 
group theory, with which to describe and classify such properties; the current 
chapter shows how to implement the consequences of the resulting classifications 
and obtain concrete physical conclusions about the system under study. The 
connection between the two chapters is akin to that between working with 
coordinate-free vectors, each denoted by a single symbol, and working with a 
coordinate system in which the same vectors are expressed in terms of components. 

The ‘coordinate systems’ that we will choose will be ones that are expressed in 
terms of matrices; it will be clear that ordinary numbers would not be sufficient, 
as they make no provision for any non-commutation amongst the elements 
of a group. Thus, in this chapter the group elements will be represented by 
matrices that have the same commutation relations as the members of the group, 
whatever the group’s original nature (symmetry operations, functional forms, 
matrices, permutations, etc.). For some abstract groups it is difficult to give a 
written description of the elements and their properties without recourse to such 
representations. Most of our applications will be concerned with representations 
of the groups that consist of the symmetry operations on molecules containing 
two or more identical atoms. 

Firstly, in section 25.1, we use an elementary example to demonstrate the kind
of conclusions that can be reached by arguing purely on symmetry grounds. Then
in sections 25.2-25.10 we develop the formal side of representation theory and
establish general procedures and results. Finally, these are used in section 25.11
to tackle a variety of problems drawn from across the physical sciences.

Figure 25.1 Three molecules, (a) hydrogen chloride, (b) carbon dioxide and
(c) ozone, for which symmetry considerations impose varying degrees of
constraint on their possible electric dipole moments.


25.1 Dipole moments of molecules 

Some simple consequences of symmetry can be demonstrated by considering 
whether a permanent electric dipole moment can exist in any particular molecule; 
three simple molecules, hydrogen chloride, carbon dioxide and ozone, are illus- 
trated in figure 25.1. Even if a molecule is electrically neutral, an electric dipole 
moment will exist in it if the centres of gravity of the positive charges (due to 
protons in the atomic nuclei) and of the negative charges (due to the electrons) 
do not coincide. 

For hydrogen chloride there is no reason why they should coincide; indeed, the 
normal picture of the binding mechanism in this molecule is that the electron from 
the hydrogen atom moves its average position from that of its proton nucleus to 
somewhere between the hydrogen and chlorine nuclei. There is no compensating 
movement of positive charge, and a net dipole moment is to be expected - and 
is found experimentally. 

For the linear molecule carbon dioxide it seems obvious that it cannot have 
a dipole moment, because of its symmetry. Putting this rather more rigorously, 
we note that any rotation about the long axis of the molecule leaves it totally 
unchanged; consequently, any component of a permanent electric dipole perpen- 
dicular to that axis must be zero (a non-zero component would rotate although 
no physical change had taken place in the molecule). That only leaves the 
possibility of a component parallel to the axis. However, a rotation of π radians 
about the axis AA' shown in figure 25.1(b) carries the molecule into itself, as 
does a reflection in a plane through the carbon atom and perpendicular to the 
molecular axis (i.e. one with its normal parallel to the axis). In both cases the two 
oxygen atoms change places but, as they are identical, the molecule is indistin- 
guishable from the original. Either ‘symmetry operation’ would reverse the sign 
of any dipole component directed parallel to the molecular axis; this can only be 
compatible with the indistinguishability of the original and final systems if the 
parallel component is zero. Thus on symmetry grounds carbon dioxide cannot 
have a permanent electric dipole moment. 

Finally, for ozone, which is angular rather than linear, symmetry does not 
place such tight constraints. A dipole-moment component parallel to the axis 
BB' (figure 25.1(c)) is possible, since there is no symmetry operation that reverses 
the component in that direction and at the same time carries the molecule into 
an indistinguishable copy of itself. However, a dipole moment perpendicular to 
BB' is not possible, since a rotation of π about BB' would both reverse any 
such component and carry the ozone molecule into itself - two contradictory 
conclusions unless the component is zero. 

In summary, symmetry requirements appear in the form that some or all 
components of permanent electric dipoles in molecules are forbidden; they do 
not show that the other components do exist, only that they may. The greater 
the symmetry of the molecule, the tighter the restrictions on potentially non-zero 
components of its dipole moment. 
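
As a rough numerical illustration of these conclusions (a sketch only, not part 
of the formal development), the Python fragment below averages a trial dipole 
vector over a set of symmetry operations; a component that survives the 
averaging is one that symmetry permits. The choice of axes (z along the 
molecular axis of carbon dioxide and along BB' for ozone), the use of a single 
rotation by π about the molecular axis to stand in for the continuous rotations, 
and the names `apply` and `group_average` are all assumptions made for the 
example. 

```python
# Symmetry operations written as 3 x 3 matrices acting on (p_x, p_y, p_z).
# Assumed axes: z is the molecular axis of CO2 and the axis BB' of ozone;
# x plays the role of the axis AA' of figure 25.1(b).
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
ROT_PI_Z = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]   # rotation by pi about z
ROT_PI_X = [[1, 0, 0], [0, -1, 0], [0, 0, -1]]   # rotation by pi about x
ROT_PI_Y = [[-1, 0, 0], [0, 1, 0], [0, 0, -1]]   # product of the two above

OZONE = [IDENTITY, ROT_PI_Z]
CO2 = [IDENTITY, ROT_PI_Z, ROT_PI_X, ROT_PI_Y]


def apply(matrix, vector):
    """Return the image of a 3-component vector under a 3 x 3 matrix."""
    return [sum(matrix[r][c] * vector[c] for c in range(3)) for r in range(3)]


def group_average(operations, vector):
    """Average the images of `vector` over a closed set of operations.

    A dipole that is unchanged by every operation equals its own average,
    so components that average to zero are forbidden by symmetry.
    """
    total = [0.0, 0.0, 0.0]
    for op in operations:
        total = [t + x for t, x in zip(total, apply(op, vector))]
    return [t / len(operations) for t in total]


trial = [0.3, -1.2, 0.7]
print("ozone:", group_average(OZONE, trial))
print("CO2:  ", group_average(CO2, trial))
```

For the ozone operations only the component along BB' survives, whereas for 
carbon dioxide every component averages to zero; as in the text, this shows 
only which components may exist, not that they do. 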

In section 25.11 other, more complicated, physical situations will be analysed 
using results derived from representation theory. In anticipation of these results, 
and since it may help the reader to understand where the developments in the 
next nine sections are leading, we make here a broad, powerful, but rather formal, 
statement as follows. 

If a physical system is such that after the application of particular rotations or 
reflections (or a combination of the two) the final system is indistinguishable from 
the original system then its behaviour, and hence the functions that describe its 
behaviour, must have the corresponding property of invariance when subjected to 
the same rotations and reflections. 


25.2 Choosing an appropriate formalism 

As mentioned in the introduction to this chapter, the elements of a finite group 
Q can be represented by matrices; this is done in the following way. A suitable 
column matrix u, known as a basis vector,† is chosen and is written in terms of 
its components $u_i$, the basis functions, as 
$\mathsf{u} = (u_1\ u_2\ \cdots\ u_n)^{\mathrm{T}}$. The $u_i$ may be of 
a variety of natures, e.g. numbers, coordinates, functions or even a set of labels, 
though for any one basis vector they will all be of the same kind. 

Once chosen, the basis vector can be used to generate an n-dimensional 
representation of the group as follows. An element X of the group is selected and 
its effect on each basis function $u_i$ is determined. If the action of X on $u_1$ is to 
produce $u'_1$, etc. then the set of equations 

\[ u'_i = X u_i \tag{25.1} \]

generates a new column matrix 
$\mathsf{u}' = (u'_1\ u'_2\ \cdots\ u'_n)^{\mathrm{T}}$. Having established u and u' 
we can determine the n × n matrix, M(X) say, that connects them by 

\[ \mathsf{u}' = \mathsf{M}(X)\,\mathsf{u}. \tag{25.2} \]

† This usage of the term basis vector is not exactly the same as that introduced in subsection 8.1.1. 

It may seem natural to use the matrix M(X) so generated as the representative 
matrix of the element X; in fact, because we have already chosen the 
convention whereby Z = XY implies that the effect of applying element Z is the 
same as that of first applying Y and then applying X to the result, one further 
step has to be taken. So that the representative matrices D(X) may follow the 
same convention, i.e. 

\[ \mathsf{D}(Z) = \mathsf{D}(X)\,\mathsf{D}(Y), \]

and at the same time respect the normal rules of matrix multiplication, it is 
necessary to take the transpose of M(X) as the representative matrix D(X). 
Explicitly, 

\[ \mathsf{D}(X) = \mathsf{M}^{\mathrm{T}}(X) \tag{25.3} \]

and (25.2) becomes 

\[ \mathsf{u}' = \mathsf{D}^{\mathrm{T}}(X)\,\mathsf{u}. \tag{25.4} \]

Thus the procedure for determining the matrix D(X) that represents the group 
element X in a representation based on basis vector u is summarised by equations 
(25.1)-(25.4).† 

This procedure is then repeated for each element X of the group, and the 
resulting set of n × n matrices D = {D(X)} is said to be the n-dimensional 
representation of Q having u as its basis. The need to take the transpose of each 
matrix M(X) is not of any fundamental significance, since the only thing that 
really matters is whether the matrices D(X) have the appropriate multiplication 
properties - and, as defined, they do. 

In cases in which the basis functions are labels, the actions of the group 
elements are such as to cause rearrangements of the labels. Correspondingly the 
matrices D(X) contain only ‘1’s and ‘0’s as entries; each row and each column 
contains a single ‘1’. 


† An alternative procedure in which a row vector is used as the basis vector is possible. Defining 
equations of the form $\mathsf{u}^{\mathrm{T}} X = \mathsf{u}^{\mathrm{T}} \mathsf{D}(X)$ are used, and no additional transpositions are needed to 
define the representative matrices. However, row-matrix equations are cumbersome to write out 
and in all other parts of this book we have conventionally written operators (here the group 
element) to the left of the object on which they operate (here the basis vector). 

► For the group $S_3$ of permutations on three objects, which has group multiplication 
table 24.8 on p. 897, with (in cycle notation) 

    I = (1)(2)(3),   A = (1 2 3),    B = (1 3 2), 
    C = (1)(2 3),    D = (3)(1 2),   E = (2)(1 3), 

use as the components of a basis vector the ordered letter triplets 

    u_1 = {P Q R},   u_2 = {Q R P},   u_3 = {R P Q}, 
    u_4 = {P R Q},   u_5 = {Q P R},   u_6 = {R Q P}. 

Generate a six-dimensional representation D = {D(X)} of the group and confirm that the 
representative matrices multiply according to table 24.8, e.g. 

\[ \mathsf{D}(C)\,\mathsf{D}(B) = \mathsf{D}(E). \]

It is immediate that the identity permutation I = (1)(2)(3) leaves all $u_i$ unchanged, i.e. 
$u'_i = u_i$ for all i. The representative matrix D(I) is thus $\mathsf{I}_6$, the 6 × 6 unit matrix. 

We next take X as the permutation A = (1 2 3) and, using (25.1), let it act on each of 
the components of the basis vector: 

\[
\begin{aligned}
u'_1 &= A u_1 = (1\,2\,3)\{P\,Q\,R\} = \{Q\,R\,P\} = u_2, \\
u'_2 &= A u_2 = (1\,2\,3)\{Q\,R\,P\} = \{R\,P\,Q\} = u_3, \\
     &\ \ \vdots \\
u'_6 &= A u_6 = (1\,2\,3)\{R\,Q\,P\} = \{Q\,P\,R\} = u_5.
\end{aligned}
\]

The matrix M(A) has to be such that u' = M(A)u (here dots replace zeros to aid 
readability): 

\[
\mathsf{u}' =
\begin{pmatrix}
\cdot & 1 & \cdot & \cdot & \cdot & \cdot \\
\cdot & \cdot & 1 & \cdot & \cdot & \cdot \\
1 & \cdot & \cdot & \cdot & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & 1 \\
\cdot & \cdot & \cdot & 1 & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & 1 & \cdot
\end{pmatrix}
\begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \\ u_6 \end{pmatrix}
\equiv \mathsf{M}(A)\,\mathsf{u}.
\]

D(A) is then equal to $\mathsf{M}^{\mathrm{T}}(A)$. 

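The other D(X) are determined in a similar way. In general, if 

\[ X u_i = u_j, \]

then $[\mathsf{M}(X)]_{ij} = 1$, leading to $[\mathsf{D}(X)]_{ji} = 1$ and $[\mathsf{D}(X)]_{jk} = 0$ for $k \neq i$. For example, 

\[ C u_3 = (1)(2\,3)\{R\,P\,Q\} = \{R\,Q\,P\} = u_6 \]

implies that $[\mathsf{D}(C)]_{63} = 1$ and $[\mathsf{D}(C)]_{6k} = 0$ for k = 1, 2, 4, 5, 6. When calculated in full 

\[
\mathsf{D}(C) =
\begin{pmatrix}
\cdot & \cdot & \cdot & 1 & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & 1 & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & 1 \\
1 & \cdot & \cdot & \cdot & \cdot & \cdot \\
\cdot & 1 & \cdot & \cdot & \cdot & \cdot \\
\cdot & \cdot & 1 & \cdot & \cdot & \cdot
\end{pmatrix},
\qquad
\mathsf{D}(B) =
\begin{pmatrix}
\cdot & 1 & \cdot & \cdot & \cdot & \cdot \\
\cdot & \cdot & 1 & \cdot & \cdot & \cdot \\
1 & \cdot & \cdot & \cdot & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & 1 \\
\cdot & \cdot & \cdot & 1 & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & 1 & \cdot
\end{pmatrix},
\]

\[
\mathsf{D}(E) =
\begin{pmatrix}
\cdot & \cdot & \cdot & \cdot & \cdot & 1 \\
\cdot & \cdot & \cdot & 1 & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & 1 & \cdot \\
\cdot & 1 & \cdot & \cdot & \cdot & \cdot \\
\cdot & \cdot & 1 & \cdot & \cdot & \cdot \\
1 & \cdot & \cdot & \cdot & \cdot & \cdot
\end{pmatrix},
\]

from which it can be verified that D(C)D(B) = D(E). ◄ 

Figure 25.2 Diagram (a) shows the definition of the basis vector, (b) shows 
the effect of applying a clockwise rotation of 2π/3 and (c) shows the effect of 
applying a reflection in the mirror axis through Q. 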
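
The bookkeeping in the worked example on the six-dimensional representation is 
easily mechanised. The following Python sketch is illustrative only; its function 
names (`act`, `representative`, `matmul`, `product_name`) are our own. It builds 
each D(X) from the rule that the (j, i) entry of D(X) is 1 whenever X carries 
u_i into u_j, and then reads off products from the matrices alone. It assumes 
the convention used in the example, namely that the cycle (1 2 3) replaces the 
object in position 1 by the one in position 2, and so on. 

```python
# Each element is stored as a map i -> sigma(i): the object in position i is
# replaced by the object that was in position sigma(i), so (1 2 3) sends
# {P Q R} to {Q R P}, as in the worked example.
ELEMENTS = {
    "I": {1: 1, 2: 2, 3: 3},
    "A": {1: 2, 2: 3, 3: 1},  # (1 2 3)
    "B": {1: 3, 3: 2, 2: 1},  # (1 3 2)
    "C": {1: 1, 2: 3, 3: 2},  # (1)(2 3)
    "D": {1: 2, 2: 1, 3: 3},  # (3)(1 2)
    "E": {1: 3, 2: 2, 3: 1},  # (2)(1 3)
}
BASIS = ["PQR", "QRP", "RPQ", "PRQ", "QPR", "RQP"]  # u_1, ..., u_6


def act(sigma, triplet):
    """Apply a permutation to an ordered letter triplet."""
    return "".join(triplet[sigma[i] - 1] for i in (1, 2, 3))


def representative(sigma):
    """Build D(X) = M^T(X) for the basis of ordered triplets."""
    n = len(BASIS)
    d = [[0] * n for _ in range(n)]
    for i, u in enumerate(BASIS):
        j = BASIS.index(act(sigma, u))  # X u_i = u_j, so [M]_ij = 1 ...
        d[j][i] = 1                     # ... and hence [D]_ji = 1
    return d


def matmul(a, b):
    """Ordinary product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


REP = {name: representative(sigma) for name, sigma in ELEMENTS.items()}


def product_name(x, y):
    """Name of the element whose matrix equals D(x) D(y)."""
    target = matmul(REP[x], REP[y])
    return next(name for name, mat in REP.items() if mat == target)


print("D(C) D(B) = D(%s)" % product_name("C", "B"))
```

Running `product_name` over all thirty-six ordered pairs would reproduce the 
whole multiplication table from the matrices alone. 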


Whilst a representation obtained in this way necessarily has the same dimension 
as the order of the group it represents, there are, in general, square matrices of 
both smaller and larger dimensions that can be used to represent the group, 
though their existence may be less obvious. 

One possibility that arises when the group elements are symmetry opera- 
tions on an object whose position and orientation can be referred to a space 
coordinate system is called the natural representation. In it the representative 
matrices D(X) describe, in terms of a fixed coordinate system, what happens 
to a coordinate system that moves with the object when X is applied. There 
is usually some redundancy of the coordinates used in this type of representation, 
since interparticle distances are fixed and fewer than 3N coordinates, 
where N is the number of identical particles, are needed to specify uniquely 
the object’s position and orientation. Subsection 25.11.1 gives an example that 
illustrates both the advantages and disadvantages of the natural representation. 
We continue here with an example of a natural representation that has no such 
redundancy. 


► Use the fact that the group considered in the previous worked example is isomorphic to 
the group of two-dimensional symmetry operations on an equilateral triangle to generate a 
three-dimensional representation of the group. 


Label the triangle’s corners as 1, 2, 3 and three fixed points in space as P, Q, R, so that 
initially corner 1 lies at point P, 2 lies at point Q, and 3 at point R. We take P, Q, R as 
the components of the basis vector. 

In figure 25.2, (a) shows the initial configuration and also, formally, the result of applying 
the identity I to the triangle; it is therefore described by the basis vector, $(P\ Q\ R)^{\mathrm{T}}$. 

Diagram (b) shows the effect of a clockwise rotation by 2π/3, corresponding to 
element A in the previous example; the new column matrix is $(Q\ R\ P)^{\mathrm{T}}$. 

Diagram (c) shows the effect of a typical mirror reflection - the one that leaves the 
corner at point Q unchanged (element D in table 24.8 and the previous example); the new 
column matrix is now $(R\ Q\ P)^{\mathrm{T}}$. 

In similar fashion it can be concluded that the column matrix corresponding to element 
B, rotation by 4π/3, is $(R\ P\ Q)^{\mathrm{T}}$, and that the other two reflections C and E result in 
column matrices $(P\ R\ Q)^{\mathrm{T}}$ and $(Q\ P\ R)^{\mathrm{T}}$ respectively. The forms of the representative 
matrices $\mathsf{M}^{\mathrm{nat}}(X)$, (25.2), are now determined by equations such as, for element E, 

\[
\begin{pmatrix} Q \\ P \\ R \end{pmatrix}
=
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} P \\ Q \\ R \end{pmatrix},
\]

implying that 

\[
\mathsf{D}^{\mathrm{nat}}(E) =
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}^{\mathrm{T}}
=
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]

In this way the complete representation is obtained as 

\[
\mathsf{D}^{\mathrm{nat}}(I) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},\quad
\mathsf{D}^{\mathrm{nat}}(A) = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix},\quad
\mathsf{D}^{\mathrm{nat}}(B) = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix},
\]

\[
\mathsf{D}^{\mathrm{nat}}(C) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix},\quad
\mathsf{D}^{\mathrm{nat}}(D) = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix},\quad
\mathsf{D}^{\mathrm{nat}}(E) = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]

It should be emphasised that although the group contains six elements this representation 
is three-dimensional. ◄ 
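
To see why the transposition in (25.3) matters, one can compare products of the 
M matrices with products of the D matrices for this natural representation. The 
short Python check below is a sketch under the conventions of the example; the 
dictionary `IMAGES` simply records the column matrices quoted in the example, 
and the helper names (`m_matrix`, `transpose`, `matmul`) are hypothetical. 

```python
START = ("P", "Q", "R")  # the basis vector (P Q R)^T

# Column matrix produced by each element, as quoted in the worked example.
IMAGES = {
    "I": ("P", "Q", "R"),
    "A": ("Q", "R", "P"),
    "B": ("R", "P", "Q"),
    "C": ("P", "R", "Q"),
    "D": ("R", "Q", "P"),
    "E": ("Q", "P", "R"),
}


def m_matrix(image):
    """Matrix M with image = M (P Q R)^T, i.e. equation (25.2)."""
    return [[1 if START[c] == image[r] else 0 for c in range(3)]
            for r in range(3)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


M = {name: m_matrix(image) for name, image in IMAGES.items()}
D_NAT = {name: transpose(m) for name, m in M.items()}  # equation (25.3)

# The transposed matrices follow the group convention Z = XY ...
print(matmul(D_NAT["C"], D_NAT["B"]) == D_NAT["E"])
# ... whereas the untransposed product M(C) M(B) lands on M(D), not M(E).
print(matmul(M["C"], M["B"]) == M["D"])
```

With these data D(C)D(B) reproduces D(E), in agreement with the product quoted 
in the previous example, whereas the untransposed product M(C)M(B) gives M(D) 
instead. 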


We will concentrate on matrix representations of finite groups, particularly 
rotation and reflection groups (the so-called crystal point groups). The general 
ideas carry over to infinite groups, such as the continuous rotation groups, but in 
a book such as this, which aims to cover many areas of applicable mathematics, 
some topics can only be mentioned and not explored. We now give the formal 
definition of a representation. 

Definition. A representation D = {D(X)} of a group Q is an assignment of a 
non-singular square n × n matrix D(X) to each element X belonging to Q, such that 

(i) D(I) = $\mathsf{I}_n$, the unit n × n matrix, 

(ii) D(X)D(Y) = D(XY) for any two elements X and Y belonging to Q, i.e. the 
matrices multiply in the same way as the group elements they represent. 

As mentioned previously, a representation by n x n matrices is said to be an 
n-dimensional representation of Q. The dimension n is not to be confused with 
g, the order of the group, which gives the number of matrices needed in the 
representation, though they might not all be different. 

A consequence of the two defining conditions for a representation is that the 
matrix associated with the inverse of X is the inverse of the matrix associated 
with X. This follows immediately from setting $Y = X^{-1}$ in (ii): 

\[ \mathsf{D}(X)\,\mathsf{D}(X^{-1}) = \mathsf{D}(XX^{-1}) = \mathsf{D}(I) = \mathsf{I}_n; \]

hence 

\[ \mathsf{D}(X^{-1}) = [\mathsf{D}(X)]^{-1}. \]


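As an example, the four-element Abelian group that consists of the set {1, i, −1, −i} 
under ordinary multiplication has a two-dimensional representation based on the 
column matrix $(1\ i)^{\mathrm{T}}$: 

\[
\mathsf{D}(1) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},\quad
\mathsf{D}(i) = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},\quad
\mathsf{D}(-1) = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix},\quad
\mathsf{D}(-i) = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.
\]

The reader should check that D(i)D(−i) = D(1), D(i)D(i) = D(−1) etc., i.e. that 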
the matrices do have exactly the same multiplication properties as the elements 
of the group. Having done so, the reader may also wonder why anybody would 
bother with the representative matrices, when the original elements are so much 
simpler to handle! As we will see later, once some general properties of matrix 
representations have been established, the analysis of large groups, both Abelian 
and non-Abelian, can be reduced to routine, almost cookbook, procedures. 
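
The check suggested for the four-element group can be carried out mechanically. 
The sketch below (in Python, with helper names of our own choosing) constructs 
each D(x) from the action of x on the basis vector (1 i)^T, following 
(25.1)-(25.3), and then tests every one of the sixteen products. 

```python
from itertools import product

ELEMENTS = [1, 1j, -1, -1j]


def representative(x):
    """D(x) for the basis vector u = (1, i)^T.

    Writing x = a + b i, multiplication by x gives
        x * 1 =  a * 1 + b * i,
        x * i = -b * 1 + a * i,
    so that u' = M u with M = [[a, b], [-b, a]], and D = M^T.
    """
    a, b = round(x.real), round(x.imag)
    m = [[a, b], [-b, a]]
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]


def matmul(p, q):
    return [[sum(p[r][k] * q[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


homomorphism = all(
    matmul(representative(x), representative(y)) == representative(x * y)
    for x, y in product(ELEMENTS, repeat=2)
)
print(representative(1j), homomorphism)
```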

An n-dimensional representation of Q is a homomorphism of Q into the set of 
invertible n × n matrices (i.e. n × n matrices that have inverses or, equivalently, 
have non-zero determinants); this set is usually known as the general linear group 
and denoted by GL(n). In general the same matrix may represent more than one 
element of Q; if, however, all the matrices representing the elements of Q are 
different then the representation is said to be faithful, and the homomorphism 
becomes an isomorphism onto a subgroup of GL(n). 

A trivial but important representation is D(X) = $\mathsf{I}_n$ for all elements X of Q. 
Clearly both of the defining relationships are satisfied, and there is no restriction 
on the value of n. However, such a representation is not a faithful one. 

To sum up, in the context of a rotation-reflection group, the transposes of 
the set of n x n matrices D(X) that make up a representation D may be thought 
of as describing what happens to an n-component basis vector of coordinates, 
$(x\ y\ \cdots)^{\mathrm{T}}$, or of functions, $(\Psi_1\ \Psi_2\ \cdots)^{\mathrm{T}}$, the $\Psi_i$ themselves being functions 
of coordinates, when the group operation X is carried out on each of the 
coordinates or functions. For example, to return to the symmetry operations 
on an equilateral triangle, the clockwise rotation by 2π/3, R, carries the 
three-dimensional basis vector $(x\ y\ z)^{\mathrm{T}}$ into the column matrix 

\[
\begin{pmatrix}
-\tfrac{1}{2}x + \tfrac{\sqrt{3}}{2}y \\
-\tfrac{\sqrt{3}}{2}x - \tfrac{1}{2}y \\
z
\end{pmatrix},
\]

whilst the two-dimensional basis vector of functions $(r^2\ \ 3z^2 - r^2)^{\mathrm{T}}$ is unaltered, 
as neither r nor z is changed by the rotation. The fact that z is unchanged by 
any of the operations of the group shows that the components x, y, z actually 
divide (i.e. are ‘reducible’, to anticipate a more formal description) into two sets: 
one comprises z, which is unchanged by any of the operations, and the other 
comprises x, y, which change as a pair into linear combinations of themselves. 
This is an important observation to which we return in section 25.4. 


25.3 Equivalent representations 

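If D is an n-dimensional representation of a group Q, and $\mathsf{Q}$ is any fixed 
invertible n × n matrix ($|\mathsf{Q}| \neq 0$), then the set of matrices defined by the similarity 
transformation 

\[ \mathsf{D}_{\mathsf{Q}}(X) = \mathsf{Q}^{-1}\,\mathsf{D}(X)\,\mathsf{Q} \tag{25.5} \]

also forms a representation $\mathsf{D}_{\mathsf{Q}}$ of the group, said to be equivalent to D. We can see from a 
comparison with the definition in section 25.2 that they do form a representation: 

(i) $\mathsf{D}_{\mathsf{Q}}(I) = \mathsf{Q}^{-1}\mathsf{D}(I)\mathsf{Q} = \mathsf{Q}^{-1}\mathsf{I}_n\mathsf{Q} = \mathsf{I}_n$, 

(ii) $\mathsf{D}_{\mathsf{Q}}(X)\mathsf{D}_{\mathsf{Q}}(Y) = \mathsf{Q}^{-1}\mathsf{D}(X)\mathsf{Q}\mathsf{Q}^{-1}\mathsf{D}(Y)\mathsf{Q} = \mathsf{Q}^{-1}\mathsf{D}(X)\mathsf{D}(Y)\mathsf{Q} = \mathsf{Q}^{-1}\mathsf{D}(XY)\mathsf{Q} = \mathsf{D}_{\mathsf{Q}}(XY)$. 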

Since we can always transform between equivalent representations using a non- 
singular matrix Q, we will consider such representations to be one and the same. 

Despite the similarity of words and manipulations to those of subsection 24.7.1, 
that two representations are equivalent does not constitute an ‘equivalence re- 
lation’ - for example, the reflexive property does not hold for a general fixed 
matrix Q. However, if Q were not fixed, but simply restricted to belonging to 
a set of matrices that themselves form a group, then (25.5) would constitute an 
equivalence relation. 

The general invertible matrix Q that appears in the definition (25.5) of equiv- 
alent matrices describes changes arising from a change in the coordinate system 
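(i.e. in the set of basis functions). As before, suppose that the effect of an 
operation X on the basis functions is expressed by the action of M(X) (which is equal 
to $\mathsf{D}^{\mathrm{T}}(X)$) on the corresponding basis vector: 

\[ \mathsf{u}' = \mathsf{M}(X)\,\mathsf{u} = \mathsf{D}^{\mathrm{T}}(X)\,\mathsf{u}. \tag{25.6} \]

A change of basis would be given by $\mathsf{u}_{\mathsf{Q}} = \mathsf{Q}\mathsf{u}$ and $\mathsf{u}'_{\mathsf{Q}} = \mathsf{Q}\mathsf{u}'$, and we may write 

\[ \mathsf{u}'_{\mathsf{Q}} = \mathsf{Q}\mathsf{u}' = \mathsf{Q}\,\mathsf{M}(X)\,\mathsf{u} = \mathsf{Q}\,\mathsf{D}^{\mathrm{T}}(X)\,\mathsf{Q}^{-1}\mathsf{u}_{\mathsf{Q}}. \tag{25.7} \]

This is of the same form as (25.6), i.e. 

\[ \mathsf{u}'_{\mathsf{Q}} = \mathsf{D}^{\mathrm{T}}_{\mathsf{Q}^{\mathrm{T}}}(X)\,\mathsf{u}_{\mathsf{Q}}, \tag{25.8} \]

where $\mathsf{D}_{\mathsf{Q}^{\mathrm{T}}}(X) = (\mathsf{Q}^{\mathrm{T}})^{-1}\mathsf{D}(X)\mathsf{Q}^{\mathrm{T}}$ is related to D(X) by a similarity 
transformation. Thus $\mathsf{D}_{\mathsf{Q}^{\mathrm{T}}}(X)$ represents the same linear transformation as D(X), but with 
respect to a new basis vector $\mathsf{u}_{\mathsf{Q}}$; this supports our contention that 
representations connected by similarity transformations should be considered as the same 
representation. 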


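► For the four-element Abelian group consisting of the set {1, i, −1, −i} under ordinary 
multiplication, discussed near the end of section 25.2, change the basis vector from 
$\mathsf{u} = (1\ i)^{\mathrm{T}}$ to $\mathsf{u}_{\mathsf{Q}} = (3 - i\ \ 2i - 5)^{\mathrm{T}}$. Find the real transformation matrix Q. Show that the 
transformed representative matrix for element i, $\mathsf{D}_{\mathsf{Q}^{\mathrm{T}}}(i)$, is given by 

\[ \mathsf{D}_{\mathsf{Q}^{\mathrm{T}}}(i) = \begin{pmatrix} 17 & -29 \\ 10 & -17 \end{pmatrix} \]

and verify that $\mathsf{D}^{\mathrm{T}}_{\mathsf{Q}^{\mathrm{T}}}(i)\,\mathsf{u}_{\mathsf{Q}} = i\,\mathsf{u}_{\mathsf{Q}}$. 

Firstly, we solve the matrix equation 

\[
\begin{pmatrix} 3 - i \\ 2i - 5 \end{pmatrix}
=
\begin{pmatrix} a & b \\ c & d \end{pmatrix}
\begin{pmatrix} 1 \\ i \end{pmatrix},
\]

with a, b, c, d real. This gives Q and hence $\mathsf{Q}^{-1}$ as 

\[
\mathsf{Q} = \begin{pmatrix} 3 & -1 \\ -5 & 2 \end{pmatrix},
\qquad
\mathsf{Q}^{-1} = \begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix}.
\]

Following (25.7) we now find the transpose of $\mathsf{D}_{\mathsf{Q}^{\mathrm{T}}}(i)$ as 

\[
\mathsf{Q}\,\mathsf{D}^{\mathrm{T}}(i)\,\mathsf{Q}^{-1} =
\begin{pmatrix} 3 & -1 \\ -5 & 2 \end{pmatrix}
\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}
\begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix}
=
\begin{pmatrix} 17 & 10 \\ -29 & -17 \end{pmatrix},
\]

and hence $\mathsf{D}_{\mathsf{Q}^{\mathrm{T}}}(i)$ is as stated. Finally, 

\[
\mathsf{D}^{\mathrm{T}}_{\mathsf{Q}^{\mathrm{T}}}(i)\,\mathsf{u}_{\mathsf{Q}} =
\begin{pmatrix} 17 & 10 \\ -29 & -17 \end{pmatrix}
\begin{pmatrix} 3 - i \\ 2i - 5 \end{pmatrix}
=
\begin{pmatrix} 1 + 3i \\ -2 - 5i \end{pmatrix}
= i \begin{pmatrix} 3 - i \\ 2i - 5 \end{pmatrix}
= i\,\mathsf{u}_{\mathsf{Q}},
\]

as required. ◄ 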
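
The arithmetic of this worked example can be confirmed with a few lines of 
Python. The sketch below uses only the matrices quoted in the example; the 
variable and function names are our own and carry no special significance. 

```python
def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def matvec(m, v):
    return [sum(m[r][c] * v[c] for c in range(len(v))) for r in range(len(m))]


Q = [[3, -1], [-5, 2]]
Q_INV = [[2, 1], [5, 3]]
D_T_I = [[0, 1], [-1, 0]]        # D^T(i) = M(i) in the original basis

u = [1, 1j]                      # original basis vector (1 i)^T
u_q = matvec(Q, u)               # new basis vector Q u

# Transpose of the transformed matrix, as in (25.7), and the matrix itself.
d_t_q = matmul(matmul(Q, D_T_I), Q_INV)
d_q = [list(col) for col in zip(*d_t_q)]

print(d_q)
print(matvec(d_t_q, u_q) == [1j * c for c in u_q])
```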

Although we will not prove it, it can be shown that any finite representation 
of a finite group of linear transformations that preserve spatial length (or, in 
quantum mechanics, preserve the magnitude of a wavefunction) is equivalent to 
a representation in which all the matrices are unitary (see chapter 8) and so from 
now on we will consider only unitary representations. 


25.4 Reducibility of a representation 

We have seen already that it is possible to have more than one representation 
of any particular group. For example, the group {1, i, — 1,— i} under ordinary 
multiplication has been shown to have a set of 2 x 2 matrices, and a set of four 
unit n × n matrices $\mathsf{I}_n$, as two of its possible representations. 

Consider two or more representations, $\mathsf{D}^{(1)}, \mathsf{D}^{(2)}, \ldots, \mathsf{D}^{(N)}$, which may be 
of different dimensions, of a group Q. Now combine the matrices $\mathsf{D}^{(1)}(X)$, 
$\mathsf{D}^{(2)}(X)$, ..., $\mathsf{D}^{(N)}(X)$ that correspond to element X of Q into a larger 
block-diagonal matrix: 

\[
\mathsf{D}(X) =
\begin{pmatrix}
\mathsf{D}^{(1)}(X) & & & 0 \\
 & \mathsf{D}^{(2)}(X) & & \\
 & & \ddots & \\
0 & & & \mathsf{D}^{(N)}(X)
\end{pmatrix}.
\tag{25.9}
\]

Then D = {D(X)} is the matrix representation of the group obtained by combining 
the basis vectors of $\mathsf{D}^{(1)}, \mathsf{D}^{(2)}, \ldots, \mathsf{D}^{(N)}$ into one larger basis vector. If, knowingly or 
unknowingly, we had started with this larger basis vector and found the matrices 
of the representation D to have the form shown in (25.9), or to have a form 
that can be transformed into this by a similarity transformation (25.5) (using, 
of course, the same matrix Q for each of the matrices D(X)) then we would say 
that D is reducible and that each matrix D(X) can be written as the direct sum of 
smaller representations: 

\[ \mathsf{D}(X) = \mathsf{D}^{(1)}(X) \oplus \mathsf{D}^{(2)}(X) \oplus \cdots \oplus \mathsf{D}^{(N)}(X). \]

It may be that some or all of the matrices $\mathsf{D}^{(1)}(X), \mathsf{D}^{(2)}(X), \ldots, \mathsf{D}^{(N)}(X)$ themselves 
can be further reduced - i.e. written in block diagonal form. For example, 
suppose that the representation $\mathsf{D}^{(1)}$, say, has a basis vector $(x\ y\ z)^{\mathrm{T}}$; then, for 
the symmetry group of an equilateral triangle, whilst x and y are mixed together 
for at least one of the operations X, z is never changed. In this case the 3 × 3 
representative matrix $\mathsf{D}^{(1)}(X)$ can itself be written in block diagonal form as a 
2 × 2 matrix and a 1 × 1 matrix. The direct-sum matrix D(X) can now be written 

\[
\mathsf{D}(X) =
\begin{pmatrix}
\begin{matrix} a & b \\ c & d \end{matrix} & & & & 0 \\
 & 1 & & & \\
 & & \mathsf{D}^{(2)}(X) & & \\
 & & & \ddots & \\
0 & & & & \mathsf{D}^{(N)}(X)
\end{pmatrix}
\tag{25.10}
\]

but the first two blocks can be reduced no further. 

When all the other representations $\mathsf{D}^{(2)}(X)$, ... have been similarly treated, 
what remains is said to be irreducible and has the characteristic of being block 
diagonal, with blocks that individually cannot be reduced further. The blocks are 
known as the irreducible representations of Q, often abbreviated to the irreps of 
Q, and we denote them by $\hat{\mathsf{D}}^{(i)}$. They form the building blocks of representation 
theory, and it is their properties that are used to analyse any given physical 
situation which is invariant under the operations that form the elements of Q. 
Any representation can be written as a linear combination of irreps. 

If, however, the initial choice u of basis vector for the representation D is 
arbitrary, as it is in general, then it is unlikely that the matrices D(X) will 
assume obviously block diagonal forms (it should be noted, though, that since 
the matrices are square, even a matrix with non-zero entries only in the extreme 
top right and bottom left positions is technically block diagonal). In general, it 
will be possible to reduce them to block diagonal matrices with more than one 
block; this reduction corresponds to a transformation Q to a new basis vector 
$\mathsf{u}_{\mathsf{Q}}$, as described in section 25.3. 

In any particular representation D, each constituent irrep $\hat{\mathsf{D}}^{(i)}$ may appear any 
number of times, or not at all, subject to the obvious restriction that the sum of 
all the irrep dimensions must add up to the dimension of D itself. Let us say that 
$\hat{\mathsf{D}}^{(i)}$ appears $m_i$ times. The general expansion of D is then written 

\[ \mathsf{D} = m_1 \hat{\mathsf{D}}^{(1)} \oplus m_2 \hat{\mathsf{D}}^{(2)} \oplus \cdots \oplus m_N \hat{\mathsf{D}}^{(N)}, \tag{25.11} \]

where if Q is finite so is N. 

This is such an important result that we shall now restate the situation in 
somewhat different language. When the set of matrices that forms a representation 
of a particular group of symmetry operations has been brought to irreducible 
form, the implications are as follows. 

(i) Those components of the basis vector that correspond to rows in the 
representation matrices with a single-entry block, i.e. a 1 x 1 block, are 
unchanged by the operations of the group. Such a coordinate or function 
is said to transform according to a one-dimensional irrep of Q. In the 
example given in (25.10), that the entry on the third row forms a 1 x 1 
block implies that the third entry in the basis vector $(x\ y\ z\ \cdots)^{\mathrm{T}}$, 
namely z, is invariant under the two-dimensional symmetry operations on 
an equilateral triangle in the xy-plane. 

(ii) If, in any of the g matrices of the representation, the largest-sized block 
located on the row or column corresponding to a particular coordinate 
(or function) in the basis vector is n x n, then that coordinate (or function) 
is mixed by the symmetry operations with n — 1 others and is said to 
transform according to an n-dimensional irrep of Q. Thus in the matrix 
(25.10), x is the first entry in the complete basis vector; the first row of 
the matrix contains two non-zero entries, as does the first column, and so 
x is part of a two-component basis vector whose components are mixed 
by the symmetry operations of Q. The other component is y. 

The result (25.11) may also be formulated in terms of the more abstract notion 
of vector spaces (chapter 8). The set of g matrices that forms an n-dimensional 
representation D of the group Q can be thought of as acting on column matrices 
corresponding to vectors in an n-dimensional vector space V spanned by the basis 
functions of the representation. If there exists a proper subspace W of V, such 
that if a vector whose column matrix is w belongs to W then the vector whose 
column matrix is D(X)w also belongs to W, for all X belonging to Q, then it 
follows that D is reducible. We say that the subspace W is invariant under the 
actions of the elements of Q. With D unitary, the orthogonal complement $W_\perp$ of 
W, i.e. the vector space V remaining when the subspace W has been removed, is 
also invariant, and all the matrices D(X) split into two blocks acting separately 
on W and $W_\perp$. Both W and $W_\perp$ may contain further invariant subspaces and 
be split still further. 
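
As a small illustration of an invariant subspace (an example of our own, not 
one drawn from the text), consider the three-dimensional natural representation 
of the triangle group listed in section 25.2. The Python sketch below checks 
that the one-dimensional subspace W spanned by the column matrix (1 1 1)^T is 
carried into itself by every matrix, and that vectors orthogonal to it remain 
orthogonal to it; the two vectors chosen to span the complement are simply a 
convenient assumption. 

```python
# Matrices of the natural representation, copied from section 25.2.
D_NAT = {
    "I": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    "A": [[0, 0, 1], [1, 0, 0], [0, 1, 0]],
    "B": [[0, 1, 0], [0, 0, 1], [1, 0, 0]],
    "C": [[1, 0, 0], [0, 0, 1], [0, 1, 0]],
    "D": [[0, 0, 1], [0, 1, 0], [1, 0, 0]],
    "E": [[0, 1, 0], [1, 0, 0], [0, 0, 1]],
}


def apply(m, v):
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


w = [1, 1, 1]
w_invariant = all(apply(m, w) == w for m in D_NAT.values())

# A vector is orthogonal to w exactly when its components sum to zero.
complement = [[1, -1, 0], [0, 1, -1]]
complement_invariant = all(
    sum(apply(m, v)) == 0 for m in D_NAT.values() for v in complement
)

print(w_invariant, complement_invariant)
```

Both checks succeed, so this representation is reducible in the sense just 
described: it splits into a block acting on W and a block acting on the 
orthogonal complement. 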

As a concrete example of this approach, consider in plane polar coordinates 
ρ, φ the effect of rotations about the polar axis on the infinite-dimensional vector 
space V of all functions of φ that satisfy the Dirichlet conditions for expansion 
as a Fourier series (see section 12.1). We take as our basis functions the set 
{sin mφ, cos mφ} for integer values m = 0, 1, 2, ...; this is an infinite-dimensional 
representation (n = ∞) and, since a rotation about the polar axis can be through 
any angle α (0 ≤ α < 2π), the group Q is a subgroup of the continuous rotation 
group and has its order g formally equal to infinity. 

Now, for some k, consider a vector w in the space $W_k$ spanned by {sin kφ, cos kφ}, 
say w = a sin kφ + b cos kφ. Under a rotation by α about the polar axis, a sin kφ 
becomes a sin k(φ + α), which can be written as a cos kα sin kφ + a sin kα cos kφ, i.e. 
as a linear combination of sin kφ and cos kφ; similarly cos kφ becomes another 
linear combination of the same two functions. Thus w' = D(α)w also belongs 
to $W_k$ for any α and we can conclude that $W_k$ is an invariant irreducible 
two-dimensional subspace of V. It follows that D(α) is reducible and that, since the 
result holds for every k, in its reduced form D(α) has an infinite series of 
2 × 2 blocks on its leading diagonal; the block belonging to $W_k$ will have the form 

\[
\begin{pmatrix}
\cos k\alpha & -\sin k\alpha \\
\sin k\alpha & \cos k\alpha
\end{pmatrix}.
\]

We note that the particular case k = 0 is special, in that then sin kφ = 0 and 
cos kφ = 1, for all φ; consequently the first 2 × 2 block in D(α) is reducible further 
and becomes two single-entry blocks. 
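
The invariance of the subspace spanned by sin kφ and cos kφ can be checked 
numerically. The Python sketch below is illustrative only: the function names 
are hypothetical, the 2 × 2 block is written with the coefficients cos kα and 
sin kα obtained from the addition formulae used above, and the sample values 
of k, α, β, a, b and φ are arbitrary. 

```python
import math


def block(k, alpha):
    """2 x 2 block acting on the coefficients (a, b) of sin k*phi, cos k*phi."""
    c, s = math.cos(k * alpha), math.sin(k * alpha)
    return [[c, -s], [s, c]]


def rotate_coefficients(k, alpha, a, b):
    """Coefficients of sin k*phi and cos k*phi after phi -> phi + alpha."""
    (m00, m01), (m10, m11) = block(k, alpha)
    return m00 * a + m01 * b, m10 * a + m11 * b


def matmul2(p, q):
    return [[sum(p[r][m] * q[m][c] for m in range(2)) for c in range(2)]
            for r in range(2)]


k, alpha, beta, a, b, phi = 3, 0.4, 1.1, 2.0, -0.5, 0.9

# The rotated function is still a combination of sin k*phi and cos k*phi.
a_new, b_new = rotate_coefficients(k, alpha, a, b)
rotated = a * math.sin(k * (phi + alpha)) + b * math.cos(k * (phi + alpha))
recombined = a_new * math.sin(k * phi) + b_new * math.cos(k * phi)
stays_in_subspace = math.isclose(rotated, recombined, abs_tol=1e-12)

# Successive rotations multiply as the group elements do.
lhs = matmul2(block(k, alpha), block(k, beta))
rhs = block(k, alpha + beta)
multiplies_correctly = all(
    math.isclose(lhs[r][c], rhs[r][c], abs_tol=1e-12)
    for r in range(2) for c in range(2)
)

print(stays_in_subspace, multiplies_correctly)
```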

A second illustration of the connection between the behaviour of vector spaces 
under the actions of the elements of a group and the form of the matrix 
representation of the group is provided by the vector space spanned by the spherical 
harmonics $Y_\ell^m(\theta, \phi)$. This contains subspaces, corresponding to the different 
values of ℓ, that are invariant under the actions of the elements of the full 
three-dimensional rotation group; the corresponding matrices are block-diagonal, and 
those entries that correspond to the part of the basis containing $Y_\ell^m(\theta, \phi)$ form a 
(2ℓ + 1) × (2ℓ + 1) block. 

To illustrate further the irreps of a group, we return again to the group Q of 
two-dimensional rotation and reflection symmetries of an equilateral triangle, or 
equivalently the permutation group $S_3$; this may be shown, using the methods of 
section 25.7 below, to have three irreps. Firstly, we have already seen that the set 
M of six orthogonal 2 × 2 matrices given in section 24.3, equation (24.13), is 
isomorphic to Q. These matrices therefore form not only a representation of Q, 
but a faithful one. It should be noticed that, although Q contains six elements, 
the matrices are only 2 × 2. However, they contain no invariant 1 × 1 sub-block 
(which for 2 × 2 matrices would require them all to be diagonal) and neither can 
all the matrices be made block diagonal by the same similarity transformation; 
they therefore form a two-dimensional irrep of Q. 

Secondly, as previously noted, every group has one (unfaithful) irrep in which 
every element is represented by the 1 × 1 matrix $\mathsf{I}_1$, or, more simply, 1. 

Thirdly an (unfaithful) irrep of Q is given by assignment of the one-dimensional 
set of six ‘matrices’ {1, 1, 1, −1, −1, −1} to the symmetry operations {I, R, R', K, 
L, M} respectively, or to the group elements {I, A, B, C, D, E} respectively; see 
section 24.3. In terms of the permutation group $S_3$, 1 corresponds to even 
permutations and −1 to odd permutations, ‘odd’ or ‘even’ referring to the number 
of simple pair interchanges to which a permutation is equivalent. That these 
assignments are in accord with the group multiplication table 24.8 should be 
checked. 
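
One way of carrying out this check without writing out the multiplication 
table is sketched below in Python. Each permutation is stored as the images of 
1, 2, 3, taken from the cycle notation quoted in section 25.2; the parity is 
found by counting inversions, which has the same parity as the number of pair 
interchanges. The function names are our own. 

```python
from itertools import product

# Images of (1, 2, 3) under each permutation, read off from the cycles
# I = (1)(2)(3), A = (1 2 3), B = (1 3 2), C = (1)(2 3), D = (3)(1 2), E = (2)(1 3).
PERMS = {
    "I": (1, 2, 3),
    "A": (2, 3, 1),
    "B": (3, 1, 2),
    "C": (1, 3, 2),
    "D": (2, 1, 3),
    "E": (3, 2, 1),
}


def sign(perm):
    """+1 for an even permutation, -1 for an odd one."""
    inversions = sum(
        1 for i in range(3) for j in range(i + 1, 3) if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def compose(first, second):
    """Permutation obtained by applying `first` and then `second`."""
    return tuple(second[first[i] - 1] for i in range(3))


signs = {name: sign(perm) for name, perm in PERMS.items()}
consistent = all(
    sign(compose(PERMS[x], PERMS[y])) == signs[x] * signs[y]
    for x, y in product(PERMS, repeat=2)
)
print(signs, consistent)
```

Because the sign of a product is the product of the signs whichever factor is 
applied first, the check does not depend on the ordering convention adopted for 
XY. 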

Thus the three irreps of the group Q (i.e. the group 3m or $C_{3v}$ or $S_3$), are, using 
the conventional notation A1, A2, E (see section 25.8), as follows: 

    Element    I      A      B      C      D      E 
    ------------------------------------------------- 
    A1         1      1      1      1      1      1 
    A2         1      1      1     -1     -1     -1            (25.12) 
    E          M_I    M_A    M_B    M_C    M_D    M_E 

where M_I, M_A, ..., M_E are the 2 × 2 matrices of equation (24.13). 



25.5 The orthogonality theorem for irreducible representations 

We come now to the central theorem of representation theory, a theorem that 
justifies the relatively routine application of certain procedures to determine 
the restrictions that are inherent in physical systems that have some degree of 
rotational or reflection symmetry. The development of the theorem is long and 
quite complex when presented in its entirety, and the reader will have to refer 
elsewhere for the proof.† 

The theorem states that, in a certain sense, the irreps of a group Q are as 
orthogonal as possible, as follows. If, for each irrep, the elements in any one 
position in each of the g matrices are used to make up g-component column 
matrices then 

(i) any two such column matrices coming from different irreps are orthogonal; 

(ii) any two such column matrices coming from different positions in the 
matrices of the same irrep are orthogonal. 

This orthogonality is in addition to the irreps’ being in the form of orthogo- 
nal (unitary) matrices and thus each comprising mutually orthogonal rows and 
columns. 


† See, e.g., Groups, Representations and Physics, H. F. Jones (Institute of Physics), Group Theory in 
Quantum Mechanics, J. F. Cornwell (Academic Press), or Linear Representations of Finite Groups, 
J.-P. Serre (Springer-Verlag). 

More mathematically, if we denote the entry in the ith row and jth column of a 
matrix D(X) by $[\mathsf{D}(X)]_{ij}$, and $\hat{\mathsf{D}}^{(\lambda)}$ and $\hat{\mathsf{D}}^{(\mu)}$ are two irreps of Q having dimensions 
$n_\lambda$ and $n_\mu$ respectively, then 

\[
\sum_X \left[\hat{\mathsf{D}}^{(\lambda)}(X)\right]^*_{ij} \left[\hat{\mathsf{D}}^{(\mu)}(X)\right]_{kl}
= \frac{g}{n_\lambda}\,\delta_{ik}\,\delta_{jl}\,\delta_{\lambda\mu}.
\tag{25.13}
\]


This rather forbidding-looking equation needs some further explanation. 

Firstly, the asterisk indicates that the complex conjugate should be taken if 
necessary, though all our representations so far have involved only real matrix 
elements. Each Kronecker delta function on the right-hand side has the value 1 
if its two subscripts are equal and has the value 0 otherwise. Thus the right-hand 
side is only non-zero if i = k, j = l and λ = μ, all at the same time. 

Secondly, the summation over the group elements X means that g contributions 
have to be added together, each contribution being a product of entries drawn 
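from the representative matrices in the two irreps $\hat{\mathsf{D}}^{(\lambda)} = \{\hat{\mathsf{D}}^{(\lambda)}(X)\}$ and $\hat{\mathsf{D}}^{(\mu)} = 
\{\hat{\mathsf{D}}^{(\mu)}(X)\}$. The g contributions arise as X runs over the g elements of Q. 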

Thus, putting these remarks together, the summation will produce zero if either 

(i) the matrix elements are not taken from exactly the same position in every 
matrix, including cases in which it is not possible to do so because the 
irreps $\hat{\mathsf{D}}^{(\lambda)}$ and $\hat{\mathsf{D}}^{(\mu)}$ have different dimensions, or 

(ii) even if $\hat{\mathsf{D}}^{(\lambda)}$ and $\hat{\mathsf{D}}^{(\mu)}$ do have the same dimensions and the matrix elements 
are from the same positions in every matrix, they are different irreps, i.e. 
λ ≠ μ. 

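Some numerical illustrations based on the irreps $A_1$, $A_2$ and E of the group 3m 
(or $C_{3v}$ or $S_3$) will probably provide the clearest explanation (see (25.12)). 

(a) Take $i = j = k = l = 1$, with $\hat{D}^{(\lambda)} = A_1$ and $\hat{D}^{(\mu)} = A_2$. Equation (25.13) 
then reads 

1(1) + 1(1) + 1(1) + 1(-1) + 1(-1) + 1(-1) = 0, 

as expected, since $\lambda \neq \mu$. 

(b) Take $(i, j)$ as (1, 2) and $(k, l)$ as (2, 2), corresponding to different matrix 
positions within the same irrep $\hat{D}^{(\lambda)} = \hat{D}^{(\mu)} = E$. Substituting in (25.13) 
gives 

$$0(1) + \left(-\tfrac{\sqrt{3}}{2}\right)\left(-\tfrac{1}{2}\right) + \left(\tfrac{\sqrt{3}}{2}\right)\left(-\tfrac{1}{2}\right) + 0(1) + \left(-\tfrac{\sqrt{3}}{2}\right)\left(-\tfrac{1}{2}\right) + \left(\tfrac{\sqrt{3}}{2}\right)\left(-\tfrac{1}{2}\right) = 0.$$

(c) Take $(i, j)$ as (1, 2), and $(k, l)$ as (1, 2), corresponding to the same matrix 
positions within the same irrep $\hat{D}^{(\lambda)} = \hat{D}^{(\mu)} = E$. Substituting in (25.13) 
gives 

$$0(0) + \left(-\tfrac{\sqrt{3}}{2}\right)\left(-\tfrac{\sqrt{3}}{2}\right) + \left(\tfrac{\sqrt{3}}{2}\right)\left(\tfrac{\sqrt{3}}{2}\right) + 0(0) + \left(-\tfrac{\sqrt{3}}{2}\right)\left(-\tfrac{\sqrt{3}}{2}\right) + \left(\tfrac{\sqrt{3}}{2}\right)\left(\tfrac{\sqrt{3}}{2}\right) = 3,$$

in agreement with the right-hand side of (25.13), since $g/n_\lambda = 6/2 = 3$. 

(d) No explicit calculation is needed to see that if $i = j = k = l = 1$, with 
$\hat{D}^{(\lambda)} = \hat{D}^{(\mu)} = A_1$ (or $A_2$), then each term in the sum is either $1^2$ or $(-1)^2$ 
and the total is 6, as predicted by the right-hand side of (25.13) since $g = 6$ 
and $n_\lambda = 1$. 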
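
The four cases above can also be checked numerically. The following is a minimal sketch rather than part of the original treatment: the explicit 2 x 2 matrices used for irrep E are one standard choice (an assumption; those of (25.12) may differ from them by a change of basis), and the names `E_IRREP`, `A1`, `A2` and `ortho_sum` are purely illustrative. All matrix elements are real, so no complex conjugation is needed.

```python
import math

s = math.sqrt(3) / 2

# Assumed explicit matrices for the two-dimensional irrep E of 3m:
# I, two rotations (A, B) and three reflections (C, D, E).
E_IRREP = {
    "I": ((1.0, 0.0), (0.0, 1.0)),
    "A": ((-0.5, s), (-s, -0.5)),
    "B": ((-0.5, -s), (s, -0.5)),
    "C": ((-1.0, 0.0), (0.0, 1.0)),
    "D": ((0.5, -s), (-s, -0.5)),
    "E": ((0.5, s), (s, -0.5)),
}
# One-dimensional irreps stored as 1 x 1 matrices.
A1 = {X: ((1.0,),) for X in E_IRREP}
A2 = {X: ((1.0 if X in "IAB" else -1.0,),) for X in E_IRREP}


def ortho_sum(rep1, i, j, rep2, k, l):
    """Left-hand side of (25.13) for real matrices; indices are zero-based."""
    return sum(rep1[X][i][j] * rep2[X][k][l] for X in rep1)


print(ortho_sum(A1, 0, 0, A2, 0, 0))            # case (a): different irreps
print(ortho_sum(E_IRREP, 0, 1, E_IRREP, 1, 1))  # case (b): different positions
print(ortho_sum(E_IRREP, 0, 1, E_IRREP, 0, 1))  # case (c): g/n = 6/2
print(ortho_sum(A1, 0, 0, A1, 0, 0))            # case (d): g/n = 6/1
```

Up to rounding error, the four printed values should be 0, 0, 3 and 6 respectively.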


25.6 Characters 

The actual matrices of general representations and irreps are cumbersome to 
work with, and they are not unique since there is always the freedom to change 
the coordinate system, i.e. the components of the basis vector (see section 25.3), 
and hence the entries in the matrices. However, one thing that does not change 
for a matrix under such an equivalence (similarity) transformation - i.e. under 
a change of basis - is the trace of the matrix. This was shown in chapter 8, 
but is repeated here. The trace of a matrix $A$ is the sum of its diagonal elements, 

$$\mathrm{Tr}\,A = \sum_{i=1}^{n} A_{ii},$$

or, using the summation convention (section 21.1), simply $A_{ii}$. Under a similarity 
transformation, again using the summation convention, 

$$\begin{aligned}
[D_Q(X)]_{ii} &= [Q^{-1}]_{ij}\,[D(X)]_{jk}\,[Q]_{ki} \\
&= [D(X)]_{jk}\,[Q]_{ki}\,[Q^{-1}]_{ij} \\
&= [D(X)]_{jk}\,[I]_{kj} \\
&= [D(X)]_{jj},
\end{aligned}$$

showing that the traces of equivalent matrices are equal. 

This fact can be used to greatly simplify work with representations, though with 
some partial loss of the information content of the full matrices. For example, 
using trace values alone it is not possible to distinguish between the two groups 
known as $4mm$ and $\bar{4}2m$, or as $C_{4v}$ and $D_{2d}$ respectively, even though the two 
groups are not isomorphic. To make use of these simplifications we now define 
the characters of a representation. 

Definition. The characters $\chi(D)$ of a representation $D$ of a group $\mathcal{G}$ are defined as 
the set of traces of the matrices $D(X)$, one for each element $X$ of $\mathcal{G}$. 

At this stage there will be $g$ characters, but, as we noted in subsection 24.7.3, 
elements $A$, $B$ of $\mathcal{G}$ in the same conjugacy class are connected by equations of 
the form $B = X^{-1}AX$. It follows that their matrix representations are connected 
by corresponding equations of the form $D(B) = D(X^{-1})D(A)D(X)$, and so by the 
argument just given their representations will have equal traces and hence equal 
characters. Thus elements in the same conjugacy class have the same characters, 
though, in general, these will vary from one representation to another. However, 
it might also happen that two or more conjugacy classes have the same characters 
in a representation - indeed, in the trivial irrep $A_1$, see (25.12), every element 
inevitably has the character 1. 


  3m   |   I    A, B   C, D, E  |
 ------+------------------------+--------------------------------------------------
  A_1  |   1     1       1      |  z; z^2; x^2 + y^2
  A_2  |   1     1      -1      |  R_z
  E    |   2    -1       0      |  (x, y); (xz, yz); (R_x, R_y); (x^2 - y^2, 2xy)

Table 25.1 The character table for the irreps of group 3m ($C_{3v}$ or $S_3$). The 
right-hand column lists some common functions that transform according to 
the irrep against which each is shown (see text). 


For the irrep $A_2$ of the group 3m, the classes $\{I\}$, $\{A, B\}$ and $\{C, D, E\}$ have 
characters 1, 1 and -1, respectively, whilst they have characters 2, -1 and 0 
respectively in irrep E. 

We are thus able to draw up a character table for the group 3m as shown 
in table 25.1. This table holds in compact form most of the important infor- 
mation on the behaviour of functions under the two-dimensional rotational and 
reflection symmetries of an equilateral triangle, i.e. under the elements of group 
3m. The entry under $I$ for any irrep gives the dimension of the irrep, since it 
is equal to the trace of the unit matrix whose dimension is equal to that of 
the irrep. In other words, for the $\lambda$th irrep $\chi^{(\lambda)}(I) = n_\lambda$, where $n_\lambda$ is its 
dimension. 

In the extreme right-hand column we list some common functions of Cartesian 
coordinates that transform, under the group 3m, according to the irrep on whose 
line they are listed. Thus, as we have seen, $z$, $z^2$, and $x^2 + y^2$ are all unchanged 
by the group operations (though $x$ and $y$ individually are affected) and so are 
listed against the one-dimensional irrep $A_1$. Each of the pairs $(x, y)$, $(xz, yz)$, and 
$(x^2 - y^2, 2xy)$, however, is mixed as a pair by some of the operations, and so these 
pairs are listed against the two-dimensional irrep E: each pair forms a basis for 
this irrep. 

The quantities $R_x$, $R_y$ and $R_z$ refer to rotations about the indicated axes; 
they transform in the same way as the corresponding components of angular 
momentum $\mathbf{J}$, and their behaviour can be established by examining how the 
components of $\mathbf{J} = \mathbf{r} \times \mathbf{p}$ transform under the operations of the group. To do 
this explicitly is beyond the scope of this book. However, it can be noted that 
$R_z$, being listed opposite the one-dimensional $A_2$, is unchanged by $I$ and by the 
rotations $A$ and $B$ but changes sign under the mirror reflections $C$, $D$, and $E$, as 
would be expected. 


25.6.1 Orthogonality property of characters 

Some of the most important properties of characters can be deduced from the 
orthogonality theorem (25.13), 

$$\sum_X \left[\hat{D}^{(\lambda)}(X)\right]^*_{ij} \left[\hat{D}^{(\mu)}(X)\right]_{kl} = \frac{g}{n_\lambda}\,\delta_{ik}\,\delta_{jl}\,\delta_{\lambda\mu}.$$

If we set $j = i$ and $l = k$, so that both factors in any particular term in the 
summation refer to diagonal elements of the representative matrices, and then 
sum both sides over $i$ and $k$, we obtain 

$$\sum_X \sum_{i=1}^{n_\lambda} \sum_{k=1}^{n_\mu} \left[\hat{D}^{(\lambda)}(X)\right]^*_{ii} \left[\hat{D}^{(\mu)}(X)\right]_{kk} = \frac{g}{n_\lambda} \sum_{i=1}^{n_\lambda} \sum_{k=1}^{n_\mu} \delta_{ik}\,\delta_{ik}\,\delta_{\lambda\mu}.$$

Expressed in terms of characters, this reads 

$$\sum_X \left[\chi^{(\lambda)}(X)\right]^* \chi^{(\mu)}(X) = \frac{g}{n_\lambda} \sum_{i=1}^{n_\lambda} \delta_{ii}^2\,\delta_{\lambda\mu} = \frac{g}{n_\lambda} \sum_{i=1}^{n_\lambda} 1 \times \delta_{\lambda\mu} = g\,\delta_{\lambda\mu}. \qquad (25.14)$$


In words, the (g-component) ‘vectors’ formed from the characters of the various 
irreps of a group are mutually orthogonal, but each one has a squared magnitude 
(the sum of the squares of its components) equal to the order of the group. 

Since, as noted in the previous subsection, group elements in the same class 
have the same characters, (25.14) can be written as a sum over classes rather than 
elements. If $c_i$ denotes the number of elements in class $C_i$ and $X_i$ any element of 
$C_i$, then 

$$\sum_i c_i \left[\chi^{(\lambda)}(X_i)\right]^* \chi^{(\mu)}(X_i) = g\,\delta_{\lambda\mu}. \qquad (25.15)$$


Although we do not prove it here, there also exists a ‘completeness’ relation for 
characters. It makes a statement about the products of characters for a fixed pair 
of group elements, $X_1$ and $X_2$, when the products are summed over all possible 
irreps of the group. This is the converse of the summation process defined by 
(25.14). The completeness relation states that 

$$\sum_\lambda \left[\chi^{(\lambda)}(X_1)\right]^* \chi^{(\lambda)}(X_2) = \frac{g}{c_1}\,\delta_{C_1 C_2}, \qquad (25.16)$$

where element $X_1$ belongs to conjugacy class $C_1$ and $X_2$ belongs to $C_2$. Thus the 
sum is zero unless $X_1$ and $X_2$ belong to the same class. For table 25.1 we can 
verify that these results are valid. 

(i) For $\hat{D}^{(\lambda)} = \hat{D}^{(\mu)} = A_1$ or $A_2$, (25.15) reads 

1(1) + 2(1) + 3(1) = 6, 

whilst for $\hat{D}^{(\lambda)} = \hat{D}^{(\mu)} = E$, it gives 

1(2^2) + 2(1) + 3(0) = 6. 

(ii) For $\hat{D}^{(\lambda)} = A_2$ and $\hat{D}^{(\mu)} = E$, say, (25.15) reads 

1(1)(2) + 2(1)(-1) + 3(-1)(0) = 0. 

(iii) For $X_1 = A$ and $X_2 = D$, say, (25.16) reads 

1(1) + 1(-1) + (-1)(0) = 0, 

whilst for $X_1 = C$ and $X_2 = E$, both of which belong to class $C_3$ for which 
$c_3 = 3$, 

1(1) + (-1)(-1) + (0)(0) = 2 = 6/3. 
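
These checks are easily mechanised. The sketch below, which is illustrative only, stores the character table of 3m (table 25.1) class by class and evaluates the row sum in (25.15) and the column sum in (25.16); the names `CLASS_SIZES`, `CHARACTERS`, `row_product` and `column_product` are our own, and the characters are all real so that complex conjugation is omitted.

```python
CLASS_SIZES = (1, 2, 3)  # classes {I}, {A, B}, {C, D, E}
CHARACTERS = {"A1": (1, 1, 1), "A2": (1, 1, -1), "E": (2, -1, 0)}
G_ORDER = sum(CLASS_SIZES)  # order g of the group


def row_product(lam, mu):
    """Left-hand side of (25.15): class-weighted product of two character rows."""
    return sum(
        c * a * b
        for c, a, b in zip(CLASS_SIZES, CHARACTERS[lam], CHARACTERS[mu])
    )


def column_product(class1, class2):
    """Left-hand side of (25.16): sum over irreps for two classes (by index)."""
    return sum(chi[class1] * chi[class2] for chi in CHARACTERS.values())


for lam in CHARACTERS:
    for mu in CHARACTERS:
        expected = G_ORDER if lam == mu else 0
        print(lam, mu, row_product(lam, mu), expected)

for i in range(3):
    for j in range(3):
        print(i, j, column_product(i, j))
```

The row products are 6 on the diagonal and 0 elsewhere; the column products vanish for distinct classes and equal $g/c_i$, i.e. 6, 3 and 2, for the three classes in turn.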

25.7 Counting irreps using characters 

The expression of a general representation $D = \{D(X)\}$ in terms of irreps, as 
given in (25.11), can be simplified by going from the full matrix form to that of 
characters. Thus 

$$D(X) = m_1\hat{D}^{(1)}(X) \oplus m_2\hat{D}^{(2)}(X) \oplus \cdots \oplus m_N\hat{D}^{(N)}(X)$$

becomes, on taking the trace of both sides, 

$$\chi(X) = \sum_{\lambda=1}^{N} m_\lambda\,\chi^{(\lambda)}(X). \qquad (25.17)$$

Given the characters of the irreps of the group $\mathcal{G}$ to which the elements $X$ belong, 
and the characters of the representation $D = \{D(X)\}$, the $g$ equations (25.17) 
can be solved as simultaneous equations in the $m_\lambda$, either by inspection or by 
multiplying both sides by $[\chi^{(\mu)}(X)]^*$ and summing over $X$, making use of (25.14) 
and (25.15), to obtain 

$$m_\mu = \frac{1}{g} \sum_X \left[\chi^{(\mu)}(X)\right]^* \chi(X) = \frac{1}{g} \sum_i c_i \left[\chi^{(\mu)}(X_i)\right]^* \chi(X_i). \qquad (25.18)$$

That an unambiguous formula can be given for each $m_\lambda$, once the character 
set (the set of characters of each of the group elements or, equivalently, of 
each of the conjugacy classes) of $D$ is known, shows that, for any particular 
group, two representations with the same characters are equivalent. This strongly 
suggests something that can be shown, namely, the number of irreps = the number 
of conjugacy classes. The argument is as follows. Equation (25.17) is a set of 
simultaneous equations for $N$ unknowns, the $m_\lambda$, some of which may be zero. The 
value of $N$ is equal to the number of irreps of $\mathcal{G}$. There are $g$ different values of 
$X$, but the number of different equations is only equal to the number of distinct 
conjugacy classes, since any two elements of $\mathcal{G}$ in the same class have the same 
character set and therefore generate the same equation. For a unique solution 
to simultaneous equations in $N$ unknowns, exactly $N$ independent equations are 
needed. Thus $N$ is also the number of classes, establishing the stated result. 


► Determine the irreps contained in the representation of the group 3m in the vector space 
spanned by the functions $x^2$, $y^2$, $xy$. 

We first note that although these functions are not orthogonal they form a basis set for a 
representation, since they are linearly independent quadratic forms in $x$ and $y$ and any other 
quadratic form can be written (uniquely) in terms of them. We must establish how they 
transform under the symmetry operations of group 3m. We need to do so only for a 
representative element of each conjugacy class, and naturally we take the simplest in each case. 
The first class contains only $I$ (as always) and clearly $D(I)$ is the 3 × 3 unit matrix. 

The second class contains the rotations, $A$ and $B$, and we choose to find $D(A)$. Since, 
under $A$, 

$$x \to -\tfrac{1}{2}x + \tfrac{\sqrt{3}}{2}y \quad\text{and}\quad y \to -\tfrac{\sqrt{3}}{2}x - \tfrac{1}{2}y,$$

it follows that 

$$x^2 \to \tfrac{1}{4}x^2 - \tfrac{\sqrt{3}}{2}xy + \tfrac{3}{4}y^2, \qquad y^2 \to \tfrac{3}{4}x^2 + \tfrac{\sqrt{3}}{2}xy + \tfrac{1}{4}y^2 \qquad (25.19)$$

and 

$$xy \to \tfrac{\sqrt{3}}{4}x^2 - \tfrac{1}{2}xy - \tfrac{\sqrt{3}}{4}y^2. \qquad (25.20)$$

Hence $D(A)$ can be deduced and is given below. 

The third and final class contains the reflections, $C$, $D$ and $E$; of these $C$ is much the 
easiest to deal with. Under $C$, $x \to -x$ and $y \to y$, causing $xy$ to change sign but leaving 
$x^2$ and $y^2$ unaltered. The three matrices needed are thus 

$$D(I) = I_3, \qquad D(C) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \qquad D(A) = \begin{pmatrix} \tfrac{1}{4} & \tfrac{3}{4} & -\tfrac{\sqrt{3}}{2} \\ \tfrac{3}{4} & \tfrac{1}{4} & \tfrac{\sqrt{3}}{2} \\ \tfrac{\sqrt{3}}{4} & -\tfrac{\sqrt{3}}{4} & -\tfrac{1}{2} \end{pmatrix};$$

their traces are respectively 3, 1 and 0. 

It should be noticed that much more work has been done here than is necessary, since 
the traces can be computed immediately from the effects of the symmetry operations on the 
basis functions. All that is needed is the weight of each basis function in the transformed 
expression for that function; these are clearly 1, 1, 1 for $I$, and $\tfrac{1}{4}$, $\tfrac{1}{4}$, $-\tfrac{1}{2}$ for $A$, from (25.19) 
and (25.20), and 1, 1, -1 for $C$, from the observations made just above the displayed 
matrices. The traces are then the sums of these weights. The off-diagonal elements of the 
matrices need not be found, nor need the matrices be written out. 

From (25.17) we now need to find a superposition of the characters of the irreps that 
gives representation $D$ in the bottom line of table 25.2. 

By inspection it is obvious that $D = A_1 \oplus E$, but we can use (25.18) formally: 

$$\begin{aligned}
m_{A_1} &= \tfrac{1}{6}\,[1(1)(3) + 2(1)(0) + 3(1)(1)] = 1, \\
m_{A_2} &= \tfrac{1}{6}\,[1(1)(3) + 2(1)(0) + 3(-1)(1)] = 0, \\
m_{E} &= \tfrac{1}{6}\,[1(2)(3) + 2(-1)(0) + 3(0)(1)] = 1.
\end{aligned}$$

Thus $A_1$ and E appear once each in the reduction of $D$, and $A_2$ not at all. Table 25.1 
gives the further information, not needed here, that it is the combination $x^2 + y^2$ that 
transforms as a one-dimensional irrep and the pair $(x^2 - y^2, 2xy)$ that forms a basis of 
the two-dimensional irrep, E. ◄ 


             Classes
  Irrep  |   I     AB    CDE
 --------+--------------------
  A_1    |   1     1      1
  A_2    |   1     1     -1
  E      |   2    -1      0
 --------+--------------------
  D      |   3     0      1

Table 25.2 The characters of the irreps of the group 3m and of the 
representation D, which must be a superposition of some of them. 
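
The shortcut described in the worked example, together with the reduction formula (25.18), can be sketched in a few lines of code. This is an illustration under stated assumptions, not a general-purpose routine: the helper names `trace_on_quadratics` and `reduce_characters` are hypothetical, and the first of them relies on the observation that, for $x \to ax + by$ and $y \to cx + dy$, the weights of $x^2$, $y^2$ and $xy$ in their own transformed expressions are $a^2$, $d^2$ and $ad + bc$.

```python
import math

CLASS_SIZES = (1, 2, 3)  # classes {I}, {A, B}, {C, D, E} of 3m
IRREPS = {"A1": (1, 1, 1), "A2": (1, 1, -1), "E": (2, -1, 0)}


def trace_on_quadratics(a, b, c, d):
    """Trace of the representation on (x^2, y^2, xy) for
    x -> a*x + b*y, y -> c*x + d*y, found from the diagonal weights only."""
    return a * a + d * d + (a * d + b * c)


def reduce_characters(chi):
    """Multiplicities m_mu from (25.18) for real characters given per class."""
    g = sum(CLASS_SIZES)
    result = {}
    for name, irrep in IRREPS.items():
        total = sum(c * p * q for c, p, q in zip(CLASS_SIZES, irrep, chi))
        result[name] = round(total / g)  # rounding removes floating-point noise
    return result


s = math.sqrt(3) / 2
chi_D = (
    trace_on_quadratics(1, 0, 0, 1),        # identity I
    trace_on_quadratics(-0.5, s, -s, -0.5),  # rotation A
    trace_on_quadratics(-1, 0, 0, 1),       # reflection C
)
multiplicities = reduce_characters(chi_D)
print(chi_D)
print(multiplicities)
```

Up to rounding error the characters come out as 3, 0 and 1, and the multiplicities reproduce $D = A_1 \oplus E$.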


25.7.1 Summation rules for irreps 

The first summation rule for irreps is a simple restatement of (25.14), with $\mu$ set 
equal to $\lambda$; it then reads 

$$\sum_X \left|\chi^{(\lambda)}(X)\right|^2 = g.$$

In words, the sum of the squares (modulus squared if necessary) of the characters 
of an irrep taken over all elements of the group adds up to the order of the 
group. For group 3m (table 25.1), this takes the following explicit forms: 

for $A_1$, 1(1^2) + 2(1^2) + 3(1^2) = 6; 

for $A_2$, 1(1^2) + 2(1^2) + 3(-1)^2 = 6; 

for E, 1(2^2) + 2(-1)^2 + 3(0^2) = 6. 

We next prove a theorem that is concerned not with a summation within an irrep 
but with a summation over irreps. 

Theorem. If $n_\mu$ is the dimension of the $\mu$th irrep of a group $\mathcal{G}$ then 

$$\sum_\mu n_\mu^2 = g,$$

where $g$ is the order of the group. 

Proof. Define a representation of the group in the following way. Rearrange 
the rows of the multiplication table of the group so that whilst the elements in 
a particular order head the columns, their inverses in the same order head the 
rows. In this arrangement of the $g \times g$ table, the leading diagonal is entirely 
occupied by the identity element. Then, for each element $X$ of the group, take as 
representative matrix the multiplication-table array obtained by replacing $X$ by 
1 and all other element symbols by 0. The matrices $D^{\rm reg}(X)$ so obtained form the 
regular representation of $\mathcal{G}$; they are each $g \times g$, have a single non-zero entry ‘1’ 
in each row and column and (as will be verified by a little experimentation) have 
the same multiplication structure as the group $\mathcal{G}$ itself, i.e. they form a faithful 
representation of $\mathcal{G}$. 


  (a)    | I  A  B          (b)    | I  A  B
      ---+---------             ---+---------
       I | I  A  B               I | I  A  B
       A | A  B  I               B | B  I  A
       B | B  I  A               A | A  B  I

Table 25.3 (a) The multiplication table of the cyclic group of order 3, and 
(b) its reordering used to generate the regular representation of the group. 


Although not part of the proof, a simple example may help to make these 
ideas more transparent. Consider the cyclic group of order 3. Its multiplication 
table is shown in table 25.3(a) (a repeat of table 24.10(a) of the previous chapter), 
whilst table 25.3(b) shows the same table reordered so that the columns are 
still labelled in the order $I$, $A$, $B$ but the rows are now labelled in the order 
$I^{-1} = I$, $A^{-1} = B$, $B^{-1} = A$. The three matrices of the regular representation are 
then 

$$D^{\rm reg}(I) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad D^{\rm reg}(A) = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}, \quad D^{\rm reg}(B) = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.$$

An alternative, more mathematical, definition of the regular representation of a 
group is 

$$\left[D^{\rm reg}(G_k)\right]_{ij} = \begin{cases} 1 & \text{if } G_kG_j = G_i, \\ 0 & \text{otherwise.} \end{cases}$$

We now return to the proof. With the construction given, the regular 
representation has characters as follows: 

$$\chi^{\rm reg}(I) = g, \qquad \chi^{\rm reg}(X) = 0 \quad\text{if } X \neq I.$$

We now apply (25.18) to $D^{\rm reg}$ to obtain for the number $m_\mu$ of times that the irrep 
$\hat{D}^{(\mu)}$ appears in $D^{\rm reg}$ (see (25.11)) 

$$m_\mu = \frac{1}{g} \sum_X \left[\chi^{(\mu)}(X)\right]^* \chi^{\rm reg}(X) = \frac{1}{g} \left[\chi^{(\mu)}(I)\right]^* \chi^{\rm reg}(I) = \frac{1}{g}\,n_\mu\,g = n_\mu.$$

Thus an irrep $\hat{D}^{(\mu)}$ of dimension $n_\mu$ appears $n_\mu$ times in $D^{\rm reg}$, and so by counting 
the total number of basis functions, or by considering $\chi^{\rm reg}(I)$, we can conclude 
that 

$$\sum_\mu n_\mu^2 = g. \qquad (25.21)$$

This completes the proof. 

As before, our standard demonstration group 3m provides an illustration. In 
this case we have seen already that there are two one-dimensional irreps and one 
two-dimensional irrep. This is in accord with (25.21) since 

$1^2 + 1^2 + 2^2 = 6$, which is the order $g$ of the group. 

Another straightforward application of the relation (25.21), to the group with 
multiplication table 25.3(a), yields immediate results. Since $g = 3$, none of its 
irreps can have dimension 2 or more, as $2^2 = 4$ is too large for (25.21) to be 
satisfied. Thus all irreps must be one-dimensional and there must be three of 
them (consistent with the fact that each element is in a class of its own, and that 
there are therefore three classes). The three irreps are the sets of 1 × 1 matrices 
(numbers) 

$$A_1 = \{1, 1, 1\}, \qquad A_2 = \{1, \omega, \omega^2\}, \qquad A_2^* = \{1, \omega^2, \omega\},$$

where $\omega = \exp(2\pi i/3)$; since the matrices are 1 × 1, the same set of nine numbers 
would be, of course, the entries in the character table for the irreps of the group. 
The fact that the numbers in each irrep are all cube roots of unity is discussed 
below. As will be noticed, two of these irreps are complex - an unusual occurrence 
in most applications - and form a complex conjugate pair of one-dimensional 
irreps. In practice, they function much as a two-dimensional irrep, but this is to 
be ignored for formal purposes such as theorems. 
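
Because two of these irreps are complex, they provide a simple case in which the complex conjugate in (25.14) cannot be omitted. The sketch below is illustrative only (the names `omega`, `IRREPS` and `inner` are ours, and the label `A2*` is used for the conjugate irrep); it forms the sum over the three group elements with the first factor conjugated.

```python
import cmath

omega = cmath.exp(2j * cmath.pi / 3)  # primitive cube root of unity

# Characters for the elements I, A, B of the cyclic group of order 3.
IRREPS = {
    "A1": (1, 1, 1),
    "A2": (1, omega, omega**2),
    "A2*": (1, omega**2, omega),
}


def inner(chi1, chi2):
    """Sum over group elements of [chi1(X)]* chi2(X), as in (25.14)."""
    return sum(a.conjugate() * b for a, b in zip(chi1, chi2))


for lam, chi_lam in IRREPS.items():
    for mu, chi_mu in IRREPS.items():
        print(lam, mu, abs(inner(chi_lam, chi_mu)))
```

To within rounding error the result is $g = 3$ when the two irreps coincide and 0 otherwise; without the conjugation, the sum for $A_2$ with itself would be $1 + \omega^2 + \omega^4 = 0$ rather than 3.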

A further property of characters can be derived from the fact that all elements 
in a conjugacy class have the same order. Suppose that the element X has order 
$m$, i.e. $X^m = I$. This implies for a representation $D$ of dimension $n$ that 

$$[D(X)]^m = I_n. \qquad (25.22)$$

Representations equivalent to $D$ are generated as before by using similarity 
transformations of the form 

$$D_Q(X) = Q^{-1}D(X)Q.$$

In particular, if we choose the columns of $Q$ to be the eigenvectors of $D(X)$ then, 
as discussed in chapter 8, 

$$D_Q(X) = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \lambda_n \end{pmatrix},$$

where the $\lambda_i$ are the eigenvalues of $D(X)$. Therefore, from (25.22), we have that 

$$\begin{pmatrix} \lambda_1^m & 0 & \cdots & 0 \\ 0 & \lambda_2^m & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \lambda_n^m \end{pmatrix} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & 1 \end{pmatrix}.$$

Hence all the eigenvalues $\lambda_i$ are $m$th roots of unity, and so $\chi(X)$, the trace of 
$D(X)$, is the sum of $n$ of these. In view of the implications of Lagrange’s theorem 
(section 24.6 and subsection 24.7.1), the only values of $m$ allowed are the divisors 
of the order $g$ of the group. 


25.8 Construction of a character table 

In order to decompose representations into irreps on a routine basis using 
characters, it is necessary to have available a character table for the group in 
question. Such a table gives, for each irrep $\mu$ of the group, the character $\chi^{(\mu)}(X)$ 
of the class to which group element $X$ belongs. To construct such a table the 
following properties of a group, established earlier in this chapter, may be used: 

(i) the number of classes equals the number of irreps; 

(ii) the ‘vector’ formed by the characters from a given irrep is orthogonal to 
the ‘vector’ formed by the characters from a different irrep; 

(iii) $\sum_\mu n_\mu^2 = g$, where $n_\mu$ is the dimension of the $\mu$th irrep and $g$ is the order 
of the group; 

(iv) the identity irrep (one-dimensional with all characters equal to 1) is present 
for every group; 

(v) $\sum_X |\chi^{(\mu)}(X)|^2 = g$; 

(vi) $\chi^{(\mu)}(X)$ is the sum of $n_\mu$ $m$th roots of unity, where $m$ is the order of $X$. 


► Construct the character table for the group 4mm (or $C_{4v}$) using the properties of classes, 
irreps and characters so far established. 


The group 4mm is the group of two-dimensional symmetries of a square, namely rotations 
of 0, $\pi/2$, $\pi$ and $3\pi/2$ and reflections in the mirror planes parallel to the coordinate axes 
and along the main diagonals. These are illustrated in figure 25.3. For this group there are 
eight elements: 

• the identity, $I$; 

• rotations by $\pi/2$ and $3\pi/2$, $R$ and $R'$; 

• a rotation by $\pi$, $Q$; 

• four mirror reflections $m_x$, $m_y$, $m_d$ and $m_{d'}$. 


Figure 25.3 The mirror planes associated with 4mm, the group of two- 
dimensional symmetries of a square. 


Requirements (i) to (iv) at the start of this section put tight constraints on the possible 
character sets, as the following argument shows. 

The group is non-Abelian (clearly $Rm_x \neq m_xR$), and so there are fewer than eight 
classes, and hence fewer than eight irreps. But requirement (iii), with $g = 8$, then implies 
that at least one irrep has dimension 2 or greater. However, there can be no irrep with 
dimension 3 or greater, since $3^2 > 8$, nor can there be more than one two-dimensional 
irrep, since $2^2 + 2^2 = 8$ would rule out a contribution to the sum in (iii) of $1^2$ from the 
identity irrep, and this must be present. Thus the only possibility is one two-dimensional 
irrep and, to make the sum in (iii) correct, four one-dimensional irreps. 

Therefore using (i) we can now deduce that there are five classes. This same conclusion 
can be reached by evaluating $X^{-1}YX$ for every pair of elements in $\mathcal{G}$, as in the description 
of conjugacy classes given in the previous chapter. However, it is tedious to do so and 
certainly much longer than the above. The five classes are $\{I\}$, $\{Q\}$, $\{R, R'\}$, $\{m_x, m_y\}$, $\{m_d, m_{d'}\}$. 

It is straightforward to show that only $I$ and $Q$ commute with every element of the 
group, so they are the only elements in classes of their own. Each other class must have 
at least 2 members, but, as there are three classes to accommodate 8 - 2 = 6 elements, 
there must be exactly 2 in each class. This does not pair up the remaining 6 elements, but 
does say that the five classes have 1, 1, 2, 2, and 2 elements. Of course, if we had started 
by dividing the group into classes, we would know the number of elements in each class 
directly. 

We cannot entirely ignore the group structure (though it sometimes happens that the 
results are independent of the group structure - for example, all non-Abelian groups of 
order 8 have the same character table!); thus we need to note in the present case that 
$m_i^2 = I$ for $i = x$, $y$, $d$ or $d'$ and, as can be proved directly, $Rm_i = m_iR'$ for the same four 
values of label $i$. We also recall that for any pair of elements $X$ and $Y$, $D(XY) = D(X)D(Y)$. 
We may conclude the following for the one-dimensional irreps. 

(a) In view of result (vi), $\chi(m_i) = D(m_i) = \pm 1$. 

(b) Since $R^4 = I$, result (vi) requires that $\chi(R)$ is one of 1, $i$, -1, $-i$. But, since 
$D(R)D(m_i) = D(m_i)D(R')$, and the $D(m_i)$ are just numbers, $D(R) = D(R')$. Further 

$$D(R)D(R) = D(R)D(R') = D(RR') = D(I) = 1,$$

and so $D(R) = \pm 1 = D(R')$. 

(c) $D(Q) = D(RR) = D(R)D(R) = 1$. 

If we add this to the fact that the characters of the identity irrep $A_1$ are all unity then we 
can fill in those entries in character table 25.4 shown in bold. 

Suppose now that the three missing entries in a one-dimensional irrep are $p$, $q$ and $r$, 
where each can only be $\pm 1$. Then, allowing for the numbers in each class, orthogonality 
with the characters of $A_1$ requires that 

$$1(1)(1) + 1(1)(1) + 2(1)(p) + 2(1)(q) + 2(1)(r) = 0.$$

The only possibility is that two of $p$, $q$, and $r$ equal -1 and the other equals +1. This 
can be achieved in three different ways, corresponding to the need to find three further 
different one-dimensional irreps. Thus the first four lines of entries in character table 25.4 
can be completed. The final line can be completed by requiring it to be orthogonal to the 
other four. Property (v) has not been used here though it could have replaced part of the 
argument given. ◄ 


  4mm  |   I     Q    R, R'   m_x, m_y   m_d, m_d'
 ------+---------------------------------------------
  A_1  |   1     1     1         1          1
  A_2  |   1     1     1        -1         -1
  B_1  |   1     1    -1         1         -1
  B_2  |   1     1    -1        -1          1
  E    |   2    -2     0         0          0

Table 25.4 The character table deduced for the group 4mm. For an 
explanation of the entries in bold see the text. 
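
The final steps of this construction can be mimicked by a brute-force search, sketched below. The approach is ours and is illustrative only: the one-dimensional rows are found by trying all sign choices for $p$, $q$, $r$, and the two-dimensional row by searching over small integers (an assumption that happens to suffice here, since the remaining characters turn out to be integers). The names `weighted_dot`, `one_dim` and `e_rows` are hypothetical.

```python
from itertools import product

CLASS_SIZES = (1, 1, 2, 2, 2)  # {I}, {Q}, {R, R'}, {m_x, m_y}, {m_d, m_d'}
g = sum(CLASS_SIZES)


def weighted_dot(chi1, chi2):
    """Class-weighted product of two real character rows, as in (25.15)."""
    return sum(c * a * b for c, a, b in zip(CLASS_SIZES, chi1, chi2))


# One-dimensional irreps: chi(I) = chi(Q) = 1 and p, q, r = +1 or -1,
# with orthogonality to the identity irrep A1 imposed.
a1 = (1, 1, 1, 1, 1)
one_dim = [a1] + [
    (1, 1, p, q, r)
    for p, q, r in product((1, -1), repeat=3)
    if weighted_dot(a1, (1, 1, p, q, r)) == 0
]

# Two-dimensional irrep: chi(I) = 2; search small integers for the rest.
e_rows = [
    (2, a, b, c, d)
    for a, b, c, d in product(range(-2, 3), repeat=4)
    if all(weighted_dot(row, (2, a, b, c, d)) == 0 for row in one_dim)
]

for row in one_dim + e_rows:
    print(row, weighted_dot(row, row))
```

The search yields exactly the five rows of table 25.4, and each row has class-weighted squared magnitude $g = 8$, in agreement with property (v).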


25.9 Group nomenclature 

The nomenclature of published character tables, as we have said before, is erratic 
and sometimes unfortunate; for example, often E is used to represent, not only 
a two-dimensional irrep, but also the identity operation, where we have used /. 
Thus the symbol E might appear in both the column and row headings of a 
table, though with quite different meanings in the two cases. In this book we use 
roman capitals to denote irreps. 

One-dimensional irreps are regularly denoted by A and B, B being used if a 
rotation about the principal axis of $2\pi/n$ has character -1. Here $n$ is the highest 
integer such that a rotation of $2\pi/n$ is a symmetry operation of the system, and 
the principal axis is the one about which this occurs. For the group of operations 
on a square, $n = 4$, the axis is the perpendicular to the square and the rotation 
in question is $R$. The names for the group, 4mm and $C_{4v}$, derive from the fact 
that here $n$ is equal to 4. Similarly, for the operations on an equilateral triangle, 
$n = 3$ and the group names are 3m and $C_{3v}$, but because the rotation by $2\pi/3$ has 
character +1 in all its one-dimensional irreps (see table 25.1), only A appears in 
the irrep list. 

Two-dimensional irreps are denoted by E, as we have already noted, and three- 
dimensional irreps by T, although in many cases the symbols are modified by 
primes and other alphabetic labels to denote variations in behaviour from one 
irrep to another in respect of mirror reflections and parity inversions. In the study 
of molecules, alternative names based on molecular angular momentum properties 
are common. It is beyond the scope of this book to list all these variations, or to 
give a large selection of character tables; our aim is to demonstrate and justify 
the use of those found in the literature specifically dedicated to crystal physics or 
molecular chemistry. 

Variations in notation are not restricted to the naming of groups and their 
irreps, but extend to the symbols used to identify a typical element, and hence 
all members, of a conjugacy class in a group. In physics these are usually of the 
types $n_z$, $\bar{n}_z$ or $m_x$. The first of these denotes a rotation of $2\pi/n$ about the $z$-axis, 
and the second the same thing followed by parity inversion (all vectors $\mathbf{r}$ go to 
$-\mathbf{r}$), whilst the third indicates a mirror reflection in a plane, in this case the plane 
$x = 0$. 

Typical chemistry symbols for classes are $NC_n$, $NC_n^2$, $NC_n^x$, $NS_n$, $\sigma_v$, $\sigma^{xy}$. Here 
the first symbol $N$, where it appears, shows that there are $N$ elements in the 
class (a useful feature). The subscript $n$ has the same meaning as in the physics 
notation, but $\sigma$ rather than $m$ is used for a mirror reflection, subscripts $v$, $d$ or $h$ or 
superscripts $xy$, $xz$ or $yz$ denoting the various orientations of the relevant mirror 
planes. Symmetries involving parity inversions are denoted by $S$; thus $S_n$ is the 
chemistry analogue of $\bar{n}$. None of what is said in this and the previous paragraph 
should be taken as definitive, but merely as a warning of common variations in 
nomenclature and as an initial guide to corresponding entities. Before using any 
set of group character tables, the reader should ensure that he or she understands 
the precise notation being employed. 


25.10 Product representations 

In quantum mechanical investigations we are often faced with the calculation of 
what are called matrix elements. These normally take the form of integrals over all 
space of the product of two or more functions whose analytic forms depend on the 
microscopic properties (usually angular momentum and its components) of the 
electrons or nuclei involved. For ‘bonding’ calculations involving ‘overlap integrals’ 
there are usually two functions involved, whilst for transition probabilities a third 
function, giving the spatial variation of the interaction Hamiltonian, also appears 
under the integral sign. 

If the environment of the microscopic system under investigation has some 
symmetry properties, then sometimes these can be used to establish, without 
detailed evaluation, that the multiple integral must have zero value. We now 
express the essential content of these ideas in group theoretical language. 
Suppose we are given an integral of the form 

$$J = \int \Psi\phi\,d\tau \qquad\text{or}\qquad J = \int \Psi\xi\phi\,d\tau$$

to be evaluated over all space in a situation in which the physical system is 
invariant under a particular group $\mathcal{G}$ of symmetry operations. For the integral to 
be non-zero the integrand must be invariant under each of these operations. In 
group theoretical language, the integrand must transform as the identity, the one- 
dimensional representation $A_1$ of $\mathcal{G}$; more accurately, some non-vanishing part of 
the integrand must do so. 

An alternative way of saying this is that if under the symmetry operations 
of $\mathcal{G}$ the integrand transforms according to a representation $D$ and $D$ does not 
contain $A_1$ amongst its irreps then the integral $J$ is necessarily zero. It should be 
noted that the converse is not true; $J$ may be zero even if $A_1$ is present, since the 
integral, whilst showing the required invariance, may still have the value zero. 

It is evident that we need to establish how to find the irreps that go to make 
up a representation of a double or triple product when we already know the 
irreps according to which the factors in the product transform. The method is 
established by the following theorem. 

Theorem. For each element of a group the character in a product representation is 
the product of the corresponding characters in the separate representations. 

Proof. Suppose that $\{u_i\}$ and $\{v_j\}$ are two sets of basis functions, that transform 
under the operations of a group $\mathcal{G}$ according to representations $D^{(\lambda)}$ and $D^{(\mu)}$ 
respectively. Denote by $\mathbf{u}$ and $\mathbf{v}$ the corresponding basis vectors and let $X$ be an 
element of the group. Then the functions generated from $u_i$ and $v_j$ by the action 
of $X$ are calculated as follows, using (25.1) and (25.4): 

$$u_i' = Xu_i = \left[(D^{(\lambda)}(X))^{\rm T}\mathbf{u}\right]_i = \left[D^{(\lambda)}(X)\right]_{ii} u_i + \sum_{l \neq i} \left[(D^{(\lambda)}(X))^{\rm T}\right]_{il} u_l,$$

$$v_j' = Xv_j = \left[(D^{(\mu)}(X))^{\rm T}\mathbf{v}\right]_j = \left[D^{(\mu)}(X)\right]_{jj} v_j + \sum_{m \neq j} \left[(D^{(\mu)}(X))^{\rm T}\right]_{jm} v_m.$$

Here $[D(X)]_{ij}$ is just a single element of the matrix $D(X)$ and $[D(X)]_{kk} = [D^{\rm T}(X)]_{kk}$ 
is simply a diagonal element from the matrix - the repeated subscript does not 
indicate summation. Now, if we take as basis functions for a product 
representation $D^{\rm prod}(X)$ the products $w_k = u_iv_j$ (where the various possible pairs of 
values $i$, $j$ are labelled by $k$), we have also that 

$$\begin{aligned}
w_k' = Xw_k &= Xu_iv_j = (Xu_i)(Xv_j) \\
&= \left[D^{(\lambda)}(X)\right]_{ii}\left[D^{(\mu)}(X)\right]_{jj} u_iv_j + \text{terms not involving the product } u_iv_j.
\end{aligned}$$

This is to be compared with 

$$w_k' = Xw_k = \left[(D^{\rm prod}(X))^{\rm T}\mathbf{w}\right]_k = \left[D^{\rm prod}(X)\right]_{kk} w_k + \sum_{n \neq k} \left[(D^{\rm prod}(X))^{\rm T}\right]_{kn} w_n,$$

where $D^{\rm prod}(X)$ is the product representation matrix for element $X$ of the group. 
The comparison shows that 

$$\left[D^{\rm prod}(X)\right]_{kk} = \left[D^{(\lambda)}(X)\right]_{ii}\left[D^{(\mu)}(X)\right]_{jj}.$$

It follows that 

$$\begin{aligned}
\chi^{\rm prod}(X) &= \sum_{k=1}^{n_\lambda n_\mu} \left[D^{\rm prod}(X)\right]_{kk} \\
&= \sum_{i=1}^{n_\lambda} \sum_{j=1}^{n_\mu} \left[D^{(\lambda)}(X)\right]_{ii}\left[D^{(\mu)}(X)\right]_{jj} \\
&= \left\{\sum_{i=1}^{n_\lambda} \left[D^{(\lambda)}(X)\right]_{ii}\right\} \left\{\sum_{j=1}^{n_\mu} \left[D^{(\mu)}(X)\right]_{jj}\right\} \\
&= \chi^{(\lambda)}(X)\,\chi^{(\mu)}(X). \qquad (25.23)
\end{aligned}$$

This proves the theorem, and a similar argument leads to the corresponding result 
for integrands in the form of a product of three or more factors. 
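
The theorem can be illustrated numerically by building product-representation matrices explicitly as Kronecker (direct) products, for which the diagonal entries are precisely the products $[D^{(\lambda)}(X)]_{ii}[D^{(\mu)}(X)]_{jj}$ used in the proof. The sketch below makes several assumptions of our own: the 2 x 2 matrices for the classes of 3m in irrep E are one standard choice, the helpers `kron` and `trace` are hypothetical names, and the decomposition of the product E × E by means of (25.18) is included purely as an example.

```python
import math

s = math.sqrt(3) / 2
# Assumed representative matrices of irrep E of 3m, one per class.
E_REPS = {
    "I": ((1.0, 0.0), (0.0, 1.0)),
    "A": ((-0.5, s), (-s, -0.5)),   # a rotation
    "C": ((-1.0, 0.0), (0.0, 1.0)),  # a reflection
}
CLASS_SIZES = {"I": 1, "A": 2, "C": 3}
IRREPS = {
    "A1": {"I": 1, "A": 1, "C": 1},
    "A2": {"I": 1, "A": 1, "C": -1},
    "E": {"I": 2, "A": -1, "C": 0},
}


def kron(p, q):
    """Kronecker product: rows labelled by (i, k), columns by (j, l)."""
    return [
        [p[i][j] * q[k][l] for j in range(len(p[0])) for l in range(len(q[0]))]
        for i in range(len(p))
        for k in range(len(q))
    ]


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


# Characters of the product representation E x E, class by class.
chi_prod = {X: trace(kron(M, M)) for X, M in E_REPS.items()}

# Multiplicities from (25.18); rounding removes floating-point noise.
g = sum(CLASS_SIZES.values())
multiplicities = {
    name: round(sum(CLASS_SIZES[X] * chi[X] * chi_prod[X] for X in chi) / g)
    for name, chi in IRREPS.items()
}
print(chi_prod)
print(multiplicities)
```

The product characters are the squares of those of E, namely 4, 1 and 0, and under these assumptions each of $A_1$, $A_2$ and E appears once in E × E, accounting for all $2 \times 2 = 4$ dimensions.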

An immediate corollary is that an integral whose integrand is the product of
two functions transforming according to two different irreps is necessarily zero. To
see this, we use (25.18) to determine whether irrep $A_1$ appears in the product
character set $\chi^{\rm prod}(X)$:

\[
m_{A_1} = \frac{1}{g} \sum_X \left[ \chi^{(A_1)}(X) \right]^* \chi^{\rm prod}(X)
        = \frac{1}{g} \sum_X \chi^{\rm prod}(X)
        = \frac{1}{g} \sum_X \chi^{(\lambda)}(X)\, \chi^{(\mu)}(X).
\]

We have used the fact that $\chi^{(A_1)}(X) = 1$ for all $X$ but now note that, by virtue of
(25.14), the expression on the right of this equation is equal to zero unless $\lambda = \mu$.
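
As a concrete illustration of this corollary, the sum on the right can be evaluated for every pair of irreps of a particular group. The sketch below assumes the characters of the group 4mm as listed in table 25.5, with real characters so that complex conjugation can be ignored; the names `CLASS_SIZES`, `IRREPS` and `a1_content` are ours.

```python
from fractions import Fraction

# Class sizes and characters of 4mm, in the class order
# I, Q, {R, R'}, {m_x, m_y}, {m_d, m_d'} of table 25.5.
CLASS_SIZES = [1, 1, 2, 2, 2]
IRREPS = {
    "A1": [1, 1, 1, 1, 1],
    "A2": [1, 1, 1, -1, -1],
    "B1": [1, 1, -1, 1, -1],
    "B2": [1, 1, -1, -1, 1],
    "E": [2, -2, 0, 0, 0],
}


def a1_content(chi_lam, chi_mu):
    """Multiplicity of A1 in the product of two real character sets."""
    g = sum(CLASS_SIZES)
    total = sum(c * x * y for c, x, y in zip(CLASS_SIZES, chi_lam, chi_mu))
    return Fraction(total, g)


# A1 appears (once) only when the two irreps coincide.
overlap = {(p, q): a1_content(IRREPS[p], IRREPS[q])
           for p in IRREPS for q in IRREPS}
for (p, q), m in overlap.items():
    print(p, q, m)
```

The sum is taken class by class, each class weighted by its number of elements, which is equivalent to the sum over all $X$.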

Any complications due to non-real characters have been ignored; in practice,
they are handled automatically as it is usually $\Psi^*\phi$, rather than $\Psi\phi$, that appears
in integrands, though many functions are real in any case, and nearly all characters
are.

Equation (25.23) is a general result for integrands but, specifically in the context 
of chemical bonding, it implies that for the possibility of bonding to exist, the 
two quantum wavefunctions must transform according to the same irrep. This is 
discussed further in the next section. 


25.11 Physical applications of group theory 

As we indicated at the start of chapter 24 and discussed in a little more detail at 
the beginning of the present chapter, some physical systems possess symmetries 
that allow the results of the present chapter to be used in their analysis. We 
consider now some of the more common sorts of problem in which these results 
find ready application. 


Figure 25.4 A molecule consisting of four atoms of iodine and one of 
manganese. 


25.11.1 Bonding in molecules 

We have just seen that whether chemical bonding can take place in a molecule 
is strongly dependent upon whether the wavefunctions of the two atoms forming 
a bond transform according to the same irrep. Thus it is sometimes useful to be 
able to find a wavefunction that does transform according to a particular irrep 
of a group of transformations. This can be done if the characters of the irrep are 
known and a sensible starting point can be guessed. We state without proof that 
starting from any $n$-dimensional basis vector $\Psi \equiv (\Psi_1\ \Psi_2\ \cdots\ \Psi_n)^{\rm T}$, where $\{\Psi_i\}$ is
a set of wavefunctions, the new vector $\Psi^{(\lambda)} \equiv (\Psi_1^{(\lambda)}\ \Psi_2^{(\lambda)}\ \cdots\ \Psi_n^{(\lambda)})^{\rm T}$ generated by

\[
\Psi_i^{(\lambda)} = \sum_X \chi^{(\lambda)*}(X)\, X \Psi_i \tag{25.24}
\]

will transform according to the $\lambda$th irrep. If the randomly chosen $\Psi$ happens not
to contain any component that transforms in the desired way then the $\Psi^{(\lambda)}$ so
generated is found to be a zero vector and it is necessary to select a new starting 
vector. An illustration of the use of this ‘projection operator’ is given in the next 
example. 


► Consider a molecule made up of four iodine atoms lying at the corners of a square in the
xy-plane, with a manganese atom at its centre, as shown in figure 25.4. Investigate whether
the molecular orbital given by the superposition of p-state (angular momentum $l = 1$)
atomic orbitals

\[
\Psi_1 = \Psi_y(\mathbf{r} - \mathbf{R}_1) + \Psi_x(\mathbf{r} - \mathbf{R}_2) - \Psi_y(\mathbf{r} - \mathbf{R}_3) - \Psi_x(\mathbf{r} - \mathbf{R}_4)
\]

can bond to the d-state atomic orbitals of the manganese atom described by either (a)
$\phi_1 = (3z^2 - r^2)f(r)$ or (b) $\phi_2 = (x^2 - y^2)f(r)$, where $f(r)$ is a function of $r$ and so is
unchanged by any of the symmetry operations of the molecule. Such linear combinations of
atomic orbitals are known as ring orbitals.


We have eight basis functions, the atomic orbitals $\Psi_x(N)$ and $\Psi_y(N)$, where $N = 1, 2, 3, 4$
and indicates the position of an iodine atom. Since the wavefunctions are those of p-states
they have the forms $xf(r)$ or $yf(r)$ and lie in the directions of the x- and y-axes shown in
the figure. Since $r$ is not changed by any of the symmetry operations, $f(r)$ can be treated as
a constant. The symmetry group of the system is 4mm, whose character table is table 25.4.

Case (a). The manganese atomic orbital $\phi_1 = (3z^2 - r^2)f(r)$, lying at the centre of the
molecule, is not affected by any of the symmetry operations since $z$ and $r$ are unchanged
by them. It clearly transforms according to the identity irrep $A_1$. We therefore need to
know which combination of the iodine orbitals $\Psi_x(N)$ and $\Psi_y(N)$, if any, also transforms
according to $A_1$.

We use the projection operator (25.24). If we choose $\Psi_x(1)$ as the arbitrary one-dimensional
starting vector, we unfortunately obtain zero (as the reader may wish to
verify), but $\Psi_y(1)$ does generate a new non-zero one-dimensional vector transforming
according to $A_1$. The results of acting on $\Psi_y(1)$ with the various symmetry elements $X$
can be written down by inspection (see the discussion in section 25.2). So, for example, the
$\Psi_y(1)$ orbital centred on iodine atom 1 and aligned along the positive y-axis is changed
by the anticlockwise rotation of $\pi/2$ produced by $R'$ into an orbital centred on atom 4
and aligned along the negative x-axis; thus $R'\Psi_y(1) = -\Psi_x(4)$. The complete set of group
actions on $\Psi_y(1)$ is:

    $I$,   $\Psi_y(1)$;    $Q$,   $-\Psi_y(3)$;    $R$,   $\Psi_x(2)$;    $R'$,     $-\Psi_x(4)$;
    $m_x$, $\Psi_y(1)$;    $m_y$, $-\Psi_y(3)$;    $m_d$, $\Psi_x(2)$;    $m_{d'}$, $-\Psi_x(4)$.

Now $\chi^{(A_1)}(X) = 1$ for all $X$, so (25.24) states that the sum of the above results for $X\Psi_y(1)$,
all with weight 1, gives a vector (in this case of a one-dimensional irrep, just a wavefunction)
that transforms according to $A_1$ and is therefore capable of forming a chemical
bond with the manganese wavefunction $\phi_1$. It is

\[
\Psi^{(A_1)} = 2[\Psi_y(1) - \Psi_y(3) + \Psi_x(2) - \Psi_x(4)],
\]

though, of course, the factor 2 is irrelevant. This is precisely the ring orbital $\Psi_1$ given in
the problem, but here it is generated rather than guessed beforehand.

Case (b). The atomic orbital $\phi_2 = (x^2 - y^2)f(r)$ behaves as follows under the action of
typical conjugacy class members:

    $I$, $\phi_2$;    $Q$, $\phi_2$;    $R$, $(y^2 - x^2)f(r) = -\phi_2$;    $m_x$, $\phi_2$;    $m_d$, $-\phi_2$.

From this we see that $\phi_2$ transforms as a one-dimensional irrep, but, from table 25.4, that
irrep is $B_1$ not $A_1$ (the irrep according to which $\Psi_1$ transforms, as already shown). Thus
$\phi_2$ and $\Psi_1$ cannot form a bond. ◄

The original question did not ask for the ring orbital to which $\phi_2$ may
bond, but it can be generated easily by using the values of $X\Psi_y(1)$ calculated in
case (a) but now weighting them according to the characters of $B_1$:

\[
\begin{aligned}
\Psi^{(B_1)} &= \Psi_y(1) - \Psi_y(3) + (-1)\Psi_x(2) - (-1)\Psi_x(4) \\
 &\quad + \Psi_y(1) - \Psi_y(3) + (-1)\Psi_x(2) - (-1)\Psi_x(4) \\
 &= 2[\Psi_y(1) - \Psi_x(2) - \Psi_y(3) + \Psi_x(4)].
\end{aligned}
\]
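
The projections just carried out by inspection can also be generated mechanically. The sketch below is illustrative only: it assumes that iodine atoms 1 and 3 lie on the y-axis and atoms 2 and 4 on the x-axis (as implied by the group actions listed in case (a)), takes $R$ to be the clockwise and $R'$ the anticlockwise rotation by $\pi/2$, and represents an orbital as a pair such as `("y", 1)`. All function and variable names are our own.

```python
# Positions of iodine atoms 1..4 in units of the Mn-I distance (assumed layout).
ATOMS = {1: (0, 1), 2: (1, 0), 3: (0, -1), 4: (-1, 0)}
SITE = {pos: n for n, pos in ATOMS.items()}
UNIT = {"x": (1, 0), "y": (0, 1)}

# 2x2 matrices of the eight elements of 4mm acting on (x, y).
ELEMENTS = {
    "I": ((1, 0), (0, 1)),
    "Q": ((-1, 0), (0, -1)),
    "R": ((0, 1), (-1, 0)),      # clockwise rotation by pi/2
    "R'": ((0, -1), (1, 0)),     # anticlockwise rotation by pi/2
    "mx": ((-1, 0), (0, 1)),     # x -> -x
    "my": ((1, 0), (0, -1)),     # y -> -y
    "md": ((0, 1), (1, 0)),      # reflection in the line y = x
    "md'": ((0, -1), (-1, 0)),   # reflection in the line y = -x
}

# Characters of A1 and B1 for each element (both irreps are real).
CHARACTERS = {
    "A1": {"I": 1, "Q": 1, "R": 1, "R'": 1, "mx": 1, "my": 1, "md": 1, "md'": 1},
    "B1": {"I": 1, "Q": 1, "R": -1, "R'": -1, "mx": 1, "my": 1, "md": -1, "md'": -1},
}


def mat_vec(m, v):
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])


def act(element, orbital):
    """Apply a group element to one orbital; return (sign, new_orbital)."""
    direction, atom = orbital
    m = ELEMENTS[element]
    new_atom = SITE[mat_vec(m, ATOMS[atom])]
    dx, dy = mat_vec(m, UNIT[direction])
    return (dx, ("x", new_atom)) if dx != 0 else (dy, ("y", new_atom))


def project(irrep, orbital):
    """Evaluate (25.24) for a single starting orbital; zero terms are dropped."""
    result = {}
    for element, chi in CHARACTERS[irrep].items():
        sign, new_orbital = act(element, orbital)
        result[new_orbital] = result.get(new_orbital, 0) + chi * sign
    return {k: v for k, v in result.items() if v != 0}


psi_a1 = project("A1", ("y", 1))    # ring orbital of case (a)
psi_b1 = project("B1", ("y", 1))    # ring orbital that can bond to phi_2
psi_zero = project("A1", ("x", 1))  # empty: this starting orbital projects to zero
print(psi_a1, psi_b1, psi_zero)
```

Each result is a dictionary mapping an orbital to its coefficient, so that `psi_a1` reproduces the coefficients 2, 2, -2, -2 found above.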

Now we will find the other irreps of 4mm present in the space spanned by
the basis functions $\Psi_x(N)$ and $\Psi_y(N)$; at the same time this will illustrate the
important point that since we are working with characters we are only interested
in the diagonal elements of the representative matrices. This means (section 25.2)
that if we work in the natural representation $D^{\rm nat}$ we need consider only those
functions that transform, wholly or partially, into themselves. Since we have no
need to write out the matrices explicitly, their size ($8 \times 8$) is no drawback. All the
irreps spanned by the basis functions $\Psi_x(N)$ and $\Psi_y(N)$ can be determined by
considering the actions of the group elements upon them, as follows.

(i) Under $I$ all eight basis functions are unchanged, and $\chi(I) = 8$.

(ii) The rotations $R$, $R'$ and $Q$ change the value of $N$ in every case and so
all diagonal elements of the natural representation are zero and
$\chi(R) = \chi(Q) = 0$.

(iii) $m_x$ takes $x$ into $-x$ and $y$ into $y$ and, for $N = 1$ and 3, leaves $N$ unchanged,
with the consequences (remember the forms of $\Psi_x(N)$ and $\Psi_y(N)$) that

    $\Psi_x(1) \to -\Psi_x(1)$,    $\Psi_x(3) \to -\Psi_x(3)$,
    $\Psi_y(1) \to \Psi_y(1)$,     $\Psi_y(3) \to \Psi_y(3)$.

Thus $\chi(m_x)$ has four non-zero contributions, $-1$, $-1$, 1 and 1, together
with four zero contributions. The total is thus zero.

(iv) $m_d$ and $m_{d'}$ leave no atom unchanged and so $\chi(m_d) = 0$.

The character set of the natural representation is thus 8, 0, 0, 0, 0, which, either
by inspection or by applying formula (25.18), shows that

\[
D^{\rm nat} = A_1 \oplus A_2 \oplus B_1 \oplus B_2 \oplus 2E,
\]

i.e. that all possible irreps are present. We have constructed previously the
combinations of $\Psi_x(N)$ and $\Psi_y(N)$ that transform according to $A_1$ and $B_1$.
The others can be found in the same way.
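
The counting in (i)-(iv) and the subsequent reduction can be reproduced by a short calculation. In this sketch, which again assumes atoms 1 and 3 on the y-axis and atoms 2 and 4 on the x-axis, the character of the natural representation is obtained by adding the two diagonal elements of the $2 \times 2$ matrix of the operation for every atom that the operation leaves in place; the names used are hypothetical.

```python
from fractions import Fraction

ATOMS = [(0, 1), (1, 0), (0, -1), (-1, 0)]      # iodine atoms 1..4 (assumed layout)

# One representative matrix per class of 4mm, with the class size.
CLASSES = [
    ("I", 1, ((1, 0), (0, 1))),
    ("Q", 1, ((-1, 0), (0, -1))),
    ("R", 2, ((0, 1), (-1, 0))),
    ("mx", 2, ((-1, 0), (0, 1))),
    ("md", 2, ((0, 1), (1, 0))),
]

IRREPS = {
    "A1": [1, 1, 1, 1, 1],
    "A2": [1, 1, 1, -1, -1],
    "B1": [1, 1, -1, 1, -1],
    "B2": [1, 1, -1, -1, 1],
    "E": [2, -2, 0, 0, 0],
}


def natural_character(m):
    """Trace of the 8 x 8 matrix: only orbitals on unmoved atoms contribute."""
    total = 0
    for x, y in ATOMS:
        image = (m[0][0] * x + m[0][1] * y, m[1][0] * x + m[1][1] * y)
        if image == (x, y):
            total += m[0][0] + m[1][1]
    return total


def multiplicity(chi_irrep, chi):
    """Formula (25.18) for real characters, summed class by class."""
    g = sum(size for _, size, _ in CLASSES)
    s = sum(size * a * b for (_, size, _), a, b in zip(CLASSES, chi_irrep, chi))
    return Fraction(s, g)


chi_nat = [natural_character(m) for _, _, m in CLASSES]
decomposition = {name: multiplicity(chi, chi_nat) for name, chi in IRREPS.items()}
print(chi_nat, decomposition)
```

The dimensions of the irreps found, weighted by their multiplicities, add up to eight, as they must for an eight-dimensional representation.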


25.11.2 Matrix elements in quantum mechanics 

In section 25.10 we outlined the procedure for determining whether a matrix 
element that involves the product of three factors as an integrand is necessarily 
zero. We now illustrate this with a specific worked example. 


► Determine whether a 'dipole' matrix element of the form

\[
J = \int \Psi_{d_1}\, x\, \Psi_{d_2}\, d\tau ,
\]

where $\Psi_{d_1}$ and $\Psi_{d_2}$ are d-state wavefunctions of the forms $xyf(r)$ and $(x^2 - y^2)g(r)$ respectively,
can be non-zero (i) in a molecule with symmetry $C_{3v}$ (or 3m), such as ammonia, and
(ii) in a molecule with symmetry $C_{4v}$ (or 4mm), such as the MnI$_4$ molecule considered in
the previous example.


We will need to make reference to the character tables of the two groups. The table for
$C_{3v}$ is table 25.1 (section 25.6); that for $C_{4v}$ is reproduced as table 25.5 from table 25.4 but
with the addition of another column showing how some common functions transform.

We make use of (25.23), extended to the product of three functions. No attention need
be paid to $f(r)$ and $g(r)$ as they are unaffected by the group operations.

Case (i). From the character table 25.1 for $C_{3v}$, we see that each of $xy$, $x$ and $x^2 - y^2$
forms part of a basis set transforming according to the two-dimensional irrep E. Thus we
may fill in the array of characters (using chemical notation for the classes, except that
we continue to use $I$ rather than $E$) as shown in table 25.6. The last line is obtained by
4mm     I     Q     R, R'    m_x, m_y    m_d, m_d'

A_1     1     1       1         1           1        $z$; $z^2$; $x^2 + y^2$
A_2     1     1       1        -1          -1        $R_z$
B_1     1     1      -1         1          -1        $x^2 - y^2$
B_2     1     1      -1        -1           1        $xy$
E       2    -2       0         0           0        $(x, y)$; $(xz, yz)$; $(R_x, R_y)$


Table 25.5 The character table for the irreps of group 4mm (or $C_{4v}$). The
right-hand column lists some common functions, or, for the two-dimensional
irrep E, pairs of functions, that transform according to the irrep against which
they are shown.


                             Classes
Function       Irrep      I     2C_3    3σ_v

$xy$           E          2     -1       0
$x$            E          2     -1       0
$x^2 - y^2$    E          2     -1       0
product                   8     -1       0


Table 25.6 The character sets, for the group $C_{3v}$ (or 3m), of three functions
and of their product $x^2y(x^2 - y^2)$.


                                   Classes
Function       Irrep      I     C_2    2C_4    2σ_v    2σ_d

$xy$           B_2        1      1     -1      -1       1
$x$            E          2     -2      0       0       0
$x^2 - y^2$    B_1        1      1     -1       1      -1
product                   2     -2      0       0       0


Table 25.7 The character sets, for the group $C_{4v}$ (or 4mm), of three functions,
and of their product $x^2y(x^2 - y^2)$.


multiplying together the corresponding characters for each of the three elements. Now, by
inspection, or by applying (25.18), i.e.

\[
m_{A_1} = \tfrac{1}{6}\,[\,1(1)(8) + 2(1)(-1) + 3(1)(0)\,] = 1,
\]

we see that irrep $A_1$ does appear in the reduced representation of the product, and so $J$
is not necessarily zero.

Case (ii). From table 25.5 we find that, under the group $C_{4v}$, $xy$ and $x^2 - y^2$ transform
as irreps $B_2$ and $B_1$ respectively and that $x$ is part of a basis set transforming as E. Thus
the calculation table takes the form of table 25.7 (again, chemical notation for the classes
has been used).

Here inspection is sufficient, as the product is exactly that of irrep E and irrep $A_1$ is
certainly not present. Thus $J$ is necessarily zero and the dipole matrix element vanishes. ◄
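
The two cases of this example amount to the same short calculation applied to different character sets. The sketch below takes the characters from tables 25.6 and 25.7; the function name `a1_multiplicity` is our own and the calculation assumes real characters.

```python
from fractions import Fraction


def a1_multiplicity(class_sizes, *character_sets):
    """Number of times A1 occurs in the product of the given character sets."""
    g = sum(class_sizes)
    total = 0
    for k, size in enumerate(class_sizes):
        product = 1
        for chi in character_sets:
            product *= chi[k]       # (25.23) extended to several factors
        total += size * product     # chi^(A1) = 1 for every class
    return Fraction(total, g)


# C3v (3m): classes I, 2C3, 3sigma_v; xy, x and x^2 - y^2 all belong to E.
E_3M = [2, -1, 0]
m_c3v = a1_multiplicity([1, 2, 3], E_3M, E_3M, E_3M)

# C4v (4mm): classes I, C2, 2C4, 2sigma_v, 2sigma_d.
B2 = [1, 1, -1, -1, 1]
E_4MM = [2, -2, 0, 0, 0]
B1 = [1, 1, -1, 1, -1]
m_c4v = a1_multiplicity([1, 1, 2, 2, 2], B2, E_4MM, B1)

print(m_c3v, m_c4v)   # non-zero means J is not required to vanish
```

A zero multiplicity shows that $J$ must vanish; a non-zero one shows only that symmetry does not force it to.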
Figure 25.5 An equilateral array of masses and springs.


25.11.3 Degeneracy of normal modes 

As our final area for illustrating the usefulness of group theoretical results we 
consider the normal modes of a vibrating system (see chapter 9). This analysis 
has far-reaching applications in physics, chemistry and engineering. For a given 
system, normal modes that are related by some symmetry operation have the same 
frequency of vibration; the modes are said to be degenerate. It can be shown that 
such modes span a vector space that transforms according to some irrep of the 
group $\mathcal{G}$ of symmetry operations of the system. Moreover, the degeneracy of
the modes equals the dimension of the irrep. As an illustration, we consider the 
following example. 


► Investigate the possible vibrational modes of the equilateral triangular arrangement of 
equal masses and springs shown in figure 25.5. Demonstrate that two are degenerate. 


Clearly the symmetry group is that of the symmetry operations on an equilateral triangle,
namely 3m (or $C_{3v}$), whose character table is table 25.1. As on a previous occasion, it is
most convenient to use the natural representation $D^{\rm nat}$ of this group (it almost always
saves having to write out matrices explicitly) acting on the six-dimensional vector space
$(x_1, y_1, x_2, y_2, x_3, y_3)$. In this example the natural and regular representations coincide, but
this is not usually the case.

We note that in table 25.1 the second class contains the rotations $A$ (by $2\pi/3$) and $B$ (by
$4\pi/3$), also known as $R$ and $R'$. This class is known as $3_z$ in crystallographic notation, or
$C_3$ in chemical notation, as explained in section 25.9. The third class contains $C$, $D$, $E$, the
three mirror reflections.

Clearly $\chi(I) = 6$. Since all position labels are changed by a rotation, $\chi(3_z) = 0$. For the
mirror reflections the simplest representative class member to choose is the reflection $m_y$ in
the plane containing the $y_3$-axis, since then only label 3 is unchanged; under $m_y$, $x_3 \to -x_3$
and $y_3 \to y_3$, leading to the conclusion that $\chi(m_y) = 0$. Thus the character set is 6, 0, 0.

Using (25.18) and the character table 25.1 shows that

\[
D^{\rm nat} = A_1 \oplus A_2 \oplus 2E.
\]
However, we have so far allowed $x_i$, $y_i$ to be completely general, and we must now identify
and remove those irreps that do not correspond to vibrations. These will be the irreps
corresponding to bodily translations of the triangle and to its rotation without relative
motion of the three masses.

Bodily translations are linear motions of the centre of mass, which has coordinates

\[
\bar{x} = (x_1 + x_2 + x_3)/3 \quad \text{and} \quad \bar{y} = (y_1 + y_2 + y_3)/3.
\]

Table 25.1 shows that such a coordinate pair $(x, y)$ transforms according to the two-dimensional
irrep E; this accounts for one of the two such irreps found in the natural
representation.

It can be shown that, as stated in table 25.1, planar bodily rotations of the triangle
(rotations about the z-axis, denoted by $R_z$) transform as irrep $A_2$. Thus, when the
linear motions of the centre of mass, and pure rotation about it, are removed from our
reduced representation, we are left with $E \oplus A_1$. These must be the irreps corresponding
to the internal vibrations of the triangle: one doubly degenerate mode and one non-degenerate
mode. The physical interpretation of this is that two of the normal modes of the
system have the same frequency and one normal mode has a different frequency (barring
accidental coincidences for other reasons). It may be noted that in quantum mechanics
the energy quantum of a normal mode is proportional to its frequency. ◄
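
The bookkeeping of this example is easily mechanised. The following sketch assumes the standard characters of 3m (classes: identity, two rotations, three reflections) for the irreps $A_1$, $A_2$ and E, reduces the character set 6, 0, 0 and then removes the irreps belonging to bodily translation and rotation; the names are hypothetical.

```python
from fractions import Fraction

CLASS_SIZES = [1, 2, 3]   # identity, rotations, reflections of 3m
IRREPS = {"A1": [1, 1, 1], "A2": [1, 1, -1], "E": [2, -1, 0]}


def decompose(chi):
    """Apply (25.18) to a real character set, class by class."""
    g = sum(CLASS_SIZES)
    return {name: Fraction(sum(c * a * b for c, a, b in zip(CLASS_SIZES, irrep, chi)), g)
            for name, irrep in IRREPS.items()}


content = decompose([6, 0, 0])   # natural representation: A1 + A2 + 2E
content["E"] -= 1                # bodily translations, the pair (x, y)
content["A2"] -= 1               # bodily rotation R_z
vibrations = {name: m for name, m in content.items() if m != 0}
print(vibrations)                # irreps of the internal vibrations
```

The surviving irreps have dimensions 1 and 2, giving the three internal vibrational modes with two distinct frequencies.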

In general, group theory does not tell us what the frequencies are, since it is 
entirely concerned with the symmetry of the system and not with the values of 
masses and spring constants. However, using this type of reasoning, the results 
from representation theory can be used to predict the degeneracies of atomic 
energy levels and, given a perturbation whose Hamiltonian (energy operator) has 
some degree of symmetry, the extent to which the perturbation will resolve the 
degeneracy. Some of these ideas are explored a little further in the next section 
and in the exercises. 


25.11.4 Breaking of degeneracies 

If a physical system has a high degree of symmetry, invariant under a group $\mathcal{G}$ of
reflections and rotations, say, then, as implied above, it will normally be the case
that some of its eigenvalues (of energy, frequency, angular momentum etc.) are
degenerate. However, if a perturbation that is invariant only under the operations
of the elements of a smaller symmetry group (a subgroup of $\mathcal{G}$) is added, some of
the original degeneracies may be broken. The results derived from representation
theory can be used to decide the extent of the degeneracy-breaking.

The normal procedure is to use an $N$-dimensional basis vector, consisting of
the $N$ degenerate eigenfunctions, to generate an $N$-dimensional representation of
the symmetry group of the perturbation. This representation is then decomposed
into irreps. In general, eigenfunctions that transform according to different irreps
no longer share the same frequency of vibration.

We illustrate this with the following example. 
Figure 25.6 A circular drumskin loaded with three symmetrically placed
masses. 


►A circular drumskin has three equal masses placed on it at the vertices of an equilateral 
triangle, as shown in figure 25.6. Determine which degenerate normal modes of the drumskin 
can be split in frequency by this perturbation. 


When no masses are present the normal modes of the drumskin are either non-degenerate
or two-fold degenerate (see chapter 19). The degenerate eigenfunctions $\Psi$ of the $n$th normal
mode have the forms

\[
J_n(kr)(\cos n\theta)\, e^{\pm i\omega t} \quad \text{or} \quad J_n(kr)(\sin n\theta)\, e^{\pm i\omega t}.
\]

Therefore, as explained above, we need to consider the two-dimensional vector space
spanned by $\Psi_1 = \sin n\theta$ and $\Psi_2 = \cos n\theta$. This will generate a two-dimensional representation
of the group 3m (or $C_{3v}$), the symmetry group of the perturbation. Taking the easiest
element from each of the three classes (identity, rotations, and reflections) of group 3m,
we have

\[
\begin{aligned}
I\Psi_1 &= \Psi_1, \qquad I\Psi_2 = \Psi_2, \\
A\Psi_1 &= \sin\left[ n\left(\theta - \tfrac{2}{3}\pi\right) \right] = \left(\cos \tfrac{2}{3}n\pi\right) \Psi_1 - \left(\sin \tfrac{2}{3}n\pi\right) \Psi_2, \\
A\Psi_2 &= \cos\left[ n\left(\theta - \tfrac{2}{3}\pi\right) \right] = \left(\cos \tfrac{2}{3}n\pi\right) \Psi_2 + \left(\sin \tfrac{2}{3}n\pi\right) \Psi_1, \\
C\Psi_1 &= \sin[n(\pi - \theta)] = -(\cos n\pi)\, \Psi_1, \\
C\Psi_2 &= \cos[n(\pi - \theta)] = (\cos n\pi)\, \Psi_2.
\end{aligned}
\]

The three representative matrices are therefore

\[
D(I) = I_2, \qquad
D(A) = \begin{pmatrix} \cos \tfrac{2}{3}n\pi & -\sin \tfrac{2}{3}n\pi \\ \sin \tfrac{2}{3}n\pi & \cos \tfrac{2}{3}n\pi \end{pmatrix}, \qquad
D(C) = \begin{pmatrix} -\cos n\pi & 0 \\ 0 & \cos n\pi \end{pmatrix}.
\]

The characters of this representation are $\chi(I) = 2$, $\chi(A) = 2\cos(2n\pi/3)$ and $\chi(C) = 0$.
Using (25.18) and table 25.1, we find that

\[
m_{A_1} = \tfrac{1}{6}\left( 2 + 4\cos \tfrac{2}{3}n\pi \right) = m_{A_2}, \qquad
m_{E} = \tfrac{1}{6}\left( 4 - 4\cos \tfrac{2}{3}n\pi \right).
\]

Thus

\[
D = \begin{cases} A_1 \oplus A_2 & \text{if } n = 3, 6, 9, \ldots, \\ E & \text{otherwise.} \end{cases}
\]

Hence the normal modes $n = 3, 6, 9, \ldots$ each transform under the operations of 3m
as the sum of two one-dimensional irreps and, using the reasoning given in the previous
example, are therefore split in frequency by the perturbation. For other values of $n$ the
representation is irreducible and so the degeneracy cannot be split. ◄
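
The dependence on $n$ can be tabulated directly from the characters found above. This is a rough numerical sketch only: it uses floating-point cosines, rounds the resulting multiplicities to the nearest integer, and assumes the standard characters of 3m for $A_1$, $A_2$ and E; the function name is our own.

```python
import math


def multiplicities(n):
    """Irrep content of the (sin n*theta, cos n*theta) representation of 3m."""
    # Characters for the classes: identity, rotations, reflections.
    chi = [2, 2 * math.cos(2 * n * math.pi / 3), 0]
    sizes = [1, 2, 3]
    table = {"A1": [1, 1, 1], "A2": [1, 1, -1], "E": [2, -1, 0]}
    return {name: round(sum(s * a * b for s, a, b in zip(sizes, irrep, chi)) / 6)
            for name, irrep in table.items()}


# Modes whose representation contains no E are reducible and can be split.
split_modes = [n for n in range(1, 10) if multiplicities(n)["E"] == 0]
for n in range(1, 10):
    print(n, multiplicities(n))
print(split_modes)
```

For the values of $n$ tested, only the multiples of 3 give a reducible representation, in agreement with the result above.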


25.12 Exercises


25.1 A group $\mathcal{G}$ has four elements $I$, $X$, $Y$ and $Z$, which satisfy $X^2 = Y^2 = Z^2 =
XYZ = I$. Show that $\mathcal{G}$ is Abelian and hence deduce the form of its character
table.

Show that the matrices

\[
D(I) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
D(X) = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}, \quad
D(Y) = \begin{pmatrix} -1 & -p \\ 0 & 1 \end{pmatrix}, \quad
D(Z) = \begin{pmatrix} 1 & p \\ 0 & -1 \end{pmatrix},
\]

where $p$ is a real number, form a representation D of $\mathcal{G}$. Find its characters and
decompose it into irreps.

25.2 Using a square whose corners lie at coordinates $(\pm 1, \pm 1)$, form a natural representation
of the dihedral group $\mathcal{D}_4$. Find the characters of the representation,
and, using the information (and class order) in table 25.4 (p. 944), express the
representation in terms of irreps.

Now form a representation in terms of eight $2 \times 2$ orthogonal matrices, by
considering the effect of each of the elements of $\mathcal{D}_4$ on a general vector $(x, y)$.
Confirm that this representation is one of the irreps found using the natural
representation.

25.3 The quaternion group $Q$ (see exercise 24.20) has eight elements $\{\pm 1, \pm i, \pm j, \pm k\}$
obeying the relations

\[
i^2 = j^2 = k^2 = -1, \qquad ij = k = -ji.
\]

Determine the conjugacy classes of $Q$ and deduce the dimensions of its irreps.
Show that $Q$ is homomorphic to the four-element group $V$, which is generated by
two distinct elements $a$ and $b$ with $a^2 = b^2 = (ab)^2 = I$. Find the one-dimensional
irreps of $V$ and use these to help determine the full character table for $Q$.

25.4 (a) By considering the possible forms of its cycle notation, determine the number
of elements in each conjugacy class of the permutation group $S_4$ and show
that $S_4$ has five irreps. Give the logical reasoning that shows they must consist
of two three-dimensional, one two-dimensional, and two one-dimensional
irreps.

(b) By considering the odd and even permutations in the group $S_4$ establish the
characters for one of the one-dimensional irreps.

(c) Form a natural matrix representation of $4 \times 4$ matrices based on a set of
objects $\{a, b, c, d\}$, which may or may not be equal to each other, and, by
selecting one example from each conjugacy class, show that this natural representation
has characters 4, 2, 1, 0, 0. The one-dimensional vector subspace
spanned by sets of the form $\{a, a, a, a\}$ is invariant under the permutation
group and hence transforms according to the invariant irrep $A_1$. The remaining
three-dimensional subspace is irreducible; use this and the characters
deduced above to establish the characters for one of the three-dimensional
irreps, $T_1$.

(d) Complete the character table using orthogonality properties, and check the
summation rule for each irrep. You should obtain table 25.8.
                 Typical element and class size
Irrep      (1)    (12)    (123)    (1234)    (12)(34)
            1      6       8        6         3

A_1         1      1       1        1         1
A_2         1     -1       1       -1         1
E           2      0      -1        0         2
T_1         3      1       0       -1        -1
T_2         3     -1       0        1        -1


Table 25.8 The character table for the permutation group $S_4$.


25.5 In exercise 24.10, the group of pure rotations taking a cube into itself was found
to have 24 elements. The group is isomorphic to the permutation group $S_4$,
considered in the previous question, and hence has the same character table, once
corresponding classes have been established. By counting the number of elements
in each class make the correspondences below (the final two cannot be decided
purely by counting, and should be taken as given).


Permutation     Symbol
class type      (physics)    Action

(1)             $I$          none
(123)           3            rotations about a body diagonal
(12)(34)        $2_z$        rotation of $\pi$ about the normal to a face
(1234)          $4_z$        rotations of $\pm\pi/2$ about the normal to a face
(12)            $2_d$        rotation of $\pi$ about an axis through the
                             centres of opposite edges


Reformulate the character table 25.8 in terms of the elements of the rotation
symmetry group (432 or $O$) of a cube and use it when answering exercises 25.7
and 25.8.

25.6 Consider a regular hexagon orientated so that two of its vertices lie on the x-axis.
Find matrix representations of a rotation $R$ through $2\pi/6$ and a reflection $m_y$ in
the y-axis by determining their effects on vectors lying in the xy-plane. Show
that a reflection $m_x$ in the x-axis can be written as $m_x = m_y R^3$ and that the 12
elements of the symmetry group of the hexagon are given by $R^n$ or $R^n m_y$.

Using the representations of $R$ and $m_y$ as generators, find a two-dimensional
representation of the symmetry group, $C_6$, of the regular hexagon. Is it a faithful
representation?

25.7 In a certain crystalline compound, a thorium atom lies at the centre of a regular
octahedron of six sulphur atoms at positions $(\pm a, 0, 0)$, $(0, \pm a, 0)$, $(0, 0, \pm a)$. These
can be considered as being positioned at the centres of the faces of a cube of
side $2a$. The sulphur atoms produce at the site of the thorium atom an electric
field that has the same symmetry group as a cube (432 or $O$).

The five degenerate d-electron orbitals of the thorium atom can be expressed,
relative to any arbitrary polar axis, as

\[
(3\cos^2\theta - 1)f(r), \qquad e^{\pm i\phi} \sin\theta \cos\theta\, f(r), \qquad e^{\pm 2i\phi} \sin^2\theta\, f(r).
\]

A rotation about that polar axis by an angle $\phi'$ effectively changes $\phi$ to $\phi - \phi'$.
Use this to show that the character of the rotation in a representation based on
the orbital wavefunctions is given by

\[
1 + 2\cos\phi' + 2\cos 2\phi'
\]

and hence that the characters of the representation, in the order of the symbols
given in exercise 25.5, are 5, $-1$, 1, $-1$, 1. Deduce that the five-fold degenerate
level is split into two levels, a doublet and a triplet.

25.8 Sulphur hexafluoride is a molecule with the same structure as the crystalline
compound in exercise 25.7, except that a sulphur atom is now the central atom.
The following are the forms of some of the electronic orbitals of the sulphur
atom, together with the irreps according to which they transform under the
symmetry group 432 (or $O$).

    $\Psi_s = f(r)$                      $A_1$
    $\Psi_{p_1} = zf(r)$                 $T_1$
    $\Psi_{d_1} = (3z^2 - r^2)f(r)$      E
    $\Psi_{d_2} = (x^2 - y^2)f(r)$       E
    $\Psi_{d_3} = xyf(r)$                $T_2$

The function $x$ transforms according to the irrep $T_1$. Use the above data to
determine whether dipole matrix elements of the form $J = \int \phi_1 x \phi_2\, d\tau$ can be
non-zero for the following pairs of orbitals $\phi_1, \phi_2$ in a sulphur hexafluoride
molecule: (a) $\Psi_{d_1}, \Psi_s$; (b) $\Psi_{d_1}, \Psi_{p_1}$; (c) $\Psi_{d_2}, \Psi_{d_1}$; (d) $\Psi_s, \Psi_{d_3}$; (e) $\Psi_{p_1}, \Psi_s$.

25.9 The hydrogen atoms in a methane molecule CH$_4$ form a perfect tetrahedron
with the carbon atom at its centre. The molecule is most conveniently described
mathematically by placing the hydrogen atoms at the points $(1, 1, 1)$, $(1, -1, -1)$,
$(-1, 1, -1)$ and $(-1, -1, 1)$. The symmetry group to which it belongs, the tetrahedral
group ($\bar{4}3m$ or $T_d$), has classes typified by $I$, 3, $2_z$, $m_d$ and $\bar{4}_z$, where the first
three are as in exercise 25.5, $m_d$ is a reflection in the mirror plane $x - y = 0$ and
$\bar{4}_z$ is a rotation of $\pi/2$ about the z-axis followed by an inversion in the origin. A
reflection in a mirror plane can be considered as a rotation of $\pi$ about an axis
perpendicular to the plane, followed by an inversion in the origin.

The character table for the group $\bar{4}3m$ is very similar to that for the group
432, and has the form shown in table 25.9.


            Typical element and class size
Irreps     $I$    3    $2_z$    $\bar{4}_z$    $m_d$      Functions transforming
            1     8     3        6              6         according to irrep

A_1         1     1     1        1              1         $x^2 + y^2 + z^2$
A_2         1     1     1       -1             -1
E           2    -1     2        0              0         $(x^2 - y^2,\ 3z^2 - r^2)$
T_1         3     0    -1        1             -1         $(R_x, R_y, R_z)$
T_2         3     0    -1       -1              1         $(x, y, z)$; $(xy, yz, zx)$


Table 25.9 The character table for group $\bar{4}3m$.


By following the steps given below, determine how many different internal vibration
frequencies the CH$_4$ molecule has.

(a) Consider a representation based on the 12 coordinates $x_i$, $y_i$, $z_i$ for $i =
1, 2, 3, 4$. For those hydrogen atoms that transform into themselves, a rotation
through an angle $\theta$ about an axis parallel to one of the coordinate axes
gives rise in the natural representation to the diagonal elements 1 for the
corresponding coordinate and $2\cos\theta$ for the two orthogonal coordinates. If
the rotation is followed by an inversion then these entries are multiplied by
$-1$. Atoms not transforming into themselves give a zero diagonal contribution.
Show that the characters of the natural representation are 12, 0, 0, 0, 2
and hence that its expression in terms of irreps is

\[
A_1 \oplus E \oplus T_1 \oplus 2T_2.
\]

(b) The irreps of the bodily translational and rotational motions are included in
this expression and need to be identified and removed. Show that when this
is done it can be concluded that there are three different internal vibration
frequencies in the CH$_4$ molecule. State their degeneracies and check that
they are consistent with the expected number of normal coordinates needed
to describe the internal motions of the molecule.

25.10 (a) The set of even permutations of four objects (a proper subgroup of $S_4$)
is known as the alternating group $A_4$. List its twelve members using cycle
notation.

(b) Assume that all permutations with the same cycle structure belong to the
same conjugacy class. Show that this leads to a contradiction and hence
demonstrates that even if two permutations have the same cycle structure
they do not necessarily belong to the same class.

(c) By evaluating the products $p_1 = (123)(4) \bullet (12)(34) \bullet (132)(4)$ and $p_2 =
(132)(4) \bullet (12)(34) \bullet (123)(4)$ deduce that the three elements of $A_4$ with structure
of the form (12)(34) belong to the same class.

(d) By evaluating products of the form $(1\alpha)(\beta\gamma) \bullet (123)(4) \bullet (1\alpha)(\beta\gamma)$, where $\alpha, \beta, \gamma$
are various combinations of 2, 3, 4, show that the class to which (123)(4)
belongs contains at least four members. Show the same for (124)(3).

(e) By combining results (b), (c) and (d) deduce that $A_4$ has exactly four classes,
and determine the dimensions of its irreps.

(f) Using the orthogonality properties of characters and noting that elements of
the form (124)(3) have order 3, find the character table for $A_4$.

25.11 Use the results of exercise 24.23 to find the character table for the dihedral group
$\mathcal{D}_5$, the symmetry group of a regular pentagon.

25.12 Demonstrate that equation (25.24) does indeed generate a set of vectors transforming
according to an irrep $\lambda$, by sketching and superposing drawings of an
equilateral triangle of springs and masses, based on that shown in figure 25.7.



Figure 25.7 The three normal vibration modes of the equilateral array. Mode
(a) is known as the 'breathing mode'. Modes (b) and (c) transform according
to irrep E and have equal vibrational frequencies.


(a) Make an initial sketch showing an arbitrary small mass displacement from,
say, vertex C. Draw the results of operating on the initial sketch with each
of the symmetry elements of the group 3m ($C_{3v}$).

(b) Superimpose the results, weighting them according to the characters of irrep
$A_1$ (table 25.1 in section 25.6) and verify that the resultant is a symmetrical
arrangement in which all three masses move symmetrically towards (or away
from) the centroid of the triangle. The mode is illustrated in figure 25.7(a).

(c) Start again, now considering a displacement $\delta$ of C parallel to the x-axis.
Form a similar superposition of sketches weighted according to the characters
of irrep E (note that the reflections are not needed). The resultant contains
some bodily displacement of the triangle, since this also transforms according
to E. Show that the displacement of the centre of mass is $\bar{x} = \delta$, $\bar{y} = 0$.
Subtract this out and verify that the remainder is of the form shown in
figure 25.7(c).

(d) Using an initial displacement parallel to the y-axis, and an analogous procedure,
generate the remaining normal mode, degenerate with that in (c) and
shown in figure 25.7(b).

25.13 Further investigation of the crystalline compound considered in exercise 25.7
shows that the octahedron is not quite perfect but is elongated along the (1, 1, 1)
direction with the sulphur atoms at positions $\pm(a + \delta, \delta, \delta)$, $\pm(\delta, a + \delta, \delta)$,
$\pm(\delta, \delta, a + \delta)$, where $\delta \ll a$. This structure is invariant under the (crystallographic) symmetry
group 32 with three two-fold axes along directions typified by $(1, -1, 0)$. The
latter axes, which are perpendicular to the (1, 1, 1) direction, are axes of two-fold
symmetry for the perfect octahedron. The group 32 is really the three-dimensional
version of the group 3m and has the same character table as table 25.1
(section 25.6). Use this to show that, when the distortion of the octahedron is
included, the doublet found in exercise 25.7 is unsplit but the triplet breaks up
into a singlet and a doublet.


25.13 Hints and answers 

25.1 There are four classes and hence four one-dimensional irreps, which must have
entries as follows: 1, 1, 1, 1; 1, 1, -1, -1; 1, -1, 1, -1; 1, -1, -1, 1. The
characters of D are 2, -2, 0, 0 and so the irreps present are the last two of these.

25.2 The characters are 4, 0, 0, 0, 2, and the irreps present are $A_2 + B_2 + E$. The
characters of the classes are 2, -2, 0, 0, 0, showing that the representation is the
irrep E.

25.3 There are five classes $\{1\}$, $\{-1\}$, $\{\pm i\}$, $\{\pm j\}$, $\{\pm k\}$; there are four one-dimensional
irreps and one two-dimensional irrep. Show that ab = ba. The homomorphism
is $\pm 1 \to I$, $\pm i \to a$, $\pm j \to b$, $\pm k \to ab$. V is Abelian and hence has four
one-dimensional irreps.

In the class order given above, the characters for Q are as follows: $D^{(1)}$: 1, 1, 1, 1, 1;
$D^{(2)}$: 1, 1, 1, -1, -1; $D^{(3)}$: 1, 1, -1, 1, -1; $D^{(4)}$: 1, 1, -1, -1, 1; $D^{(5)}$: 2, -2, 0, 0, 0.

25.4 (a) One element of type (1)(2)(3)(4), six of type (12)(3)(4), eight of type (123)(4),
six of type (1234), three of type (12)(34). Five classes implies five irreps. Since
$\sum_i n_i^2$ must equal 24, at least one $n_i \ge 3$. Assuming $n_i \ge 4$ leads to a contradiction,
and so $n_5$ (say) equals 3. The inequalities $1^2 + 3(2^2) < 15 < 4(2^2)$ imply that
a second $n_i$ equals 3. For the remaining three irreps, $n_1^2 + n_2^2 + n_3^2 = 6$ has only one
integer solution. (b) $D^{(2)}[(12)] = D^{(2)}[(1234)] = -1$. (c) Characters for $T_1$ are
(4 - 1), (2 - 1), (1 - 1), (0 - 1), (0 - 1), i.e. 3, 1, 0, -1, -1.

25.6 The matrix representations are $R = \tfrac{1}{2}[1, -\sqrt{3};\ \sqrt{3}, 1]$; $m_v = [-1, 0;\ 0, 1]$.

As examples, $R^4 = \tfrac{1}{2}[-1, \sqrt{3};\ -\sqrt{3}, -1]$ and $R^2 m_v = \tfrac{1}{2}[1, -\sqrt{3};\ -\sqrt{3}, -1]$.

The representation is faithful.

25.7 The five basis functions of the representation are multiplied by $1$, $e^{-i\phi}$, $e^{+i\phi}$,
$e^{-2i\phi}$, $e^{+2i\phi}$ as a result of the rotation. The character is the sum of these for
rotations of $0$, $2\pi/3$, $\pi$, $\pi/2$, $\pi$; $D^{\rm rep} = E + T_2$.

25.8 (a) No; (b) yes; (c) no; (d) no; (e) yes.

25.9 (b) The bodily translation has irrep $T_2$ and the rotation has irrep $T_1$. The irreps
of the internal vibrations are $A_2$, $E$, $T_2$, with respective degeneracies 1, 2, 3,
making six internal coordinates (12 in total minus three translational minus three
rotational).

25.10 (a) The identity, eight elements of the form (124)(3) and three elements of the
form (12)(34).

(b) The assumption implies that there are three irreps, of which one must be the
identity irrep. However, $1 + n_2^2 + n_3^2 = 12$ has no integer solutions.

(c) $p_1 = (13)(24)$, $p_2 = (14)(23)$.

(d) For example, (123)(4) generates (134)(2), (142)(3), (243)(1).

(e) There are three one-dimensional irreps and one three-dimensional irrep.

(f) The four sets of characters are: 1, 1, 1, 1; 1, $\omega$, $\omega^2$, 1; 1, $\omega^2$, $\omega$, 1; 3, 0, 0, -1.
Here $\omega = \exp(2\pi i/3)$ and $1 + \omega + \omega^2 = 0$.

25.11 There are four classes and hence four irreps, which can only be the identity
irrep, one other one-dimensional irrep, and two two-dimensional irreps. In the
class order $\{I\}$, $\{R, R^4\}$, $\{R^2, R^3\}$, $\{m_i\}$ the second one-dimensional irrep must
(because of orthogonality) have characters 1, 1, 1, -1. The summation rules and
orthogonality require the other two character sets to be $2, (-1+\sqrt{5})/2, (-1-\sqrt{5})/2, 0$
and $2, (-1-\sqrt{5})/2, (-1+\sqrt{5})/2, 0$. Note that R has order 5 and that,
e.g., $(-1+\sqrt{5})/2 = \exp(2\pi i/5) + \exp(8\pi i/5)$.

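25.12 (c) $x = \tfrac{1}{3}[2\delta + (-1)(-\tfrac{1}{2}\delta) + (-1)(-\tfrac{1}{2}\delta)]$, $y = \tfrac{1}{3}[0 + (-1)(-\tfrac{\sqrt{3}}{2}\delta) + (-1)(\tfrac{\sqrt{3}}{2}\delta)]$.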

25.13 The doublet irrep E (characters 2, -1, 0) appears in both 432 and 32 and so
is unsplit. The triplet $T_2$ (characters 3, 0, 1) splits under 32 into doublet E
(characters 2, -1, 0) and singlet $A_1$ (characters 1, 1, 1).


26


Probability 


All scientists will know the importance of experiment and observation and, 
equally, be aware that the results of some experiments depend to a degree on 
chance. For example, in an experiment to measure the heights of a random sample 
of people, we would not be in the least surprised if all the heights were found to 
be different; but, if the experiment were repeated often enough, we would expect 
to find some sort of regularity in the results. Statistics, which is the subject of the 
next chapter, is concerned with the analysis of real experimental data of this sort. 
First, however, we discuss probability. To a pure mathematician, probability is an 
entirely theoretical subject based on axioms. Although this axiomatic approach is 
important, and we discuss it briefly, an approach to probability more in keeping 
with its eventual applications in statistics is adopted here. 

We first discuss the terminology required, with particular reference to the 
convenient graphical representation of experimental results as Venn diagrams. 
The concepts of random variables and distributions of random variables are then 
introduced. It is here that the connection with statistics is made; we assert that 
the results of many experiments are random variables and that those results have 
some sort of regularity, which is represented by a distribution. Precise definitions 
of a random variable and a distribution are then given, as are the defining 
equations for some important distributions. We also derive some useful quantities 
associated with these distributions. 


26.1 Venn diagrams 

We call a single performance of an experiment a trial and each possible result 
an outcome. The sample space S of the experiment is then the set of all possible 
outcomes of an individual trial. For example, if we throw a six-sided die then there 
are six possible outcomes that together form the sample space of the experiment. 
At this stage we are not concerned with how likely a particular outcome might
be (we will return to the probability of an outcome in due course) but rather
will concentrate on the classification of possible outcomes. It is clear that some
sample spaces are finite (e.g. the outcomes of throwing a die) whilst others are
infinite (e.g. the outcomes of measuring people’s heights). Most often, one is not
interested in individual outcomes but in whether an outcome belongs to a given
subset A (say) of the sample space S; these subsets are called events. For example,
we might be interested in whether a person is taller or shorter than 180 cm, in
which case we divide the sample space into just two events: namely, that the
outcome (height measured) is (i) greater than 180 cm or (ii) less than 180 cm.

Figure 26.1 A Venn diagram.

A common graphical representation of the outcomes of an experiment is the 
Venn diagram. A Venn diagram usually consists of a rectangle, the interior of 
which represents the sample space, together with one or more closed curves inside 
it. The interior of each closed curve then represents an event. Figure 26.1 shows 
a typical Venn diagram representing a sample space S and two events A and 
B. Every possible outcome is assigned to an appropriate region; in this example 
there are four regions to consider (marked i to iv in figure 26.1): 

(i) outcomes that belong to event A but not to event B ; 

(ii) outcomes that belong to event B but not to event A; 

(iii) outcomes that belong to both event A and event B ; 

(iv) outcomes that belong to neither event A nor event B. 


►A six-sided die is thrown. Let event A be ‘the number obtained is divisible by 2’ and event
B be ‘the number obtained is divisible by 3’. Draw a Venn diagram to represent these events.


It is clear that the outcomes 2, 4, 6 belong to event A and that the outcomes 3, 6 belong 
to event B. Of these, 6 belongs to both A and B. The remaining outcomes, 1, 5, belong to 
neither A nor B. The appropriate Venn diagram is shown in figure 26.2. ◄ 

In the above example, one outcome, 6, is divisible by both 2 and 3 and so
belongs to both A and B. This outcome is placed in region iii of figure 26.1, which
is called the intersection of A and B and is denoted by $A \cap B$ (see figure 26.3(a)).
If no outcomes lie in the region of intersection then A and B are said to be mutually
exclusive or disjoint. In this case, often the Venn diagram is drawn so that the
closed curves representing the events A and B do not overlap, so as to make
graphically explicit the fact that A and B are disjoint. It is not necessary, however,
to draw the diagram in this way, since we may simply assign zero outcomes to
the shaded region in figure 26.3(a). An event that contains no outcomes is called
the empty event and denoted by $\emptyset$. The event comprising all the elements that
belong to either A or B, or to both, is called the union of A and B and is denoted
by $A \cup B$ (see figure 26.3(b)). In the previous example, $A \cup B = \{2, 3, 4, 6\}$.
It is sometimes convenient to talk about those outcomes that do not belong to
a particular event. The set of outcomes that do not belong to A is called the
complement of A and is denoted by $\bar{A}$ (see figure 26.3(c)); this can also be written
as $\bar{A} = S - A$. It is clear that $A \cup \bar{A} = S$ and $A \cap \bar{A} = \emptyset$.

Figure 26.2 The Venn diagram for the outcomes of the die-throwing trials
described in the worked example.

Figure 26.3 Venn diagrams: the shaded regions show (a) $A \cap B$, the intersection
of two events A and B, (b) $A \cup B$, the union of events A and B, (c)
the complement $\bar{A}$ of an event A, (d) $A - B$, those outcomes in A that do not
belong to B.

The above notation can be extended in an obvious way, so that $A - B$ denotes
the outcomes in A that do not belong to B. It is clear from figure 26.3(d) that
$A - B$ can also be written as $A \cap \bar{B}$. Finally, when all the outcomes in event B
(say) also belong to event A, but A may contain, in addition, outcomes that do
not belong to B, then B is called a subset of A, a situation that is denoted by
$B \subset A$; alternatively, one may write $A \supset B$, which states that A contains B. In this
case, the closed curve representing the event B is often drawn lying completely
within the closed curve representing the event A.

Figure 26.4 The general Venn diagram for three events is divided into eight
regions.
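The set operations introduced so far correspond closely to the operators of Python's built-in `set` type. As a minimal sketch (the variable names are our own choice, not part of the text), the die-throwing example with A 'divisible by 2' and B 'divisible by 3' might be written as follows.

```python
# Sample space and events for one throw of a six-sided die.
S = set(range(1, 7))
A = {x for x in S if x % 2 == 0}   # the number is divisible by 2
B = {x for x in S if x % 3 == 0}   # the number is divisible by 3

intersection = A & B               # outcomes in both A and B
union = A | B                      # outcomes in A or B (or both)
A_bar = S - A                      # complement of A
A_minus_B = A - B                  # outcomes in A that are not in B

print(sorted(intersection))        # [6]
print(sorted(union))               # [2, 3, 4, 6]
print(sorted(A_bar))               # [1, 3, 5]
print(A_minus_B == A & (S - B))    # True: A - B equals A intersected with the complement of B
print({6} <= A)                    # True: {6} is a subset of A
```

Here `<=` tests the subset relation, so `{6} <= A` plays the role of $B \subset A$ in the notation above.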

The operations $\cup$ and $\cap$ are extended straightforwardly to more than two
events. If there exist n events $A_1, A_2, \ldots, A_n$, in some sample space S, then the
event consisting of all those outcomes that belong to one or more of the $A_i$ is the
union of $A_1, A_2, \ldots, A_n$ and is denoted by

$A_1 \cup A_2 \cup \cdots \cup A_n$. (26.1)

Similarly, the event consisting of all the outcomes that belong to every one of the
$A_i$ is called the intersection of $A_1, A_2, \ldots, A_n$ and is denoted by

$A_1 \cap A_2 \cap \cdots \cap A_n$. (26.2)

If, for any pair of values i, j with $i \neq j$,

$A_i \cap A_j = \emptyset$ (26.3)

then the events $A_i$ and $A_j$ are said to be mutually exclusive or disjoint.

Consider three events A, B and C with a Venn diagram such as is shown in 
figure 26.4. It will be clear that, in general, the diagram will be divided into eight 
regions and they will be of four different types. Three regions correspond to a 
single event; three regions are each the intersection of exactly two events; one 
region is the three-fold intersection of all three events; and finally one region 
corresponds to none of the events. Let us now consider the numbers of different 
regions in a general n-event Venn diagram. 

For one-event Venn diagrams there are two regions, for the two-event case
there are four regions and, as we have just seen, for the three-event case there are
eight. In the general n-event case there are $2^n$ regions, as is clear from the fact
that any particular region R lies either inside or outside the closed curve of any
particular event. With two choices (inside or outside) for each of n closed curves,
there are $2^n$ different possible combinations with which to characterise R. Once n
gets beyond three it becomes impossible to draw a simple two-dimensional Venn
diagram, but this does not change the results.

The $2^n$ regions will break down into n + 1 types, with the numbers of each type
as follows:†

no events, ${}^nC_0 = 1$;
one event but no intersections, ${}^nC_1 = n$;
two-fold intersections, ${}^nC_2 = \tfrac{1}{2}n(n-1)$;
three-fold intersections, ${}^nC_3 = \tfrac{1}{3!}n(n-1)(n-2)$;
$\vdots$
an n-fold intersection, ${}^nC_n = 1$.

That this makes a total of $2^n$ can be checked by considering the binomial
expansion

$2^n = (1+1)^n = 1 + n + \tfrac{1}{2}n(n-1) + \cdots + 1$.
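The counting argument can also be checked by brute force. The sketch below (the function name `region_counts` is ours) labels each region by n inside/outside flags, groups the regions by how many events contain them, and compares the counts with the binomial coefficients.

```python
from collections import Counter
from itertools import product
from math import comb


def region_counts(n):
    """Count the regions of an n-event Venn diagram by the number of events containing them."""
    counts = Counter()
    # Each region is labelled by n flags: 1 = inside that closed curve, 0 = outside.
    for flags in product((0, 1), repeat=n):
        counts[sum(flags)] += 1
    return counts


n = 4
counts = region_counts(n)
for m in range(n + 1):
    # Number of m-fold regions found by enumeration, next to the binomial coefficient.
    print(m, counts[m], comb(n, m))
print(sum(counts.values()) == 2**n)
```

The enumeration visits all $2^n$ flag combinations, so it is practical only for small n; its purpose is simply to confirm the pattern stated above.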


Using Venn diagrams, it is straightforward to show that the operations $\cap$ and
$\cup$ obey the following algebraic laws:

commutativity, $A \cap B = B \cap A$, $A \cup B = B \cup A$;
associativity, $(A \cap B) \cap C = A \cap (B \cap C)$, $(A \cup B) \cup C = A \cup (B \cup C)$;
distributivity, $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$, $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$;
idempotency, $A \cap A = A$, $A \cup A = A$.


►Show that (i) $A \cup (A \cap B) = A \cap (A \cup B) = A$, (ii) $(A - B) \cup (A \cap B) = A$.

(i) Using the distributivity and idempotency laws above, we see that

$A \cup (A \cap B) = (A \cup A) \cap (A \cup B) = A \cap (A \cup B)$.

By sketching a Venn diagram it is immediately clear that both expressions are equal to
A. Nevertheless, we here proceed in a more formal manner in order to deduce this result
algebraically. Let us begin by writing

$X = A \cup (A \cap B) = A \cap (A \cup B)$, (26.4)

from which we want to deduce a simpler expression for the event X. Using the first equality
in (26.4) and the algebraic laws for $\cap$ and $\cup$, we may write

$A \cap X = A \cap [A \cup (A \cap B)]$
$= (A \cap A) \cup [A \cap (A \cap B)]$
$= A \cup (A \cap B) = X$.

Since $A \cap X = X$ we must have $X \subset A$. Now, using the second equality in (26.4) in a
similar way, we find

$A \cup X = A \cup [A \cap (A \cup B)]$
$= (A \cup A) \cap [A \cup (A \cup B)]$
$= A \cap (A \cup B) = X$,

from which we deduce that $A \subset X$. Thus, since $X \subset A$ and $A \subset X$, we must conclude that
X = A.

(ii) Since we do not know how to deal with compound expressions containing a minus
sign, we begin by writing $A - B = A \cap \bar{B}$ as mentioned above. Then, using the distributivity
law, we obtain

$(A - B) \cup (A \cap B) = (A \cap \bar{B}) \cup (A \cap B)$
$= A \cap (\bar{B} \cup B)$
$= A \cap S = A$.

In fact, this result, like the first one, can be proved trivially by drawing a Venn diagram. ◄

† The symbols ${}^nC_i$, for $i = 0, 1, 2, \ldots, n$, are a convenient notation for combinations; they and their
properties are discussed in chapter 1.

Further useful results may be derived from Venn diagrams. In particular, it is
simple to show that the following rules hold:

(i) if $A \subset B$ then $\bar{A} \supset \bar{B}$;

(ii) $\overline{A \cup B} = \bar{A} \cap \bar{B}$;

(iii) $\overline{A \cap B} = \bar{A} \cup \bar{B}$.

Statements (ii) and (iii) are known jointly as de Morgan’s laws and are sometimes
useful in simplifying logical expressions.
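Rules (i) to (iii) can be confirmed exhaustively for a small sample space. The following sketch assumes a four-outcome sample space and helper names of our own choosing; it is a numerical check for that particular case, not a proof.

```python
from itertools import combinations

S = frozenset(range(4))  # a small illustrative sample space (an assumption)


def all_events(space):
    """Return every subset (event) of the given sample space."""
    items = sorted(space)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def complement(event):
    """Outcomes of S that do not belong to the event."""
    return S - event


events = all_events(S)

# (i) if A is a subset of B, the complement of A contains the complement of B.
rule_i = all(complement(A) >= complement(B)
             for A in events for B in events if A <= B)
# (ii) and (iii): de Morgan's laws.
rule_ii = all(complement(A | B) == complement(A) & complement(B)
              for A in events for B in events)
rule_iii = all(complement(A & B) == complement(A) | complement(B)
               for A in events for B in events)

print(rule_i, rule_ii, rule_iii)   # True True True
```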


►There exist two events A and B such that

$\overline{(X \cup A)} \cup \overline{(X \cup \bar{A})} = B$.

Find an expression for the event X in terms of A and B.

We begin by taking the complement of both sides of the above expression: applying de
Morgan’s laws we obtain

$\bar{B} = (X \cup A) \cap (X \cup \bar{A})$.

We may then use the algebraic laws obeyed by $\cap$ and $\cup$ to yield

$\bar{B} = X \cup (A \cap \bar{A}) = X \cup \emptyset = X$.

Thus, we find that $X = \bar{B}$. ◄


26.2 Probability 

In the previous section we discussed Venn diagrams, which are graphical
representations of the possible outcomes of experiments. We did not, however, give
any indication of how likely each outcome or event might be when any particular
experiment is performed. Most experiments show some regularity. By this we
mean that the relative frequency of an event is approximately the same on each
occasion that a set of trials is performed. For example, if we throw a die N
times then we expect that a six will occur approximately N/6 times (assuming,
of course, that the die is not biased). The regularity of outcomes allows us to
define the probability, $\Pr(A)$, as the expected relative frequency of event A in a
large number of trials. More quantitatively, if an experiment has a total of $n_S$
outcomes in the sample space S, and $n_A$ of these outcomes correspond to the
event A, then the probability that event A will occur is

$\Pr(A) = \frac{n_A}{n_S}$. (26.5)


26.2.1 Basic theorems 

From (26.5) we may deduce the following properties of the probability $\Pr(A)$.

(i) For any event A in a sample space S,

$0 \le \Pr(A) \le 1$. (26.6)

If $\Pr(A) = 1$ then A is a certainty; if $\Pr(A) = 0$ then A is an impossibility.

(ii) For the entire sample space S we have

$\Pr(S) = \frac{n_S}{n_S} = 1$, (26.7)

which simply states that we are certain to obtain one of the possible
outcomes.

(iii) If A and B are two events in S then, from the Venn diagrams in figure 26.3,
we see that

$n_{A \cup B} = n_A + n_B - n_{A \cap B}$, (26.8)

the final subtraction arising because the outcomes in the intersection of
A and B are counted twice when the outcomes of A are added to those
of B. Dividing both sides of (26.8) by $n_S$, we obtain the addition rule for
probabilities

$\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$. (26.9)

However, if A and B are mutually exclusive events ($A \cap B = \emptyset$) then
$\Pr(A \cap B) = 0$ and we obtain the special case

$\Pr(A \cup B) = \Pr(A) + \Pr(B)$. (26.10)

(iv) If $\bar{A}$ is the complement of A then $\bar{A}$ and A are mutually exclusive events.
Thus, from (26.7) and (26.10) we have

$1 = \Pr(S) = \Pr(A \cup \bar{A}) = \Pr(A) + \Pr(\bar{A})$,

from which we obtain the complement law

$\Pr(\bar{A}) = 1 - \Pr(A)$. (26.11)

This is particularly useful for problems in which evaluating the probability
of the complement is easier than evaluating the probability of the event
itself.


►Calculate the probability of drawing an ace or a spade from a pack of cards.

Let A be the event that an ace is drawn and B the event that a spade is drawn. It
immediately follows that $\Pr(A) = \tfrac{4}{52} = \tfrac{1}{13}$ and $\Pr(B) = \tfrac{13}{52} = \tfrac{1}{4}$. The intersection of A and
B consists of only the ace of spades and so $\Pr(A \cap B) = \tfrac{1}{52}$. Thus, from (26.9)

$\Pr(A \cup B) = \tfrac{1}{13} + \tfrac{1}{4} - \tfrac{1}{52} = \tfrac{4}{13}$.

In this case it is just as simple to recognise that there are 16 cards in the pack that satisfy
the required condition (13 spades plus three other aces) and so the probability is $\tfrac{16}{52}$. ◄
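The same result follows from counting outcomes directly, as in definition (26.5). The sketch below builds a 52-card pack as a set of (rank, suit) pairs, a representation we have chosen for illustration, and evaluates both sides of the addition rule (26.9) with exact fractions.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = set(product(ranks, suits))          # 52 equally likely outcomes


def pr(event):
    """Probability as n_A / n_S, following equation (26.5)."""
    return Fraction(len(event), len(deck))


A = {card for card in deck if card[0] == "A"}        # an ace is drawn
B = {card for card in deck if card[1] == "spades"}   # a spade is drawn

lhs = pr(A | B)                            # direct count of the union
rhs = pr(A) + pr(B) - pr(A & B)            # addition rule (26.9)
print(lhs, rhs)                            # 4/13 4/13
```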

The above theorems can easily be extended to a greater number of events. For
example, if $A_1, A_2, \ldots, A_n$ are mutually exclusive events then (26.10) becomes

$\Pr(A_1 \cup A_2 \cup \cdots \cup A_n) = \Pr(A_1) + \Pr(A_2) + \cdots + \Pr(A_n)$. (26.12)

Furthermore, if $A_1, A_2, \ldots, A_n$ (whether mutually exclusive or not) exhaust S, i.e.
are such that $A_1 \cup A_2 \cup \cdots \cup A_n = S$, then

$\Pr(A_1 \cup A_2 \cup \cdots \cup A_n) = \Pr(S) = 1$. (26.13)


►A biased six-sided die has probabilities $\tfrac{1}{2}p$, $p$, $p$, $p$, $p$, $2p$ of showing 1, 2, 3, 4, 5, 6
respectively. Calculate p.


Given that the individual events are mutually exclusive, (26.12) can be applied to give

$\Pr(1 \cup 2 \cup 3 \cup 4 \cup 5 \cup 6) = \tfrac{1}{2}p + p + p + p + p + 2p = \tfrac{13}{2}p$.

The union of all possible outcomes on the LHS of this equation is clearly the sample
space, S, and so

$\Pr(S) = \tfrac{13}{2}p$.

Now using (26.7),

$\tfrac{13}{2}p = \Pr(S) = 1 \quad\Rightarrow\quad p = \tfrac{2}{13}$. ◄
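The normalisation step can be reproduced in a few lines. In this sketch the list `weights` (our own name) holds the probability of each face in units of the unknown p, and exact fractions avoid any rounding.

```python
from fractions import Fraction

# Probabilities of faces 1..6, measured in units of the unknown p.
weights = [Fraction(1, 2), 1, 1, 1, 1, 2]

p = 1 / sum(weights)                       # Pr(S) = sum(weights) * p = 1
probabilities = [w * p for w in weights]   # the six face probabilities

print(p)                                   # 2/13
print(sum(probabilities))                  # 1
```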

When the possible outcomes of a trial correspond to more than two events,
and those events are not mutually exclusive, the calculation of the probability of
the union of a number of events is more complicated, and the generalisation of
the addition law (26.9) requires further work. Let us begin by considering the
union of three events $A_1$, $A_2$ and $A_3$, which need not be mutually exclusive. We
first define the event $B = A_2 \cup A_3$ and, using the addition law (26.9), we obtain

$\Pr(A_1 \cup A_2 \cup A_3) = \Pr(A_1 \cup B) = \Pr(A_1) + \Pr(B) - \Pr(A_1 \cap B)$. (26.14)

However, we may write $\Pr(A_1 \cap B)$ as

$\Pr(A_1 \cap B) = \Pr[A_1 \cap (A_2 \cup A_3)]$
$= \Pr[(A_1 \cap A_2) \cup (A_1 \cap A_3)]$
$= \Pr(A_1 \cap A_2) + \Pr(A_1 \cap A_3) - \Pr(A_1 \cap A_2 \cap A_3)$.

Substituting this expression, and that for $\Pr(B)$ obtained from (26.9), into (26.14)
we obtain the probability addition law for three general events,

$\Pr(A_1 \cup A_2 \cup A_3) = \Pr(A_1) + \Pr(A_2) + \Pr(A_3) - \Pr(A_2 \cap A_3) - \Pr(A_1 \cap A_3)$
$\qquad - \Pr(A_1 \cap A_2) + \Pr(A_1 \cap A_2 \cap A_3)$. (26.15)


►Calculate the probability of drawing from a pack of cards one that is an ace or is a spade
or shows an even number (2, 4, 6, 8, 10).


If, as previously, A is the event that an ace is drawn, $\Pr(A) = \tfrac{4}{52}$. Similarly the event B,
that a spade is drawn, has $\Pr(B) = \tfrac{13}{52}$. The further possibility C, that the card is even (but
not a picture card) has $\Pr(C) = \tfrac{20}{52}$. The two-fold intersections have probabilities

$\Pr(A \cap B) = \tfrac{1}{52}$, $\Pr(A \cap C) = 0$, $\Pr(B \cap C) = \tfrac{5}{52}$.

There is no three-fold intersection as events A and C are mutually exclusive. Hence

$\Pr(A \cup B \cup C) = \tfrac{1}{52}[(4 + 13 + 20) - (1 + 0 + 5) + (0)] = \tfrac{31}{52}$.

The reader should identify the 31 cards involved. ◄

When the probabilities are combined to calculate the probability for the union
of the n general events, the result, which may be proved by induction upon n (see
the answer to exercise 26.4), is

$\Pr(A_1 \cup A_2 \cup \cdots \cup A_n) = \sum_i \Pr(A_i) - \sum_{i,j} \Pr(A_i \cap A_j) + \sum_{i,j,k} \Pr(A_i \cap A_j \cap A_k)$
$\qquad - \cdots + (-1)^{n+1} \Pr(A_1 \cap A_2 \cap \cdots \cap A_n)$. (26.16)

Each summation runs over all possible sets of subscripts, except those in which
any two subscripts in a set are the same. The number of terms in the summation
of probabilities of m-fold intersections of the n events is given by ${}^nC_m$ (as discussed
in section 26.1). Equation (26.9) is a special case of (26.16) in which n = 2 and
only the first two terms on the RHS survive. We now illustrate this result with a
worked example that has n = 4 and includes a four-fold intersection.




►Find the probability of drawing from a pack a card that has at least one of the following
properties:

A, it is an ace;

B, it is a spade;

C, it is a black honour card (ace, king, queen, jack or 10);

D, it is a black ace.

Measuring all probabilities in units of $\tfrac{1}{52}$, the single-event probabilities are

$\Pr(A) = 4$, $\Pr(B) = 13$, $\Pr(C) = 10$, $\Pr(D) = 2$.

The two-fold intersection probabilities, measured in the same units, are

$\Pr(A \cap B) = 1$, $\Pr(A \cap C) = 2$, $\Pr(A \cap D) = 2$,
$\Pr(B \cap C) = 5$, $\Pr(B \cap D) = 1$, $\Pr(C \cap D) = 2$.

The three-fold intersections have probabilities

$\Pr(A \cap B \cap C) = 1$, $\Pr(A \cap B \cap D) = 1$, $\Pr(A \cap C \cap D) = 2$, $\Pr(B \cap C \cap D) = 1$.

Finally, the four-fold intersection, requiring all four conditions to hold, is satisfied only by
the ace of spades, and hence (again in units of $\tfrac{1}{52}$)

$\Pr(A \cap B \cap C \cap D) = 1$.

Substituting in (26.16) gives

$P = \tfrac{1}{52}[(4 + 13 + 10 + 2) - (1 + 2 + 2 + 5 + 1 + 2) + (1 + 1 + 2 + 1) - (1)] = \tfrac{20}{52}$. ◄
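The alternating sum in (26.16) lends itself to a short program. The sketch below, in which the pack representation and the function name `union_probability` are our own assumptions, sums the probabilities of all m-fold intersections with sign $(-1)^{m+1}$ and compares the result with a direct count of the union for the four events of the worked example.

```python
from fractions import Fraction
from itertools import combinations, product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = set(product(ranks, suits))          # 52 equally likely outcomes

black = {"spades", "clubs"}
honours = {"A", "K", "Q", "J", "10"}


def pr(event):
    """Probability as n_A / n_S, following equation (26.5)."""
    return Fraction(len(event), len(deck))


events = [
    {c for c in deck if c[0] == "A"},                        # A: an ace
    {c for c in deck if c[1] == "spades"},                   # B: a spade
    {c for c in deck if c[0] in honours and c[1] in black},  # C: a black honour card
    {c for c in deck if c[0] == "A" and c[1] in black},      # D: a black ace
]


def union_probability(events):
    """Evaluate the RHS of (26.16) by summing over every m-fold intersection."""
    total = Fraction(0)
    for m in range(1, len(events) + 1):
        sign = (-1) ** (m + 1)
        for subset in combinations(events, m):
            total += sign * pr(set.intersection(*subset))
    return total


print(union_probability(events))           # 5/13, i.e. 20/52
print(pr(set.union(*events)))              # 5/13, by direct counting
```

Because `combinations` generates each unordered set of m distinct events exactly once, the inner loop has ${}^nC_m$ terms, in agreement with the remark following (26.16).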

We conclude this section on basic theorems by deriving a useful general
expression for the probability $\Pr(A \cap B)$ that two events A and B both occur in
the case where A (say) is the union of a set of n mutually exclusive events $A_i$. In
this case

$A \cap B = (A_1 \cap B) \cup \cdots \cup (A_n \cap B)$,

where the events $A_i \cap B$ are also mutually exclusive. Thus, from the addition law
(26.12) for mutually exclusive events, we find

$\Pr(A \cap B) = \sum_i \Pr(A_i \cap B)$. (26.17)

Moreover, in the special case where the events $A_i$ exhaust the sample space S, we
have $A \cap B = S \cap B = B$, and we obtain the total probability law

$\Pr(B) = \sum_i \Pr(A_i \cap B)$. (26.18)


26.2.2 Conditional probability 

So far we have defined only probabilities of the form ‘what is the probability that
event A happens?’. In this section we turn to conditional probability, the probability
that a particular event occurs given the occurrence of another, possibly related,
event. For example, we may wish to know the probability of event B, drawing an
ace from a pack of cards from which one has already been removed, given that
event A, the card already removed was itself an ace, has occurred.

We denote this probability by $\Pr(B|A)$ and may obtain a formula for it by
considering the total probability $\Pr(A \cap B) = \Pr(B \cap A)$ that both A and B will
occur. This may be written in two ways, i.e.

$\Pr(A \cap B) = \Pr(A)\Pr(B|A)$
$= \Pr(B)\Pr(A|B)$.

From this we obtain

$\Pr(A|B) = \frac{\Pr(A \cap B)}{\Pr(B)}$ (26.19)

and

$\Pr(B|A) = \frac{\Pr(B \cap A)}{\Pr(A)}$. (26.20)

In terms of Venn diagrams, we may think of $\Pr(B|A)$ as the probability of B in
the reduced sample space defined by A. Thus, if two events A and B are mutually
exclusive then

$\Pr(A|B) = 0 = \Pr(B|A)$. (26.21)

When an experiment consists of drawing objects at random from a given set 
of objects, it is termed sampling a population. We need to distinguish between 
two different ways in which such a sampling experiment may be performed. After 
an object has been drawn at random from the set it may either be put aside 
or returned to the set before the next object is randomly drawn. The former is 
termed ‘sampling without replacement’, the latter ‘sampling with replacement’.


►Find the probability of drawing two aces at random from a pack of cards (i) when the
first card drawn is replaced at random into the pack before the second card is drawn, and
(ii) when the first card is put aside after being drawn.


Let A be the event that the first card is an ace, and B the event that the second card is an
ace. Now

$\Pr(A \cap B) = \Pr(A)\Pr(B|A)$,

and for both (i) and (ii) we know that $\Pr(A) = \tfrac{4}{52} = \tfrac{1}{13}$.

(i) If the first card is replaced in the pack before the next is drawn then $\Pr(B|A) =
\Pr(B) = \tfrac{4}{52} = \tfrac{1}{13}$, since A and B are independent events. We then have

$\Pr(A \cap B) = \Pr(A)\Pr(B) = \tfrac{1}{13} \times \tfrac{1}{13} = \tfrac{1}{169}$.

(ii) If the first card is put aside and the second then drawn, A and B are not independent
and $\Pr(B|A) = \tfrac{3}{51}$, with the result that

$\Pr(A \cap B) = \Pr(A)\Pr(B|A) = \tfrac{1}{13} \times \tfrac{1}{17} = \tfrac{1}{221}$. ◄
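Both answers can be verified by listing every ordered pair of draws. In the sketch below the cards are simply labelled 0 to 51, with labels 0 to 3 standing for the four aces (a labelling we have assumed for convenience); `itertools.product` generates draws with replacement and `itertools.permutations` draws without.

```python
from fractions import Fraction
from itertools import permutations, product

deck = range(52)                           # cards labelled 0..51


def is_ace(card):
    """Labels 0..3 stand for the four aces (an arbitrary choice)."""
    return card < 4


def pr_two_aces(draws):
    """Fraction of equally likely ordered pairs in which both cards are aces."""
    draws = list(draws)
    hits = sum(1 for first, second in draws if is_ace(first) and is_ace(second))
    return Fraction(hits, len(draws))


with_replacement = pr_two_aces(product(deck, repeat=2))      # 52 * 52 ordered pairs
without_replacement = pr_two_aces(permutations(deck, 2))     # 52 * 51 ordered pairs

print(with_replacement)                    # 1/169
print(without_replacement)                 # 1/221
```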
Two events A and B are statistically independent if $\Pr(A|B) = \Pr(A)$ (or
equivalently if $\Pr(B|A) = \Pr(B)$). In words, the probability of A given B is then the same
as the probability of A regardless of whether B occurs. For example, if we throw
a coin and a die at the same time, we would normally expect that the probability
of throwing a six was independent of whether a head was thrown. If A and B are
statistically independent then it follows that

$\Pr(A \cap B) = \Pr(A)\Pr(B)$. (26.22)

In fact, on the basis of intuition and experience, (26.22) may be regarded as the
definition of the statistical independence of two events.

The idea of statistical independence is easily extended to an arbitrary number
of events $A_1, A_2, \ldots, A_n$. The events are said to be (mutually) independent if

$\Pr(A_i \cap A_j) = \Pr(A_i)\Pr(A_j)$,
$\Pr(A_i \cap A_j \cap A_k) = \Pr(A_i)\Pr(A_j)\Pr(A_k)$,
$\vdots$
$\Pr(A_1 \cap A_2 \cap \cdots \cap A_n) = \Pr(A_1)\Pr(A_2) \cdots \Pr(A_n)$,

for all combinations of indices i, j and k for which no two indices are the same.
Even if all n events are not mutually independent, any two events for which
$\Pr(A_i \cap A_j) = \Pr(A_i)\Pr(A_j)$ are said to be pairwise independent.
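The distinction between pairwise and mutual independence is easily illustrated. The example below is our own, not taken from the text: two fair coins are tossed, and the events 'first coin is a head', 'second coin is a head' and 'the two coins differ' turn out to be pairwise independent without being mutually independent.

```python
from fractions import Fraction
from itertools import product

S = set(product("HT", repeat=2))           # four equally likely outcomes of two tosses


def pr(event):
    """Probability as n_A / n_S, following equation (26.5)."""
    return Fraction(len(event), len(S))


A = {s for s in S if s[0] == "H"}          # first coin shows a head
B = {s for s in S if s[1] == "H"}          # second coin shows a head
C = {s for s in S if s[0] != s[1]}         # the two coins show different faces

# Every pair satisfies (26.22) ...
pairwise = all(pr(X & Y) == pr(X) * pr(Y)
               for X, Y in [(A, B), (A, C), (B, C)])
# ... but the three-fold product rule fails, since A, B and C cannot all occur.
mutual = pairwise and pr(A & B & C) == pr(A) * pr(B) * pr(C)

print(pairwise, mutual)                    # True False
```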

We now derive two results that often prove useful when working with
conditional probabilities. Let us suppose that an event A is the union of n mutually
exclusive events $A_i$. If B is some other event then from (26.17) we have

$\Pr(A \cap B) = \sum_i \Pr(A_i \cap B)$.

Dividing both sides of this equation by $\Pr(B)$, and using (26.19), we obtain

$\Pr(A|B) = \sum_i \Pr(A_i|B)$, (26.23)

which is the addition law for conditional probabilities.

Furthermore, if the set of mutually exclusive events $A_i$ exhausts the sample
space S then, from the total probability law (26.18), the probability $\Pr(B)$ of some
event B in S can be written as

$\Pr(B) = \sum_i \Pr(A_i)\Pr(B|A_i)$. (26.24)


►A collection of traffic islands connected by a system of one-way roads is shown in
figure 26.5. At any given island a car driver chooses a direction at random from those available.
What is the probability that a driver starting at O will arrive at B?

In order to leave O the driver must pass through one of $A_1$, $A_2$, $A_3$ or $A_4$, which thus
form a complete set of mutually exclusive events. Since at each island (including O) the
driver chooses a direction at random from those available, we have that $\Pr(A_i) = \tfrac{1}{4}$ for
i = 1, 2, 3, 4. From figure 26.5, we see also that

$\Pr(B|A_1) = \tfrac{1}{3}$, $\Pr(B|A_2) = \tfrac{1}{3}$, $\Pr(B|A_3) = 0$, $\Pr(B|A_4) = \tfrac{2}{4} = \tfrac{1}{2}$.

Thus, using the total probability law (26.24), we find that the probability of arriving at B
is given by

$\Pr(B) = \sum_i \Pr(A_i)\Pr(B|A_i) = \tfrac{1}{4}\left(\tfrac{1}{3} + \tfrac{1}{3} + 0 + \tfrac{1}{2}\right) = \tfrac{7}{24}$. ◄

Figure 26.5 A collection of traffic islands connected by one-way roads.

Finally, we note that the concept of conditional probability may be
straightforwardly extended to several compound events. For example, in the case of three
events A, B, C, we may write $\Pr(A \cap B \cap C)$ in several ways, e.g.

$\Pr(A \cap B \cap C) = \Pr(C)\Pr(A \cap B|C)$
$= \Pr(B \cap C)\Pr(A|B \cap C)$
$= \Pr(C)\Pr(B|C)\Pr(A|B \cap C)$.


►Suppose $\{A_i\}$ is a set of mutually exclusive events that exhausts the sample space S. If B
and C are two other events in S, show that

$\Pr(B|C) = \sum_i \Pr(A_i|C)\Pr(B|A_i \cap C)$.


Using (26.19) and (26.17), we may write

$\Pr(C)\Pr(B|C) = \Pr(B \cap C) = \sum_i \Pr(A_i \cap B \cap C)$. (26.25)

Each term in the sum on the RHS can be expanded as an appropriate product of
conditional probabilities,

$\Pr(A_i \cap B \cap C) = \Pr(C)\Pr(A_i|C)\Pr(B|A_i \cap C)$.

Substituting this form into (26.25) and dividing through by $\Pr(C)$ gives the required
result. ◄

26.2.3 Bayes’ theorem

In the previous section we saw that the probability that both an event A and a
related event B will occur can be written either as $\Pr(A)\Pr(B|A)$ or $\Pr(B)\Pr(A|B)$.
Hence

$\Pr(A)\Pr(B|A) = \Pr(B)\Pr(A|B)$,

from which we obtain Bayes’ theorem,

$\Pr(A|B) = \frac{\Pr(A)}{\Pr(B)}\Pr(B|A)$. (26.26)

This theorem clearly shows that $\Pr(B|A) \neq \Pr(A|B)$, unless $\Pr(A) = \Pr(B)$. It is
sometimes useful to rewrite $\Pr(B)$, if it is not known directly, as

$\Pr(B) = \Pr(A)\Pr(B|A) + \Pr(\bar{A})\Pr(B|\bar{A})$

so that Bayes’ theorem becomes

$\Pr(A|B) = \frac{\Pr(A)\Pr(B|A)}{\Pr(A)\Pr(B|A) + \Pr(\bar{A})\Pr(B|\bar{A})}$. (26.27)


►Suppose that the blood test for some disease is reliable in the following sense: for people
who are infected with the disease the test produces a positive result in 99.99% of cases; for
people not infected a positive test result is obtained in only 0.02% of cases. Furthermore,
assume that in the general population one person in 10000 people is infected. A person is
selected at random and found to test positive for the disease. What is the probability that
the individual is actually infected?


Let A be the event that the individual is infected and B be the event that the individual
tests positive for the disease. Using Bayes’ theorem the probability that a person who tests
positive is actually infected is

$\Pr(A|B) = \frac{\Pr(A)\Pr(B|A)}{\Pr(A)\Pr(B|A) + \Pr(\bar{A})\Pr(B|\bar{A})}$.

Now $\Pr(A) = 1/10000 = 1 - \Pr(\bar{A})$, and we are told that $\Pr(B|A) = 9999/10000$ and
$\Pr(B|\bar{A}) = 2/10000$. Thus we obtain

$\Pr(A|B) = \frac{1/10000 \times 9999/10000}{(1/10000 \times 9999/10000) + (9999/10000 \times 2/10000)} = \frac{1}{3}$.

Thus, there is only a one in three chance that a person chosen at random, who tests
positive for the disease, is actually infected.

At a first glance, this answer may seem a little surprising, but the reason for the
counter-intuitive result is that the probability that a randomly selected person is not infected is
9999/10000, which is very high. Thus, the 0.02% chance of a positive result for an uninfected
person becomes significant. ◄
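The calculation is a direct application of (26.27) and is easily scripted. In the sketch below the function name `posterior` and its parameter names are our own; exact fractions are used so that the result appears as 1/3 rather than as a rounded decimal.

```python
from fractions import Fraction


def posterior(prior, p_pos_given_infected, p_pos_given_healthy):
    """Pr(A|B) from equation (26.27), with A = 'infected' and B = 'tests positive'."""
    numerator = prior * p_pos_given_infected
    # Total probability of a positive test, Pr(B), from infected and uninfected people.
    evidence = numerator + (1 - prior) * p_pos_given_healthy
    return numerator / evidence


result = posterior(
    prior=Fraction(1, 10000),                    # Pr(A): one person in 10000 is infected
    p_pos_given_infected=Fraction(9999, 10000),  # Pr(B|A)
    p_pos_given_healthy=Fraction(2, 10000),      # Pr(B|not A)
)
print(result)                                    # 1/3
```

Changing the prior shows how strongly the answer depends on the rarity of the disease: with one person in 100 infected, for instance, the same function gives a much larger probability.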


We note that (26.27) may be written in a more general form if S is not simply
divided into A and $\bar{A}$ but, rather, into any set of mutually exclusive events $A_i$ that
exhaust S. Using the total probability law (26.24), we may then write

$\Pr(B) = \sum_i \Pr(A_i)\Pr(B|A_i)$,

so that Bayes’ theorem takes the form

$\Pr(A|B) = \frac{\Pr(A)\Pr(B|A)}{\sum_i \Pr(A_i)\Pr(B|A_i)}$, (26.28)

where the event A need not coincide with any of the $A_i$.

As a final point, we comment that sometimes we are concerned only with the
relative probabilities of two events A and C (say), given the occurrence of some
other event B. From (26.26) we then obtain a different form of Bayes’ theorem,

$\frac{\Pr(A|B)}{\Pr(C|B)} = \frac{\Pr(A)\Pr(B|A)}{\Pr(C)\Pr(B|C)}$, (26.29)

which does not contain $\Pr(B)$ at all.


26.3 Permutations and combinations 


In equation (26.5) we defined the probability of an event A in a sample space S
as

$\Pr(A) = \frac{n_A}{n_S}$,

where $n_A$ is the number of outcomes belonging to event A and $n_S$ is the total
number of possible outcomes. It is therefore necessary to be able to count the
number of possible outcomes in various common situations.


26.3.1 Permutations 

Let us first consider a set of n objects that are all different. We may ask in 
how many ways these n objects may be arranged, i.e. how many permutations of 
these objects exist. This is straightforward to deduce, as follows: the object in the 
first position may be chosen in n different ways, that in the second position in 
n — 1 ways, and so on until the final object is positioned. The number of possible 
arrangements is therefore 

$$n(n-1)(n-2)\cdots(1) = n! \tag{26.30}$$

Generalising (26.30) slightly, let us suppose we choose only k (< n) objects 
from n. The number of possible permutations of these k objects selected from n 
is given by 

$$\underbrace{n(n-1)(n-2)\cdots(n-k+1)}_{k\ \text{factors}} = \frac{n!}{(n-k)!} \equiv {}^nP_k. \tag{26.31}$$

In calculating the number of permutations of the various objects we have so
far assumed that the objects are sampled without replacement - i.e. once an object 
has been drawn from the set it is put aside. As mentioned previously, however, 
we may instead replace each object before the next is chosen. The number of 
permutations of k objects from n with replacement may be calculated very easily 
since the first object can be chosen in n different ways, as can the second, the 
third, etc. Therefore the number of permutations is simply $n^k$. This may also be
viewed as the number of permutations of k objects from n where repetitions are 
allowed, i.e. each object may be used as often as one likes. 
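The two counting rules can be confirmed by brute-force enumeration for small numbers. The sketch below uses the standard-library functions `itertools.permutations`, `itertools.product` and `math.perm` (Python 3.8 or later); the values n = 5 and k = 3 are arbitrary choices for illustration.

```python
from itertools import permutations, product
from math import factorial, perm

n, k = 5, 3
objects = range(n)

# Sampling without replacement: ordered selections of k distinct objects.
without_replacement = sum(1 for _ in permutations(objects, k))

# Sampling with replacement: every position may be any of the n objects.
with_replacement = sum(1 for _ in product(objects, repeat=k))

print(without_replacement, perm(n, k), factorial(n) // factorial(n - k))
print(with_replacement, n ** k)
```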


►Find the probability that in a group of k people, at least two have the same birthday
(ignoring 29 February).


It is simplest to begin by calculating the probability that no two people share a birthday,
as follows. Firstly, we imagine each of the k people in turn pointing to their birthday on
a year planner. Thus, we are sampling the 365 days of the year ‘with replacement’ and
so the total number of possible outcomes is $(365)^k$. Now, (for the moment) we assume
that no two people share a birthday and imagine the process being repeated, but as each
person points out their birthday it is crossed off the planner. In this case, we are sampling
the days of the year ‘without replacement’, and so the possible number of outcomes for
which all the birthdays are different is

$${}^{365}P_k = \frac{365!}{(365-k)!}.$$

Hence the probability that all the birthdays are different is

$$p = \frac{365!}{(365-k)!\,365^k}.$$

Now using the complement rule (26.11), the probability q that two or more people have
the same birthday is simply

$$q = 1 - p = 1 - \frac{365!}{(365-k)!\,365^k}.$$

This expression may be conveniently evaluated using Stirling's approximation for $n!$ when
n is large, namely

$$n! \sim \sqrt{2\pi n}\left(\frac{n}{e}\right)^n,$$

to give

$$q \approx 1 - e^{-k}\left(\frac{365}{365-k}\right)^{365-k+0.5}.$$


It is interesting to note that if k = 23 the probability is a little greater than a half that 
at least two people have the same birthday, and if k = 50 the probability rises to 0.970. 
This can prove a good bet at a party of non-mathematicians ! ◄ 
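The quoted values are easy to reproduce. The sketch below evaluates q both exactly, as a running product, and from the Stirling form $q \approx 1 - e^{-k}\,[365/(365-k)]^{365-k+0.5}$; the function names `q_exact` and `q_stirling` are illustrative assumptions.

```python
from math import exp

def q_exact(k, days=365):
    """Probability that at least two of k people share a birthday."""
    p_all_different = 1.0
    for i in range(k):
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different

def q_stirling(k, days=365):
    """The same probability using Stirling's approximation for the factorials."""
    return 1.0 - exp(-k) * (days / (days - k)) ** (days - k + 0.5)

for k in (23, 50):
    print(k, round(q_exact(k), 4), round(q_stirling(k), 4))
```

For these values of k the Stirling form agrees with the exact product to well within the precision quoted in the text.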


So far we have assumed that all n objects are different (or distinguishable). Let
us now consider n objects of which $n_1$ are identical and of type 1, $n_2$ are identical
and of type 2, . . . , $n_m$ are identical and of type m (clearly $n = n_1 + n_2 + \cdots + n_m$).
From (26.30) the number of permutations of these n objects is again $n!$. However,
the number of distinguishable permutations is only

$$\frac{n!}{n_1!\,n_2!\cdots n_m!}, \tag{26.32}$$

since the ith group of identical objects can be rearranged in $n_i!$ ways without
changing the distinguishable permutation.


►A set of snooker balls consists of a white, a yellow, a green, a brown, a blue, a pink, a 
black and 15 reds. How many distinguishable permutations of the balls are there? 


In total there are 22 balls, the 15 reds being indistinguishable. Thus from (26.32) the 
number of distinguishable permutations is 


$$\frac{22!}{(1!)(1!)(1!)(1!)(1!)(1!)(1!)(15!)} = \frac{22!}{15!} = 859\,541\,760.$$

◄
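Formula (26.32) can be checked against direct enumeration for a small case and then applied to the snooker balls. This is a minimal sketch; the function name `distinguishable_permutations` and the test string "AABBC" are assumptions made for illustration.

```python
from itertools import permutations
from math import factorial

def distinguishable_permutations(counts):
    """n!/(n_1! n_2! ... n_m!) for groups of identical objects, as in (26.32)."""
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)  # each partial quotient is an exact integer
    return total

# Brute-force check on a small multiset: two A's, two B's and one C.
brute_force = len(set(permutations("AABBC")))
print(brute_force, distinguishable_permutations([2, 2, 1]))

# Snooker: seven distinct coloured balls and 15 identical reds.
snooker = distinguishable_permutations([1] * 7 + [15])
print(snooker)
```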


26.3.2 Combinations 

We now consider the number of combinations of various objects when their order 
is immaterial. Assuming all the objects to be distinguishable, from (26.31) we see 
that the number of permutations of k objects chosen from n is ${}^nP_k = n!/(n-k)!$.
Now, since we are no longer concerned with the order of the chosen objects, which
can be internally arranged in $k!$ different ways, the number of combinations of k
objects from n is

$$\frac{n!}{(n-k)!\,k!} \equiv {}^nC_k \equiv \binom{n}{k}, \tag{26.33}$$

where, as noted in chapter 1, ${}^nC_k$ is called the binomial coefficient since it also
appears in the binomial expansion for positive integer n, namely

$$(a+b)^n = \sum_{k=0}^{n} {}^nC_k\, a^k b^{n-k}. \tag{26.34}$$


►A hand of 13 playing cards is dealt from a well-shuffled deck of 52. What is the probability 
that the hand contains two aces? 


Since the order of the cards in the hand is immaterial, the total number of distinct hands 
is simply equal to the number of combinations of 13 objects drawn from 52, i.e. ${}^{52}C_{13}$.
However, the number of hands containing two aces is equal to the number of ways, ${}^4C_2$,
in which the two aces can be drawn from the four available, multiplied by the number of
ways, ${}^{48}C_{11}$, in which the remaining 11 cards in the hand can be drawn from the 48 cards
that are not aces. Thus the required probability is given by

$$\frac{{}^4C_2\,{}^{48}C_{11}}{{}^{52}C_{13}} = \frac{4!}{2!\,2!}\,\frac{48!}{11!\,37!}\,\frac{13!\,39!}{52!} = \frac{(3)(4)}{2}\,\frac{(12)(13)(38)(39)}{(49)(50)(51)(52)} = 0.213.$$

◄

Another useful result that may be derived using the binomial coefficients is the
number of ways in which n distinguishable objects can be divided into m piles,
with $n_i$ objects in the ith pile, $i = 1,2,\ldots,m$ (the ordering of objects within each
pile being unimportant). This may be straightforwardly calculated as follows. We
may choose the $n_1$ objects in the first pile from the original n objects in ${}^nC_{n_1}$ ways.
The $n_2$ objects in the second pile can then be chosen from the $n - n_1$ remaining
objects in ${}^{n-n_1}C_{n_2}$ ways, etc. We may continue in this fashion until we reach the
(m − 1)th pile, which may be formed in ${}^{n-n_1-\cdots-n_{m-2}}C_{n_{m-1}}$ ways. The remaining
objects then form the mth pile and so can only be ‘chosen’ in one way. Thus the
total number of ways of dividing the original n objects into m piles is given by
the product


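$$
\begin{aligned}
N &= {}^nC_{n_1}\,{}^{n-n_1}C_{n_2} \cdots {}^{n-n_1-\cdots-n_{m-2}}C_{n_{m-1}} \\
&= \frac{n!}{n_1!\,(n-n_1)!}\,\frac{(n-n_1)!}{n_2!\,(n-n_1-n_2)!} \cdots \frac{(n-n_1-n_2-\cdots-n_{m-2})!}{n_{m-1}!\,(n-n_1-n_2-\cdots-n_{m-2}-n_{m-1})!} \\
&= \frac{n!}{n_1!\,(n-n_1)!}\,\frac{(n-n_1)!}{n_2!\,(n-n_1-n_2)!} \cdots \frac{(n-n_1-n_2-\cdots-n_{m-2})!}{n_{m-1}!\,n_m!} \\
&= \frac{n!}{n_1!\,n_2!\cdots n_m!}.
\end{aligned}
\tag{26.35}
$$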


These numbers are called multinomial coefficients since (26.35) is the coefficient of
$x_1^{n_1}x_2^{n_2}\cdots x_m^{n_m}$ in the multinomial expansion of $(x_1 + x_2 + \cdots + x_m)^n$, i.e. for positive
integer n

$$(x_1 + x_2 + \cdots + x_m)^n = \sum_{\substack{n_1,n_2,\ldots,n_m \\ n_1+n_2+\cdots+n_m=n}} \frac{n!}{n_1!\,n_2!\cdots n_m!}\, x_1^{n_1}x_2^{n_2}\cdots x_m^{n_m}.$$


For the case $m = 2$, $n_1 = k$, $n_2 = n - k$, (26.35) reduces to the binomial coefficient
${}^nC_k$. Furthermore, we note that the multinomial coefficient (26.35) is identical to
the expression (26.32) for the number of distinguishable permutations of n objects,
$n_i$ of which are identical and of type i (for $i = 1,2,\ldots,m$ and $n_1 + n_2 + \cdots + n_m = n$).
A few moments’ thought should convince the reader that the two expressions
(26.35) and (26.32) must be identical.
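The pile-by-pile argument leading to (26.35) translates directly into code. The sketch below builds the count as a product of binomial coefficients and compares it with the closed form; the function names and the pile sizes (2, 3, 4) are assumptions chosen for illustration.

```python
from math import comb, factorial

def piles_by_binomials(counts):
    """Choose each pile in turn from the objects that remain."""
    remaining = sum(counts)
    ways = 1
    for c in counts:
        ways *= comb(remaining, c)  # ways of choosing this pile
        remaining -= c
    return ways

def multinomial(counts):
    """Closed form n!/(n_1! n_2! ... n_m!) from (26.35)."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result

print(piles_by_binomials([2, 3, 4]), multinomial([2, 3, 4]))
# For m = 2 the multinomial coefficient reduces to the binomial coefficient.
print(multinomial([3, 7]), comb(10, 3))
```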


►In the card game of bridge, each of four players is dealt 13 cards from a full pack of 52.
What is the probability that each player is dealt an ace? 


From (26.35), the total number of distinct bridge dealings is 52!/(13!13!13!13!). However, 
the number of ways in which the four aces can be distributed with one in each hand is 
$4!/(1!\,1!\,1!\,1!) = 4!$; the remaining 48 cards can then be dealt out in $48!/(12!\,12!\,12!\,12!)$
ways. Thus the probability that each player receives an ace is

$$4!\,\frac{48!}{(12!)^4}\,\frac{(13!)^4}{52!} = \frac{24(13)^4}{(49)(50)(51)(52)} = 0.105.$$

◄
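Both card-game probabilities in this subsection, that of a 13-card hand containing two aces and that of each bridge player receiving an ace, can be evaluated exactly. The sketch below does so with integer arithmetic; the helper name `multinomial` and the variable names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb, factorial

def multinomial(counts):
    """n!/(n_1! n_2! ... n_m!) with n = sum(counts), as in (26.35)."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result

# Hand of 13 cards containing two aces: choose 2 of 4 aces and 11 of 48 others.
p_two_aces = Fraction(comb(4, 2) * comb(48, 11), comb(52, 13))

# Bridge deal in which each of the four players receives one ace.
p_ace_each = Fraction(
    multinomial([1] * 4) * multinomial([12] * 4), multinomial([13] * 4)
)

print(float(p_two_aces))
print(float(p_ace_each))
```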


As in the case of permutations we might ask how many combinations of k 
objects can be chosen from n with replacement (repetition). To calculate this, we
may imagine the n (distinguishable) objects set out on a table. Each combination
of k objects can then be made by pointing to k of the n objects in turn (with
repetitions allowed). These k equivalent selections distributed amongst n different
but re-choosable objects are strictly analogous to the placing of k indistinguishable
‘balls’ in n different boxes with no restriction on the number of balls in each box.
A particular selection in the case k = 7, n = 5 may be symbolised as

xxx| |x|xx|x.


This denotes three balls in the first box, none in the second, one in the third, 
two in the fourth and one in the fifth. We therefore need only to consider the 
number of (distinguishable) ways in which k crosses and n— 1 vertical lines can 
be arranged, i.e. the number of permutations of k + n — 1 objects of which k are 
identical crosses and n — 1 are identical lines. This is given by (26.33) as 


$$\frac{(k+n-1)!}{k!\,(n-1)!} = {}^{n+k-1}C_k. \tag{26.36}$$


We note that this expression also occurs in the binomial expansion for negative 
integer powers. If n is a positive integer, it is straightforward to show that (see 
chapter 1) 

$$(a+b)^{-n} = \sum_{k=0}^{\infty} (-1)^k\, {}^{n+k-1}C_k\, a^{-n-k} b^k,$$


where a is taken to be larger than b. 
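The count (26.36) of combinations with replacement can be verified by enumeration using `itertools.combinations_with_replacement`. The sketch below also renders one selection in the crosses-and-lines notation used above; the helper name `crosses_and_lines` is an assumption, and its output omits the space shown for the empty box.

```python
from itertools import combinations_with_replacement
from math import comb

n, k = 5, 7  # five boxes (objects) and seven balls (selections)

count = sum(1 for _ in combinations_with_replacement(range(n), k))
print(count, comb(n + k - 1, k))

def crosses_and_lines(selection, n):
    """Render a selection as k crosses separated by n - 1 vertical lines."""
    return "|".join("x" * selection.count(box) for box in range(n))

# Three balls in box 0, none in box 1, one in box 2, two in box 3, one in box 4.
print(crosses_and_lines((0, 0, 0, 2, 3, 3, 4), n))
```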


►A system contains a number N of (non-interacting) particles, each of which can be in
any of the quantum states of the system. The structure of the set of quantum states is such
that there exist R energy levels with corresponding energies $E_i$ and degeneracies $g_i$ (i.e. the
ith energy level contains $g_i$ quantum states). Find the numbers of distinct ways in which
the particles can be distributed among the quantum states of the system such that the ith
energy level contains $n_i$ particles, for $i = 1,2,\ldots,R$, in the cases where the particles are

(i) distinguishable with no restriction on the number in each state;

(ii) indistinguishable with no restriction on the number in each state;

(iii) indistinguishable with a maximum of one particle in each state;

(iv) distinguishable with a maximum of one particle in each state.


It is easiest to solve this problem in two stages. Let us first consider distributing the N 
particles among the R energy levels, without regard for the individual degenerate quantum 
states that comprise each level. If the particles are distinguishable then the number of 
distinct arrangements with $n_i$ particles in the ith level, $i = 1,2,\ldots,R$, is given by (26.35) as

$$\frac{N!}{n_1!\,n_2!\cdots n_R!}.$$

If, however, the particles are indistinguishable then clearly there exists only one distinct
arrangement having $n_i$ particles in the ith level, $i = 1,2,\ldots,R$. Now let us suppose there
exist $w_i$ ways in which the $n_i$ particles in the ith energy level can be distributed among
the $g_i$ degenerate states. Thus it follows that the number of distinct ways in which the N
particles can be distributed among all R quantum states of the system, with $n_i$ particles in
the ith level, is given by

$$W\{n_i\} = \begin{cases} \dfrac{N!}{n_1!\,n_2!\cdots n_R!} \displaystyle\prod_{i=1}^{R} w_i & \text{for distinguishable particles,} \\[2ex] \displaystyle\prod_{i=1}^{R} w_i & \text{for indistinguishable particles.} \end{cases} \tag{26.37}$$


It therefore remains only for us to find the appropriate expression for $w_i$ in each of the
cases (i)-(iv) above. 

Case (i). If there is no restriction on the number of particles in each quantum state, 
then in the ith energy level each particle can reside in any of the $g_i$ degenerate quantum
states. Thus, if the particles are distinguishable then the number of distinct arrangements
is simply $w_i = g_i^{n_i}$. Thus, from (26.37),

$$W\{n_i\} = \frac{N!}{n_1!\,n_2!\cdots n_R!} \prod_{i=1}^{R} g_i^{n_i} = N! \prod_{i=1}^{R} \frac{g_i^{n_i}}{n_i!}.$$


Such a system of particles (for example atoms or molecules in a classical gas) is said to 
obey Maxwell-Boltzmann statistics. 

Case (ii). If the particles are indistinguishable and there is no restriction on the number 
in each state then, from (26.36), the number of distinct arrangements of the particles 
among the $g_i$ states in the ith energy level is

$$w_i = \frac{(n_i + g_i - 1)!}{n_i!\,(g_i - 1)!}.$$

Substituting this expression in (26.37), we obtain

$$W\{n_i\} = \prod_{i=1}^{R} \frac{(n_i + g_i - 1)!}{n_i!\,(g_i - 1)!}.$$


Such a system of particles (for example a gas of photons) is said to obey Bose-Einstein 
statistics. 

Case (iii). If a maximum of one particle can reside in each of the $g_i$ degenerate quantum
states in the ith energy level then the number of particles in each state is either 0 or 1.
Since the particles are indistinguishable, $w_i$ is equal to the number of distinct arrangements
in which $n_i$ states are occupied and $g_i - n_i$ states are unoccupied; this is given by

$$w_i = {}^{g_i}C_{n_i} = \frac{g_i!}{n_i!\,(g_i - n_i)!}.$$

Thus, from (26.37), we have

$$W\{n_i\} = \prod_{i=1}^{R} \frac{g_i!}{n_i!\,(g_i - n_i)!}.$$


Such a system is said to obey Fermi-Dirac statistics, and an example is provided by an 
electron gas. 

Case (iv). Again, the number of particles in each state is either 0 or 1. If the particles 
are distinguishable, however, each arrangement identified in case (iii) can be reordered in 
$n_i!$ different ways, so that

$$w_i = {}^{g_i}P_{n_i} = \frac{g_i!}{(g_i - n_i)!}.$$

Substituting this expression into (26.37) gives

$$W\{n_i\} = N! \prod_{i=1}^{R} \frac{g_i!}{n_i!\,(g_i - n_i)!}.$$


Such a system of particles has the names of no famous scientists attached to it, since it 
appears that it never occurs in nature. ◄ 
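The four counting formulae can be compared side by side for a small system. The sketch below is illustrative only: the occupation numbers n = (2, 1) and degeneracies g = (3, 2) are hypothetical, and the function names are assumptions. It requires Python 3.8 or later for `math.comb`, `math.perm` and `math.prod`.

```python
from math import comb, factorial, perm, prod

def multinomial(counts):
    """N!/(n_1! n_2! ... n_R!), the level-assignment factor in (26.37)."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result

def arrangements(n, g):
    """W{n_i} for cases (i)-(iv), given occupations n[i] and degeneracies g[i]."""
    levels = list(zip(n, g))
    return {
        "i": multinomial(n) * prod(gi ** ni for ni, gi in levels),    # Maxwell-Boltzmann
        "ii": prod(comb(ni + gi - 1, ni) for ni, gi in levels),       # Bose-Einstein
        "iii": prod(comb(gi, ni) for ni, gi in levels),               # Fermi-Dirac
        "iv": multinomial(n) * prod(perm(gi, ni) for ni, gi in levels),
    }

# Hypothetical system: two levels holding 2 and 1 particles, with 3 and 2 states.
W = arrangements(n=[2, 1], g=[3, 2])
print(W)
```

For this small system the counts are 54, 12, 6 and 36 respectively, and case (iv) equals N! times case (iii), in line with the final formula of the example.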


26.4 Random variables and distributions 

Suppose an experiment has an outcome sample space S. A real variable X that 
is defined for all possible outcomes in S (so that a real number - not necessarily 
unique - is assigned to each possible outcome) is called a random variable (RV). 
The outcome of the experiment may already be a real number and hence a random 
variable, e.g. the number of heads obtained in 10 throws of a coin, or the sum of 
the values if two dice are thrown. However, more arbitrary assignments are possible,
e.g. the assignment of a ‘quality’ rating to each successive item produced by a
manufacturing process. Furthermore, assuming that a probability can be assigned 
to all possible outcomes in a sample space S, it is possible to assign a probability 
distribution to any random variable. Random variables may be divided into two 
classes, discrete and continuous, and we now examine each of these in turn. 

26.4.1 Discrete random variables 

A random variable X that takes only discrete values $x_1, x_2, \ldots, x_n$, with probabilities
$p_1, p_2, \ldots, p_n$, is called a discrete random variable. The number of values
n for which X has a non-zero probability is finite or at most countably infinite.
As mentioned above, an example of a discrete random variable is the number of
heads obtained in 10 throws of a coin. If X is a discrete random variable, we can
define a probability function (PF) $f(x)$ that assigns probabilities to all the distinct
values that X can take, such that

$$f(x) = \Pr(X = x) = \begin{cases} p_i & \text{if } x = x_i, \\ 0 & \text{otherwise.} \end{cases} \tag{26.38}$$

A typical PF (see figure 26.6) thus consists of spikes, at valid values of X, whose 
height at x corresponds to the probability that X = x. Since the probabilities 
must sum to unity, we require 

$$\sum_{i=1}^{n} f(x_i) = 1. \tag{26.39}$$

We may also define the cumulative probability function (CPF) of X, $F(x)$, whose
value gives the probability that $X \le x$, so that

$$F(x) = \Pr(X \le x) = \sum_{x_i \le x} f(x_i). \tag{26.40}$$

Figure 26.6 (a) A typical probability function for a discrete distribution, that
for the biased die discussed earlier. Since the probabilities must sum to unity
we require p = 2/13. (b) The cumulative probability function for the same
discrete distribution. (Note that a different scale has been used for (b).)


Hence $F(x)$ is a step function that has upward jumps of $p_i$ at $x = x_i$, $i = 1,2,\ldots,n$,
and is constant between possible values of X. We may also calculate
the probability that X lies between two limits, $l_1$ and $l_2$ ($l_1 < l_2$); this is given by

$$\Pr(l_1 < X \le l_2) = \sum_{l_1 < x_i \le l_2} f(x_i) = F(l_2) - F(l_1), \tag{26.41}$$

i.e. it is the sum of all the probabilities for which $x_i$ lies within the relevant interval.


►A bag contains seven red balls and three white balls. Three balls are drawn at random 
and not replaced. Find the probability function for the number of red balls drawn. 


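Let X be the number of red balls drawn. Then

$$
\begin{aligned}
\Pr(X = 0) &= f(0) = \tfrac{3}{10} \times \tfrac{2}{9} \times \tfrac{1}{8} = \tfrac{1}{120}, \\
\Pr(X = 1) &= f(1) = \tfrac{3}{10} \times \tfrac{2}{9} \times \tfrac{7}{8} \times 3 = \tfrac{7}{40}, \\
\Pr(X = 2) &= f(2) = \tfrac{3}{10} \times \tfrac{7}{9} \times \tfrac{6}{8} \times 3 = \tfrac{21}{40}, \\
\Pr(X = 3) &= f(3) = \tfrac{7}{10} \times \tfrac{6}{9} \times \tfrac{5}{8} = \tfrac{7}{24}.
\end{aligned}
$$

It should be noted that $\sum_{i=0}^{3} f(i) = 1$, as expected. ◄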
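The same probability function can be obtained by counting combinations rather than multiplying sequential draw probabilities. The sketch below takes that alternative route (it is not the method used in the example) and also builds the cumulative probability function; the constant and function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import accumulate
from math import comb

RED, WHITE, DRAWN = 7, 3, 3

def f(x):
    """Pr(X = x): choose x reds and DRAWN - x whites out of all possible draws."""
    return Fraction(comb(RED, x) * comb(WHITE, DRAWN - x), comb(RED + WHITE, DRAWN))

pf = [f(x) for x in range(DRAWN + 1)]   # probability function f(0), ..., f(3)
cpf = list(accumulate(pf))              # cumulative probability function F(x)

print(pf)
print(cpf)
```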


26.4.2 Continuous random variables 

A random variable X is said to have a continuous distribution if X is defined for a
continuous range of values between given limits (often $-\infty$ to $\infty$). An example of
a continuous random variable is the height of a person drawn from a population,
which can take any value (within limits!). We can define the probability density
function (PDF) $f(x)$ of a continuous random variable X such that

$$\Pr(x < X \le x + dx) = f(x)\,dx,$$

Figure 26.7 The probability density function for a continuous random variable
X that can take values only between the limits $l_1$ and $l_2$. The shaded area
under the curve gives $\Pr(a < X \le b)$, whereas the total area under the curve,
between the limits $l_1$ and $l_2$, is equal to unity.


i.e. $f(x)\,dx$ is the probability that X lies in the interval $x < X \le x + dx$. Clearly
$f(x)$ must be a real function that is everywhere $\ge 0$. If X can only take values
between the limits $l_1$ and $l_2$ then in order for the sum of the probabilities of all
possible outcomes to be equal to unity, we require

$$\int_{l_1}^{l_2} f(x)\,dx = 1.$$

Often X can take any value between $-\infty$ and $\infty$ and so

$$\int_{-\infty}^{\infty} f(x)\,dx = 1.$$

The probability that X lies in the interval $a < X \le b$ is then given by

$$\Pr(a < X \le b) = \int_a^b f(x)\,dx, \tag{26.42}$$

i.e. $\Pr(a < X \le b)$ is equal to the area under the curve of $f(x)$ between these
limits (see figure 26.7).

We may also define the cumulative probability function $F(x)$ for a continuous
random variable by

$$F(x) = \Pr(X \le x) = \int_{l_1}^{x} f(u)\,du, \tag{26.43}$$

where u is a (dummy) integration variable. We can then write

$$\Pr(a < X \le b) = F(b) - F(a).$$

From (26.43) it is clear that $f(x) = dF(x)/dx$.

►A random variable X has a PDF $f(x)$ given by $Ae^{-x}$ in the interval $0 < x < \infty$ and zero
elsewhere. Find the value of the constant A and hence calculate the probability that X lies
in the interval $1 < X \le 2$.

We require the integral of $f(x)$ between 0 and $\infty$ to equal unity. Evaluating this integral,
we find

$$\int_0^\infty Ae^{-x}\,dx = \left[-Ae^{-x}\right]_0^\infty = A,$$

and hence A = 1. From (26.42), we then obtain

$$\Pr(1 < X \le 2) = \int_1^2 f(x)\,dx = \int_1^2 e^{-x}\,dx = -e^{-2} - (-e^{-1}) = 0.23.$$ ◄
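The normalisation and the interval probability in this example can be confirmed numerically. The sketch below uses a simple midpoint rule, truncating the infinite range at x = 50 where the integrand is negligible; the function names `integrate` and `F`, the step count and the cut-off are all assumptions made for illustration.

```python
from math import exp

def pdf(x):
    """f(x) = e^{-x} for x > 0, i.e. the example PDF with A = 1."""
    return exp(-x) if x > 0 else 0.0

def F(x):
    """Cumulative probability function obtained by integrating the PDF from 0."""
    return 1.0 - exp(-x) if x > 0 else 0.0

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule estimate of the integral of f from a to b."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

total = integrate(pdf, 0.0, 50.0)   # should be close to 1
prob = F(2.0) - F(1.0)              # Pr(1 < X <= 2) from the CPF
prob_numeric = integrate(pdf, 1.0, 2.0)

print(total, prob, prob_numeric)
```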


It is worth mentioning here that a discrete RV can in fact be treated as 
continuous and assigned a corresponding probability density function. If X is a 
discrete RV that takes only the values $x_1, x_2, \ldots, x_n$ with probabilities $p_1, p_2, \ldots, p_n$
then we may describe X as a continuous RV with PDF

$$f(x) = \sum_{i=1}^{n} p_i\,\delta(x - x_i), \tag{26.44}$$

where $\delta(x)$ is the Dirac delta function discussed in subsection 13.1.3. From (26.42)
and the fundamental property of the delta function (13.12), we see that

$$\Pr(a < X \le b) = \int_a^b f(x)\,dx = \sum_{i=1}^{n} p_i \int_a^b \delta(x - x_i)\,dx = \sum_i p_i,$$

where the final sum extends over those values of i for which $a < x_i \le b$.


26.4.3 Sets of random variables 

It is common in practice to consider two or more random variables simultaneously.
For example, one might be interested in both the height and weight of
a person drawn at random from a population. In the general case, these variables
may depend on one another and are described by joint probability density
functions. These are discussed fully in section 26.11, and we simply note here
that, if we have (say) two random variables X and Y, then by analogy with the
single-variable case we define their joint probability density function $f(x,y)$ in
such a way that, if X and Y are discrete RVs,

$$\Pr(X = x_i,\ Y = y_j) = f(x_i, y_j),$$

or, if X and Y are continuous RVs,

$$\Pr(x < X \le x + dx,\ y < Y \le y + dy) = f(x,y)\,dx\,dy.$$

In many circumstances, however, random variables do not depend on one
another, i.e. they are independent. As an example, for a person drawn at random
from a population, we might expect height and IQ to be independent random
variables. Let us suppose that X and Y are two random variables with probability
density functions $g(x)$ and $h(y)$ respectively. In mathematical terms, X and Y are
independent RVs if their joint probability density function is given by
$f(x,y) = g(x)h(y)$. Thus, for independent RVs, if X and Y are both discrete then

$$\Pr(X = x_i,\ Y = y_j) = g(x_i)h(y_j)$$

or, if X and Y are both continuous, then

$$\Pr(x < X \le x + dx,\ y < Y \le y + dy) = g(x)h(y)\,dx\,dy.$$

The important point in each case is that the RHS is simply the product of the
individual probability density functions (compare with the expression for $\Pr(A \cap B)$
in (26.22) for statistically independent events A and B). By a simple extension,
one may also consider the case where one of the random variables is discrete and
the other continuous. The above discussion may also be trivially extended to any
number of independent RVs $X_i$, $i = 1,2,\ldots,N$.


►The independent random variables X and Y have the PDFs $g(x) = e^{-x}$ and $h(y) = 2e^{-2y}$
respectively. Calculate the probability that X lies in the interval $1 < X \le 2$ and Y lies in
the interval $0 < Y \le 1$.

Since X and Y are independent RVs, the required probability is given by

$$
\begin{aligned}
\Pr(1 < X \le 2,\ 0 < Y \le 1) &= \int_1^2 g(x)\,dx \int_0^1 h(y)\,dy \\
&= \int_1^2 e^{-x}\,dx \int_0^1 2e^{-2y}\,dy \\
&= \left[-e^{-x}\right]_1^2 \times \left[-e^{-2y}\right]_0^1 = 0.23 \times 0.86 = 0.20.
\end{aligned}
$$

◄
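To see the factorisation at work, the sketch below compares the product of the two one-dimensional probabilities with a direct two-dimensional midpoint-rule integral of the joint PDF g(x)h(y) over the rectangle. The grid size of 400 points per axis is an arbitrary assumption, as are the variable names.

```python
from math import exp

def g(x):
    return exp(-x)               # PDF of X for x > 0

def h(y):
    return 2.0 * exp(-2.0 * y)   # PDF of Y for y > 0

# One-dimensional probabilities from the antiderivatives.
p_x = exp(-1.0) - exp(-2.0)      # Pr(1 < X <= 2)
p_y = 1.0 - exp(-2.0)            # Pr(0 < Y <= 1)
product_of_marginals = p_x * p_y

# Direct double integral of the joint PDF over 1 < x <= 2, 0 < y <= 1.
steps = 400
hx = (2.0 - 1.0) / steps
hy = (1.0 - 0.0) / steps
joint = hx * hy * sum(
    g(1.0 + (i + 0.5) * hx) * h((j + 0.5) * hy)
    for i in range(steps)
    for j in range(steps)
)

print(round(p_x, 2), round(p_y, 2), round(product_of_marginals, 2), joint)
```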


26.5 Properties of distributions 

For a single random variable X, the probability density function f(x) contains 
all possible information about how the variable is distributed. However, for the 
purposes of comparison, it is conventional and useful to characterise f(x) by 
certain of its properties. Most of these standard properties are defined in terms 
of averages or expectation values. In the most general case, the expectation value 
$E[g(X)]$ of any function $g(X)$ of the random variable X is defined as

$$E[g(X)] = \begin{cases} \sum_i g(x_i) f(x_i) & \text{for a discrete distribution,} \\[1ex] \int g(x) f(x)\,dx & \text{for a continuous distribution,} \end{cases} \tag{26.45}$$

where the sum or integral is over all allowed values of X. It is assumed that
the series is absolutely convergent or that the integral exists, as the case may be. 
From its definition it is straightforward to show that the expectation value has 
the following properties: 

(i) if a is a constant then E[a] = a; 

(ii) if a is a constant then E[ag(X)] = aE[g(X)]; 

(iii) if $g(X) = s(X) + t(X)$ then $E[g(X)] = E[s(X)] + E[t(X)]$.

It should be noted that the expectation value is not a function of X but is
instead a number that depends on the form of the probability density function
$f(x)$ and the function $g(x)$. Most of the standard quantities used to characterise
$f(x)$ are simply the expectation values of various functions of the random variable
X. We now consider these standard quantities. 


26.5.1 Mean 


The property most commonly used to characterise a probability distribution is 
its mean, which is defined simply as the expectation value E [X] of the variable X 
itself. Thus, the mean is given by 


$$E[X] = \begin{cases} \sum_i x_i f(x_i) & \text{for a discrete distribution,} \\[1ex] \int x f(x)\,dx & \text{for a continuous distribution.} \end{cases} \tag{26.46}$$

The alternative notations $\mu$ and $\langle x \rangle$ are also commonly used to denote the mean.
If in (26.46) the series is not absolutely convergent, or the integral does not exist, 
we say that the distribution does not have a mean, but this is very rare in physical 
applications. 


►The probability of finding a 1s electron in a hydrogen atom in a given infinitesimal volume
$dV$ is $\psi^*\psi\,dV$, where the quantum mechanical wavefunction $\psi$ is given by

$$\psi = Ae^{-r/a_0}.$$

Find the value of the real constant A and thereby deduce the mean distance of the electron
from the origin.

Let us consider the random variable R = ‘distance of the electron from the origin’. Since
the 1s orbital has no $\theta$- or $\phi$-dependence (it is spherically symmetric), we may consider
the infinitesimal volume element $dV$ as the spherical shell with inner radius r and outer
radius $r + dr$. Thus, $dV = 4\pi r^2\,dr$ and the PDF of R is simply

$$\Pr(r < R \le r + dr) \equiv f(r)\,dr = 4\pi r^2 A^2 e^{-2r/a_0}\,dr.$$

The value of A is found by requiring the total probability (i.e. the probability that the
electron is somewhere) to be unity. Since R must lie between zero and infinity, we require
that

$$A^2 \int_0^\infty e^{-2r/a_0}\, 4\pi r^2\,dr = 1.$$

Integrating by parts we find $A = 1/(\pi a_0^3)^{1/2}$. Now, using the definition of the mean (26.46),
we find

$$E[R] = \int_0^\infty r f(r)\,dr = \frac{4}{a_0^3} \int_0^\infty r^3 e^{-2r/a_0}\,dr.$$

The integral on the RHS may be evaluated by parts and takes the value $3a_0^4/8$;
consequently we find that $E[R] = 3a_0/2$. ◄
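The normalisation and the mean can be checked by numerical integration. The sketch below works in units where $a_0 = 1$, so that $f(r) = 4r^2 e^{-2r}$, and truncates the integrals at $r = 40$; the cut-off, the step count and the function names are assumptions made for illustration.

```python
from math import exp

A0 = 1.0  # work in units of a_0

def pdf(r):
    """f(r) = (4 r^2 / a_0^3) exp(-2 r / a_0), using A^2 = 1/(pi a_0^3)."""
    return 4.0 * r * r / A0 ** 3 * exp(-2.0 * r / A0)

def integrate(f, a, b, steps=200_000):
    """Midpoint-rule estimate of the integral of f from a to b."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

R_MAX = 40.0 * A0  # the integrand is negligible beyond this radius

norm = integrate(pdf, 0.0, R_MAX)
mean = integrate(lambda r: r * pdf(r), 0.0, R_MAX)

print(norm, mean)  # expected to be close to 1 and 1.5 (in units of a_0)
```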


26.5.2 Mode and median 

Although the mean discussed in the last section is the most common measure 
of the ‘average’ of a distribution, two other measures, which do not rely on the 
concept of expectation values, are frequently encountered. 

The mode of a distribution is the value of the random variable X at which the 
probability (density) function f(x) has its greatest value. If there is more than one 
value of X for which this is true then each value may equally be called the mode 
of the distribution. 

The median M of a distribution is the value of the random variable X at which
the cumulative probability function $F(x)$ takes the value $\tfrac12$, i.e. $F(M) = \tfrac12$. Related
to the median are the lower and upper quartiles $Q_l$ and $Q_u$ of the PDF, which
are defined such that

$$F(Q_l) = \tfrac14, \qquad F(Q_u) = \tfrac34.$$

Thus the median and lower and upper quartiles divide the PDF into four regions
each containing one quarter of the probability. Smaller subdivisions are also
possible, e.g. the nth percentile, $P_n$, of a PDF is defined by $F(P_n) = n/100$.


►Find the mode of the PDF for the distance from the origin of the electron whose
wavefunction was given in the previous example.

We found in the previous example that the PDF for the electron's distance from the origin
was given by

$$f(r) = \frac{4r^2}{a_0^3}\,e^{-2r/a_0}. \tag{26.47}$$

Differentiating $f(r)$ with respect to r, we obtain

$$\frac{df}{dr} = \frac{8r}{a_0^3}\left(1 - \frac{r}{a_0}\right) e^{-2r/a_0}.$$

Thus $f(r)$ has turning points at $r = 0$ and $r = a_0$, where $df/dr = 0$. It is straightforward
to show that $r = 0$ is a minimum and $r = a_0$ is a maximum. Moreover, it is also clear that
$r = a_0$ is a global maximum (as opposed to just a local one). Thus the mode of $f(r)$ occurs
at $r = a_0$. ◄


26.5.3 Variance and standard deviation

The variance of a distribution, $V[X]$, also written $\sigma^2$, is defined by

$$V[X] = E\left[(X - \mu)^2\right] = \begin{cases} \sum_j (x_j - \mu)^2 f(x_j) & \text{for a discrete distribution,} \\[1ex] \int (x - \mu)^2 f(x)\,dx & \text{for a continuous distribution.} \end{cases} \tag{26.48}$$

Here $\mu$ has been written for the expectation value $E[X]$ of X. As in the case of
the mean, unless the series and the integral in (26.48) converge the distribution
does not have a variance. From the definition (26.48) we may easily derive the
following useful properties of $V[X]$. If a and b are constants then

(i) $V[a] = 0$,

(ii) $V[aX + b] = a^2 V[X]$.

The variance of a distribution is never negative; its positive square root is
known as the standard deviation of the distribution and is often denoted by $\sigma$.
Roughly speaking, $\sigma$ measures the spread (about $x = \mu$) of the values that X can
assume.


►Find the standard deviation of the PDF for the distance from the origin of the electron 
whose wavefunction was discussed in the previous two examples. 


Inserting the expression (26.47) for the PDF $f(r)$ into (26.48), the variance of the random
variable R is given by

$$V[R] = \int_0^\infty (r - \mu)^2\,\frac{4r^2}{a_0^3}\,e^{-2r/a_0}\,dr = \frac{4}{a_0^3} \int_0^\infty \left(r^4 - 2r^3\mu + r^2\mu^2\right) e^{-2r/a_0}\,dr,$$

where the mean $\mu = E[R] = 3a_0/2$. Integrating each term in the integrand by parts we
obtain

$$V[R] = 3a_0^2 - 3\mu a_0 + \mu^2 = \frac{3a_0^2}{4}.$$

Thus the standard deviation of the distribution is $\sigma = \sqrt{3}\,a_0/2$. ◄
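The mode and standard deviation found for the electron's radial distribution can be checked numerically. The sketch below again sets $a_0 = 1$, so that $f(r) = 4r^2 e^{-2r}$, uses a midpoint rule truncated at $r = 40$, and locates the mode by a crude grid search; the grid spacing, cut-off and function names are illustrative assumptions.

```python
from math import exp, sqrt

def pdf(r):
    """f(r) = 4 r^2 exp(-2 r), i.e. (26.47) in units where a_0 = 1."""
    return 4.0 * r * r * exp(-2.0 * r)

def integrate(f, a, b, steps=200_000):
    """Midpoint-rule estimate of the integral of f from a to b."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

R_MAX = 40.0  # the integrand is negligible beyond this radius

mean = integrate(lambda r: r * pdf(r), 0.0, R_MAX)
variance = integrate(lambda r: (r - mean) ** 2 * pdf(r), 0.0, R_MAX)
sigma = sqrt(variance)

# Crude grid search for the mode on 0 < r <= 10 with spacing 0.001.
grid = [i * 1e-3 for i in range(1, 10_001)]
mode = max(grid, key=pdf)

print(mode, variance, sigma)  # expected near 1, 0.75 and sqrt(3)/2
```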

We may also use the definition (26.48) to derive the Bienaymé-Chebyshev
inequality, which provides a useful upper limit on the probability that random
variable X takes values outside a given range centred on the mean. Let us consider
the case of a continuous random variable, for which

$$\Pr(|X - \mu| \ge c) = \int_{|x-\mu| \ge c} f(x)\,dx,$$

where the integral on the RHS extends over all values of x satisfying the inequality
$|x - \mu| \ge c$. From (26.48), we find that

$$\sigma^2 \ge \int_{|x-\mu| \ge c} (x - \mu)^2 f(x)\,dx \ge c^2 \int_{|x-\mu| \ge c} f(x)\,dx. \tag{26.49}$$

The first inequality holds because both $(x - \mu)^2$ and $f(x)$ are non-negative for
all x, and the second inequality holds because $(x - \mu)^2 \ge c^2$ over the range of
integration. However, the RHS of (26.49) is simply equal to $c^2 \Pr(|X - \mu| \ge c)$,
and thus we obtain the required inequality

$$\Pr(|X - \mu| \ge c) \le \frac{\sigma^2}{c^2}.$$

A similar derivation may be carried through for the case of a discrete random
variable. Thus, for any distribution $f(x)$ that possesses a variance we have, for
example,

$$\Pr(|X - \mu| \ge 2\sigma) \le \frac14 \qquad \text{and} \qquad \Pr(|X - \mu| \ge 3\sigma) \le \frac19.$$
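As a concrete illustration of how loose the bound can be, the sketch below evaluates the exact tail probability for the PDF $f(x) = e^{-x}$ ($x > 0$) met earlier and compares it with $\sigma^2/c^2$. It relies on the standard facts that this distribution has mean 1 and variance 1, which are not derived in the text above; the function name `tail_probability` and the chosen values of c are assumptions.

```python
from math import exp

MU, SIGMA = 1.0, 1.0  # mean and standard deviation of f(x) = e^{-x}, x > 0

def tail_probability(c):
    """Exact Pr(|X - mu| >= c) for the PDF f(x) = e^{-x} on x > 0."""
    upper = exp(-(MU + c))                             # Pr(X >= mu + c)
    lower = 1.0 - exp(-(MU - c)) if c < MU else 0.0    # Pr(X <= mu - c)
    return upper + lower

results = {}
for c in (0.5, 1.0, 2.0, 3.0):
    bound = min(1.0, SIGMA ** 2 / c ** 2)  # the bound is only informative when below 1
    results[c] = (tail_probability(c), bound)
    print(c, results[c])
```

For c = 2σ and c = 3σ the exact probabilities are roughly 0.05 and 0.02, comfortably below the bounds 1/4 and 1/9.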


26.5.4 Moments 

The mean (or expectation) of X is sometimes called the first moment of X, since 
it is defined as the sum or integral of the probability density function multiplied 
by the first power of x. By a simple extension the kth moment of a distribution 
is defined by 

$$\mu_k \equiv E[X^k] = \begin{cases} \sum_j x_j^k f(x_j) & \text{for a discrete distribution,} \\[1ex] \int x^k f(x)\,dx & \text{for a continuous distribution.} \end{cases} \tag{26.50}$$

For notational convenience, we have introduced the symbol $\mu_k$ to denote $E[X^k]$,
the kth moment of the distribution. Clearly, the mean of the distribution is then
denoted by $\mu_1$, often abbreviated simply to $\mu$, as in the previous subsection, as
this rarely causes confusion.

A useful result that relates the second moment, the mean and the variance of 
a distribution is proved using the properties of the expectation operator: 

$$
\begin{aligned}
V[X] &= E\left[(X - \mu)^2\right] \\
&= E\left[X^2 - 2\mu X + \mu^2\right] \\
&= E[X^2] - 2\mu E[X] + \mu^2 \\
&= E[X^2] - 2\mu^2 + \mu^2 \\
&= E[X^2] - \mu^2.
\end{aligned}
\tag{26.51}
$$

In alternative notations, this result can be written

$$\langle (x - \mu)^2 \rangle = \langle x^2 \rangle - \langle x \rangle^2 \qquad \text{or} \qquad \sigma^2 = \mu_2 - \mu_1^2.$$

►A biased die has probabilities $p/2, p, p, p, p, 2p$ of showing 1, 2, 3, 4, 5, 6 respectively. Find
(i) the mean, (ii) the second moment and (iii) the variance of this probability distribution.

By demanding that the sum of the probabilities equals unity we require $p = 2/13$. Now,
using the definition of the mean (26.46) for a discrete distribution,

$$
\begin{aligned}
E[X] = \sum_j x_j f(x_j) &= 1 \times \tfrac12 p + 2 \times p + 3 \times p + 4 \times p + 5 \times p + 6 \times 2p \\
&= \frac{53}{2}\,p = \frac{53}{2} \times \frac{2}{13} = \frac{53}{13}.
\end{aligned}
$$

Similarly, using the definition of the second moment (26.50),

$$
\begin{aligned}
E[X^2] = \sum_j x_j^2 f(x_j) &= 1^2 \times \tfrac12 p + 2^2 p + 3^2 p + 4^2 p + 5^2 p + 6^2 \times 2p \\
&= \frac{253}{2}\,p = \frac{253}{13}.
\end{aligned}
$$

Finally, using the definition of the variance (26.48), with $\mu = 53/13$, we obtain

$$
\begin{aligned}
V[X] &= \sum_j (x_j - \mu)^2 f(x_j) \\
&= (1 - \mu)^2\,\tfrac12 p + (2 - \mu)^2 p + (3 - \mu)^2 p + (4 - \mu)^2 p + (5 - \mu)^2 p + (6 - \mu)^2\,2p \\
&= \left(\frac{3120}{169}\right) p = \frac{480}{169}.
\end{aligned}
$$

It is easy to verify that $V[X] = E[X^2] - (E[X])^2$. ◄
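The three results for the biased die, and the relation (26.51) between them, can be verified with exact rational arithmetic. In the sketch below the helper name `expectation` and the dictionary `pf` are illustrative assumptions.

```python
from fractions import Fraction

p = Fraction(2, 13)
# Probability function of the biased die: faces 1..6 with p/2, p, p, p, p, 2p.
pf = {1: p / 2, 2: p, 3: p, 4: p, 5: p, 6: 2 * p}

def expectation(g, pf):
    """E[g(X)] for a discrete distribution, following (26.45)."""
    return sum(g(x) * prob for x, prob in pf.items())

mean = expectation(lambda x: x, pf)
second_moment = expectation(lambda x: x * x, pf)
variance = expectation(lambda x: (x - mean) ** 2, pf)

print(mean, second_moment, variance)
print(variance == second_moment - mean ** 2)
```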

In practice, to calculate the moments of a distribution it is often simpler to use 
the moment generating function discussed in subsection 26.7.2. This is particularly 
true for higher-order moments, where direct evaluation of the sum or integral in 
(26.50) can be somewhat laborious. 


26.5.5 Central moments 

The variance V[X] is sometimes called the second central moment of the distribution,
since it is defined as the sum or integral of the probability density function
multiplied by the second power of $x - \mu$. The origin of the term ‘central’ is that by
subtracting $\mu$ from x before squaring we are considering the moment about the
mean of the distribution, rather than about x = 0. Thus the kth central moment
of a distribution is defined as

$$
\nu_k \equiv E\left[(X - \mu)^k\right] =
\begin{cases}
\sum_j (x_j - \mu)^k f(x_j) & \text{for a discrete distribution,} \\
\int (x - \mu)^k f(x)\,dx & \text{for a continuous distribution.}
\end{cases}
\qquad (26.52)
$$

It is convenient to introduce the notation $\nu_k$ for the kth central moment. Thus
$V[X] \equiv \nu_2$ and we may write (26.51) as $\nu_2 = \mu_2 - \mu_1^2$. Clearly, the first central
moment of a distribution is always zero since, for example in the continuous case,

$$
\nu_1 = \int (x - \mu) f(x)\,dx = \int x f(x)\,dx - \mu \int f(x)\,dx = \mu - (\mu \times 1) = 0.
$$

We note that the notation $\mu_k$ and $\nu_k$ for the moments and central moments
respectively is not universal. Indeed, in some books their meanings are reversed.

We can write the kth central moment of a distribution in terms of its kth and
lower-order moments by expanding $(X - \mu)^k$ in powers of X. We have already
noted that $\nu_2 = \mu_2 - \mu_1^2$, and similar expressions may be obtained for higher-order
central moments. For example,

$$
\begin{aligned}
\nu_3 &= E\left[(X - \mu_1)^3\right] \\
      &= E\left[X^3 - 3\mu_1 X^2 + 3\mu_1^2 X - \mu_1^3\right] \\
      &= \mu_3 - 3\mu_1\mu_2 + 3\mu_1^2\mu_1 - \mu_1^3 \\
      &= \mu_3 - 3\mu_1\mu_2 + 2\mu_1^3. \qquad (26.53)
\end{aligned}
$$

In general, it is straightforward to show that

$$
\nu_k = \mu_k - {}^kC_1\,\mu_{k-1}\mu_1 + \cdots + (-1)^r\,{}^kC_r\,\mu_{k-r}\mu_1^r + \cdots
        + (-1)^{k-1}\left({}^kC_{k-1} - 1\right)\mu_1^k. \qquad (26.54)
$$

Once again, direct evaluation of the sum or integral in (26.52) can be rather 
tedious for higher moments, and it is usually quicker to use the moment generating 
function (see subsection 26.7.2), from which the central moments can be easily 
evaluated as well. 
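
The binomial expansion behind (26.53) and (26.54) can be illustrated with a small Python sketch. This is an added illustration, not part of the original text: the three-point distribution and the helper names are hypothetical, and the expansion is written as the full sum over r (with the zeroth moment equal to 1), which is equivalent to (26.54) once its last two terms are combined.

```python
from fractions import Fraction
from math import comb

# A hypothetical asymmetric discrete distribution, chosen only for illustration.
xs = [0, 1, 2]
fs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

# Moments mu_0 .. mu_4 about the origin; mu_0 = 1 by normalisation.
mu = [sum(x**k * f for x, f in zip(xs, fs)) for k in range(5)]


def central_from_moments(k):
    """nu_k = sum_r (-1)^r C(k, r) mu_{k-r} mu_1^r, from expanding (X - mu_1)^k."""
    return sum((-1) ** r * comb(k, r) * mu[k - r] * mu[1] ** r for r in range(k + 1))


def central_direct(k):
    """nu_k evaluated directly from the definition (26.52)."""
    return sum((x - mu[1]) ** k * f for x, f in zip(xs, fs))


for k in range(1, 5):
    print(k, central_from_moments(k), central_direct(k))
```

For each k the two columns agree, and the k = 1 row gives zero, as expected for the first central moment.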


►The PDF for a Gaussian distribution (see subsection 26.9.1) with mean $\mu$ and variance
$\sigma^2$ is given by

$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right].
$$

Obtain an expression for the kth central moment of this distribution.


As an illustration, we will perform this calculation by evaluating the integral in (26.52)
directly. Thus, the kth central moment of f(x) is given by

$$
\begin{aligned}
\nu_k &= \int_{-\infty}^{\infty} (x - \mu)^k f(x)\,dx \\
      &= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} (x - \mu)^k
         \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right] dx \\
      &= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} y^k
         \exp\left(-\frac{y^2}{2\sigma^2}\right) dy, \qquad (26.55)
\end{aligned}
$$

where in the last line we have made the substitution $y = x - \mu$. It is clear that if k is
odd then the integrand is an odd function of y and hence the integral equals zero. Thus,
$\nu_k = 0$ if k is odd. When k is even, we could calculate $\nu_k$ by integrating by parts to obtain
a reduction formula, but it is more elegant to consider instead the standard integral (see
subsection 6.4.2)

$$
I = \int_{-\infty}^{\infty} \exp(-\alpha y^2)\,dy = \pi^{1/2}\alpha^{-1/2}
$$




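and differentiate it repeatedly with respect to $\alpha$ (see section 5.12). Thus, we obtain

$$
\begin{aligned}
\frac{dI}{d\alpha} &= -\int_{-\infty}^{\infty} y^2 \exp(-\alpha y^2)\,dy
                    = -\tfrac{1}{2}\pi^{1/2}\alpha^{-3/2}, \\
\frac{d^2I}{d\alpha^2} &= \int_{-\infty}^{\infty} y^4 \exp(-\alpha y^2)\,dy
                    = \left(\tfrac{1}{2}\right)\left(\tfrac{3}{2}\right)\pi^{1/2}\alpha^{-5/2}, \\
&\;\;\vdots \\
\frac{d^nI}{d\alpha^n} &= (-1)^n \int_{-\infty}^{\infty} y^{2n} \exp(-\alpha y^2)\,dy
                    = (-1)^n \left(\tfrac{1}{2}\right)\left(\tfrac{3}{2}\right)\cdots
                      \left(\tfrac{1}{2}(2n - 1)\right)\pi^{1/2}\alpha^{-(2n+1)/2}.
\end{aligned}
$$

Setting $\alpha = 1/(2\sigma^2)$ and substituting the above result into (26.55), we find (for k even)

$$
\nu_k = \left(\tfrac{1}{2}\right)\left(\tfrac{3}{2}\right)\cdots\left(\tfrac{1}{2}(k - 1)\right)(2\sigma^2)^{k/2}
      = (1)(3)\cdots(k - 1)\,\sigma^k. \qquad ◄
$$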
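
As a numerical cross-check of this result, the sketch below evaluates the integral in (26.55) by a simple midpoint rule and compares it with $(1)(3)\cdots(k-1)\sigma^k$. It is an added illustration only: the function names, the integration range of twelve standard deviations and the number of steps are assumptions made for this sketch.

```python
import math


def gaussian_central_moment(k, sigma, n_steps=20000, width=12.0):
    """Midpoint-rule estimate of (26.55): the integral of y^k times the
    zero-mean Gaussian PDF with standard deviation sigma."""
    a = -width * sigma
    h = 2.0 * width * sigma / n_steps
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i in range(n_steps):
        y = a + (i + 0.5) * h
        total += y**k * math.exp(-y * y / (2.0 * sigma**2))
    return norm * total * h


def closed_form(k, sigma):
    """(1)(3)...(k-1) sigma^k for even k, and zero for odd k."""
    if k % 2 == 1:
        return 0.0
    return math.prod(range(1, k, 2)) * sigma**k


for k in range(7):
    print(k, gaussian_central_moment(k, 1.5), closed_form(k, 1.5))
```

The k = 4 row gives $3\sigma^4$, which is the value used below when discussing kurtosis.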


One may also characterise a probability distribution f(x) using the closely
related normalised and dimensionless central moments

$$
\gamma_k \equiv \frac{\nu_k}{\nu_2^{k/2}} = \frac{\nu_k}{\sigma^k}.
$$

From this set, $\gamma_3$ and $\gamma_4$ are more commonly called, respectively, the skewness
and kurtosis of the distribution. The skewness $\gamma_3$ of a distribution is zero if it is
symmetrical about its mean. If the distribution is skewed to values of x smaller
than the mean then $\gamma_3 < 0$. Similarly $\gamma_3 > 0$ if the distribution is skewed to higher
values of x.

From the above example, we see that the kurtosis of the Gaussian distribution
(subsection 26.9.1) is given by

$$
\gamma_4 = \frac{\nu_4}{\nu_2^2} = \frac{3\sigma^4}{\sigma^4} = 3.
$$

It is therefore common practice to define the excess kurtosis of a distribution
as $\gamma_4 - 3$. A positive value of the excess kurtosis implies a relatively narrower
peak and wider wings than the Gaussian distribution with the same mean and
variance. A negative excess kurtosis implies a wider peak and shorter wings.

Finally, we note here that one can also describe a probability density function 
f(x) in terms of its cumulants, which are again related to the central moments. 
However, we defer the discussion of cumulants until subsection 26.7.4, since their 
definition is most easily understood in terms of generating functions. 


26.6 Functions of random variables 

Suppose X is some random variable for which the probability density function
f(x) is known. In many cases, we may be more interested in the related random
variable Y = Y(X), where Y(X) is some function of X. What is the probability
density function g(y) for the new random variable Y? We now discuss how to
obtain this function.


26.6.1 Discrete random variables 


If X is a discrete RV that takes only the values $x_i$, i = 1, 2, ..., n, then Y must
also be discrete and takes the values $y_i = Y(x_i)$, although some of these values
may be identical. The probability function for Y is given by

$$
g(y) =
\begin{cases}
\sum_j f(x_j) & \text{if } y = y_i, \\
0 & \text{otherwise,}
\end{cases}
\qquad (26.56)
$$

where the sum extends over those values of j for which $y_i = Y(x_j)$. The simplest
case arises when the function Y(X) possesses a single-valued inverse X(Y). In this
case, only one x-value corresponds to each y-value, and we obtain a closed-form
expression for g(y) given by

$$
g(y) =
\begin{cases}
f(x(y_i)) & \text{if } y = y_i, \\
0 & \text{otherwise.}
\end{cases}
$$

If Y(X) does not possess a single-valued inverse then the situation is more
complicated and it may not be possible to obtain a closed-form expression for
g(y). Nevertheless, whatever the form of Y(X), one can always use (26.56) to
obtain the numerical values of the probability function g(y) at $y = y_i$.
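
The prescription (26.56) translates directly into a few lines of Python. The sketch below is an added illustration: the function name `transform_pmf` and the five-point uniform distribution are hypothetical choices, and $Y = X^2$ is used because it does not possess a single-valued inverse, so several x-values contribute to the same y-value.

```python
from collections import defaultdict
from fractions import Fraction


def transform_pmf(pmf_x, func):
    """Probability function of Y = func(X), following (26.56): for each
    y-value, sum f(x_j) over all x_j that are mapped onto it."""
    g = defaultdict(Fraction)
    for x, fx in pmf_x.items():
        g[func(x)] += fx
    return dict(g)


# X uniform on {-2, -1, 0, 1, 2}; Y = X^2 takes the values 0, 1 and 4.
pmf_x = {x: Fraction(1, 5) for x in (-2, -1, 0, 1, 2)}
pmf_y = transform_pmf(pmf_x, lambda x: x * x)

print(pmf_y)
```

Here the values y = 1 and y = 4 each collect probability from two x-values, whereas y = 0 receives a contribution from only one.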


26.6.2 Continuous random variables 

If X is a continuous RV, then so too is the new random variable Y = Y(X). The
probability that Y lies in the range y to y + dy is given by

$$
g(y)\,dy = \int_{dS} f(x)\,dx, \qquad (26.57)
$$

where dS corresponds to all values of x for which Y lies in the range y to y + dy.
Once again the simplest case occurs when Y(X) possesses a single-valued inverse
X(Y). In this case, we may write

$$
g(y)\,dy = \left| \int_{x(y)}^{x(y+dy)} f(x')\,dx' \right|
         = \int_{x(y)}^{x(y) + \left|\frac{dx}{dy}\right| dy} f(x')\,dx',
$$

from which we obtain

$$
g(y) = f(x(y)) \left| \frac{dx}{dy} \right|. \qquad (26.58)
$$

Figure 26.8 The illumination of a coastline by the beam from a lighthouse.


►A lighthouse is situated at a distance L from a straight coastline, opposite a point O, and
sends out a narrow continuous beam of light simultaneously in opposite directions. The beam
rotates with constant angular velocity. If the random variable Y is the distance along the
coastline, measured from O, of the spot that the light beam illuminates, find its probability
density function.


The situation is illustrated in figure 26.8. Since the light beam rotates at a constant angular
velocity, $\theta$ is distributed uniformly between $-\pi/2$ and $\pi/2$, and so $f(\theta) = 1/\pi$. Now
$y = L\tan\theta$, which possesses the single-valued inverse $\theta = \tan^{-1}(y/L)$, provided that $\theta$ lies
between $-\pi/2$ and $\pi/2$. Since $dy/d\theta = L\sec^2\theta = L(1 + \tan^2\theta) = L[1 + (y/L)^2]$, from
(26.58) we find

$$
g(y) = \frac{1}{\pi} \left| \frac{d\theta}{dy} \right| = \frac{1}{\pi L[1 + (y/L)^2]}
\quad \text{for } -\infty < y < \infty.
$$

A distribution of this form is called a Cauchy distribution and is discussed in
subsection 26.9.5. ◄
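
The change-of-variable rule (26.58) can be checked numerically for the lighthouse problem. In the sketch below, which is an added illustration, the value `L = 2.0` and the function names are assumptions; the derivative $d\theta/dy$ is estimated by a central difference rather than taken from the analytic result, so that the comparison with the Cauchy form is a genuine test.

```python
import math

L = 2.0  # hypothetical distance of the lighthouse from the coastline


def f_theta(theta):
    """Uniform PDF of the beam angle on (-pi/2, pi/2)."""
    return 1.0 / math.pi if -math.pi / 2 < theta < math.pi / 2 else 0.0


def theta_of_y(y):
    """Single-valued inverse of y = L tan(theta)."""
    return math.atan(y / L)


def g_numeric(y, h=1e-6):
    """g(y) = f(theta(y)) |d theta / dy|, as in (26.58), with the
    derivative estimated by a central difference of step h."""
    dtheta_dy = (theta_of_y(y + h) - theta_of_y(y - h)) / (2.0 * h)
    return f_theta(theta_of_y(y)) * abs(dtheta_dy)


def g_cauchy(y):
    """Closed-form result of the example."""
    return 1.0 / (math.pi * L * (1.0 + (y / L) ** 2))


for y in (-5.0, -1.0, 0.0, 0.3, 4.0):
    print(y, g_numeric(y), g_cauchy(y))
```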


If Y(X) does not possess a single-valued inverse then we encounter complications,
since there exist several intervals in the X-domain for which Y lies between
y and y + dy. This is illustrated in figure 26.9, which shows a function Y(X)
such that X(Y) is a double-valued function of Y. Thus the range y to y + dy
corresponds to X's being either in the range $x_1$ to $x_1 + dx_1$ or in the range $x_2$ to
$x_2 + dx_2$. In general, it may not be possible to obtain an expression for g(y) in
closed form, although the distribution may always be obtained numerically using
(26.57). However, a closed-form expression may be obtained in the case where
there exist single-valued functions $x_1(y)$ and $x_2(y)$ giving the two values of x that
correspond to any given value of y. In this case,

$$
g(y)\,dy = \left| \int_{x_1(y)}^{x_1(y+dy)} f(x)\,dx \right|
         + \left| \int_{x_2(y)}^{x_2(y+dy)} f(x)\,dx \right|,
$$

from which we obtain

$$
g(y) = f(x_1(y)) \left| \frac{dx_1}{dy} \right| + f(x_2(y)) \left| \frac{dx_2}{dy} \right|. \qquad (26.59)
$$

Figure 26.9 Illustration of a function Y(X) such that its inverse X(Y) is a
double-valued function of Y. The range y to y + dy corresponds to X being
either in the range $x_1$ to $x_1 + dx_1$ or in the range $x_2$ to $x_2 + dx_2$.


This result may be generalised straightforwardly to the case where the range y to 
y + dy corresponds to more than two x-intervals. 


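►The random variable X is Gaussian distributed (see subsection 26.9.1) with mean $\mu$ and
variance $\sigma^2$. Find the PDF of the new variable $Y = (X - \mu)^2/\sigma^2$.


It is clear that X(Y) is a double-valued function of Y. However, in this case, it is
straightforward to obtain single-valued functions giving the two values of x that correspond
to a given value of y; these are $x_1 = \mu - \sigma\sqrt{y}$ and $x_2 = \mu + \sigma\sqrt{y}$, where $\sqrt{y}$ is taken to
mean the positive square root.

The PDF of X is given by

$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right].
$$

Since $dx_1/dy = -\sigma/(2\sqrt{y})$ and $dx_2/dy = \sigma/(2\sqrt{y})$, from (26.59) we obtain

$$
\begin{aligned}
g(y) &= \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\tfrac{1}{2}y\right) \left| \frac{-\sigma}{2\sqrt{y}} \right|
      + \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\tfrac{1}{2}y\right) \left| \frac{\sigma}{2\sqrt{y}} \right| \\
     &= \frac{1}{2\sqrt{\pi}} \left(\tfrac{1}{2}y\right)^{-1/2} \exp\left(-\tfrac{1}{2}y\right).
\end{aligned}
$$

As we shall see in subsection 26.9.3, this is the gamma distribution $\gamma(\tfrac{1}{2}, \tfrac{1}{2})$. ◄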
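
The two-branch formula (26.59) can be evaluated term by term and compared with the closed form just obtained. The following sketch is an added illustration; the parameter values `mu = 1.0` and `sigma = 2.0` are arbitrary assumptions, and the result should not depend on them.

```python
import math

mu, sigma = 1.0, 2.0  # hypothetical Gaussian parameters


def f(x):
    """Gaussian PDF of X."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma**2)) / (sigma * math.sqrt(2.0 * math.pi))


def g_from_branches(y):
    """Sum of f(x_i(y)) |dx_i/dy| over the branches x = mu -/+ sigma sqrt(y)."""
    root = math.sqrt(y)
    branches = [
        (mu - sigma * root, -sigma / (2.0 * root)),  # x_1(y) and dx_1/dy
        (mu + sigma * root, sigma / (2.0 * root)),   # x_2(y) and dx_2/dy
    ]
    return sum(f(x) * abs(dx_dy) for x, dx_dy in branches)


def g_closed(y):
    """Closed form (1 / (2 sqrt(pi))) (y/2)^(-1/2) exp(-y/2)."""
    return (y / 2.0) ** -0.5 * math.exp(-y / 2.0) / (2.0 * math.sqrt(math.pi))


for y in (0.1, 0.7, 1.0, 3.5):
    print(y, g_from_branches(y), g_closed(y))
```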


26.6.3 Functions of several random variables 

We may extend our discussion further, to the case in which the new random 
variable is a function of several other random variables. For definiteness, let us 
consider the random variable Z = Z(X,Y), which is a function of two other 
RVs X and Y. Given that these variables are described by the joint probability 
density function f(x,y), we wish to find the probability density function p(z) of 
the variable Z.

If X and Y are both discrete RVs then

$$
p(z) = \sum_{i,j} f(x_i, y_j), \qquad (26.60)
$$

where the sum extends over all values of i and j for which $Z(x_i, y_j) = z$. Similarly,
if X and Y are both continuous RVs then p(z) is found by requiring that

$$
p(z)\,dz = \iint_{dS} f(x, y)\,dx\,dy, \qquad (26.61)
$$

where dS is the infinitesimal area in the xy-plane lying between the curves
$Z(x, y) = z$ and $Z(x, y) = z + dz$.


►Suppose X and Y are independent continuous random variables in the range $-\infty$ to $\infty$,
with PDFs g(x) and h(y) respectively. Obtain expressions for the PDFs of Z = X + Y and
W = XY.


Since X and Y are independent RVs, their joint PDF is simply f(x, y) = g(x)h(y). Thus,
from (26.61), the PDF of the sum Z = X + Y is given by

$$
p(z)\,dz = \int_{-\infty}^{\infty} dx\, g(x) \int_{z-x}^{z+dz-x} dy\, h(y)
         = \left( \int_{-\infty}^{\infty} g(x)\,h(z - x)\,dx \right) dz.
$$

Thus p(z) is the convolution of the PDFs of g and h (i.e. p = g * h, see subsection 13.1.7).
In a similar way, the PDF of the product W = XY is given by

$$
p(w)\,dw = \int_{-\infty}^{\infty} dx\, g(x) \int_{w/|x|}^{(w+dw)/|x|} dy\, h(y)
         = \left( \int_{-\infty}^{\infty} g(x)\,h(w/x)\,\frac{dx}{|x|} \right) dw. \qquad ◄
$$
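
The convolution result for Z = X + Y is easily illustrated numerically. The sketch below is an addition to the text: it takes both X and Y to be uniform on [0, 1] (a choice made only for this illustration, although the example itself allows any PDFs on the whole real line) and evaluates the convolution integral by a midpoint rule. The sum of two such variables is known to have the triangular PDF p(z) = z for 0 ≤ z ≤ 1 and p(z) = 2 − z for 1 ≤ z ≤ 2.

```python
def uniform01(x):
    """PDF of a variable distributed uniformly on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def pdf_of_sum(g, h, z, lo=-1.0, hi=2.0, n=30000):
    """Midpoint-rule estimate of p(z) = integral of g(x) h(z - x) dx,
    with the integration restricted to lo < x < hi."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        total += g(x) * h(z - x)
    return total * dx


for z in (0.5, 1.0, 1.5, 2.5):
    print(z, pdf_of_sum(uniform01, uniform01, z))
```

The finite limits `lo` and `hi` are adequate here only because the chosen PDFs vanish outside [0, 1]; for PDFs with infinite range they would have to be widened.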


The prescription (26.61) is readily generalised to functions of n random variables
$Z = Z(X_1, X_2, \ldots, X_n)$, in which case the infinitesimal ‘volume’ element dS is the
region in $x_1x_2\cdots x_n$-space between the (hyper)surfaces $Z(x_1, x_2, \ldots, x_n) = z$ and
$Z(x_1, x_2, \ldots, x_n) = z + dz$. In practice, however, the integral is difficult to evaluate,
since one is faced with the complicated geometrical problem of determining the
limits of integration. Fortunately, an alternative (and powerful) technique exists
for evaluating integrals of this kind. One eliminates the geometrical problem by
integrating over all values of the variables $x_i$ without restriction, while shifting
the constraint on the variables to the integrand. This is readily achieved by
multiplying the integrand by a function that equals unity in the infinitesimal
region dS and zero elsewhere. From the discussion of the Dirac delta function in
subsection 13.1.3, we see that $\delta(Z(x_1, x_2, \ldots, x_n) - z)\,dz$ satisfies these requirements,
and so in the most general case we have

$$
p(z) = \int\!\!\int \cdots \int f(x_1, x_2, \ldots, x_n)\,
       \delta(Z(x_1, x_2, \ldots, x_n) - z)\,dx_1\,dx_2 \cdots dx_n, \qquad (26.62)
$$

where the range of integration is over all possible values of the variables $x_i$. This
integral is most readily evaluated by substituting in (26.62) the Fourier integral
representation of the Dirac delta function discussed in subsection 13.1.4, namely

$$
\delta(Z(x_1, x_2, \ldots, x_n) - z)
  = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{ik(Z(x_1, x_2, \ldots, x_n) - z)}\,dk. \qquad (26.63)
$$


This is best illustrated by considering a specific example. 


►A general one-dimensional random walk consists of n independent steps, each of which
can be of a different length and in either direction along the x-axis. If g(x) is the PDF for
the (positive or negative) displacement X along the x-axis achieved in a single step, obtain
an expression for the PDF of the total displacement S after n steps.


The total displacement S is simply the algebraic sum of the displacements $X_i$ achieved in
each of the n steps, so that

$$
S = X_1 + X_2 + \cdots + X_n.
$$

Since the random variables $X_i$ are independent and have the same PDF g(x), their joint
PDF is simply $g(x_1)g(x_2)\cdots g(x_n)$. Substituting this into (26.62), together with (26.63), we
obtain

$$
\begin{aligned}
p(s) &= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty}
        g(x_1)g(x_2)\cdots g(x_n)\,\frac{1}{2\pi}\int_{-\infty}^{\infty}
        e^{ik[(x_1 + x_2 + \cdots + x_n) - s]}\,dk\,dx_1\,dx_2 \cdots dx_n \\
     &= \frac{1}{2\pi}\int_{-\infty}^{\infty} dk\, e^{-iks}
        \left( \int_{-\infty}^{\infty} g(x)\,e^{ikx}\,dx \right)^n. \qquad (26.64)
\end{aligned}
$$

It is convenient to define the characteristic function C(k) of the variable X as

$$
C(k) = \int_{-\infty}^{\infty} g(x)\,e^{ikx}\,dx,
$$

which is simply related to the Fourier transform of g(x). Then (26.64) may be written as

$$
p(s) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-iks}\,[C(k)]^n\,dk.
$$

Thus p(s) can be found by evaluating two Fourier integrals. Characteristic functions will
be discussed in more detail in subsection 26.7.3. ◄


26.6.4 Expectation values and variances 

In some cases, one is interested only in the expectation value or the variance 
of the new variable Z rather than in its full probability density function. For 
definiteness, let us consider the random variable Z = Z(X, Y ), which is a function 
of two RVs X and Y with a known joint distribution f(x,y); the results we will 
obtain are readily generalised to more (or fewer) variables. 

It is clear that E[Z] and V[Z] can be obtained, in principle, by first using the 
methods discussed above to obtain p(z) and then evaluating the appropriate sums 
or integrals. The intermediate step of calculating p(z) is not necessary, however, 
since it is straightforward to obtain expressions for E[Z] and V[Z] in terms of
the variables X and Y. For example, if X and Y are continuous RVs then the
expectation value of Z is given by

$$
E[Z] = \int z\,p(z)\,dz = \iint Z(x, y)\,f(x, y)\,dx\,dy. \qquad (26.65)
$$


An analogous result exists for discrete random variables. 

Integrals of the form (26.65) are often difficult to evaluate. Nevertheless, we 
may use (26.65) to derive an important general result concerning expectation 
values. If X and Y are any two random variables and a and b are arbitrary 
constants then by letting Z = aX + bY we find 


$$
E[aX + bY] = aE[X] + bE[Y].
$$

Furthermore, we may use this result to obtain an approximate expression for the
expectation value E[Z(X, Y)] of any arbitrary function of X and Y. Letting $\mu_X = E[X]$
and $\mu_Y = E[Y]$, then, provided Z(X, Y) can be reasonably approximated
by the linear terms of its Taylor expansion about the point $(\mu_X, \mu_Y)$, we have

$$
Z(X, Y) \approx Z(\mu_X, \mu_Y) + \left(\frac{\partial Z}{\partial X}\right)(X - \mu_X)
        + \left(\frac{\partial Z}{\partial Y}\right)(Y - \mu_Y), \qquad (26.66)
$$

where the partial derivatives are evaluated at $X = \mu_X$ and $Y = \mu_Y$. Taking the
expectation value of both sides, we find

$$
E[Z(X, Y)] \approx Z(\mu_X, \mu_Y) + \left(\frac{\partial Z}{\partial X}\right)(E[X] - \mu_X)
           + \left(\frac{\partial Z}{\partial Y}\right)(E[Y] - \mu_Y) = Z(\mu_X, \mu_Y),
$$

which gives the approximate result $E[Z(X, Y)] \approx Z(\mu_X, \mu_Y)$.

By analogy with (26.65), the variance of Z = Z(X, Y) is given by

$$
V[Z] = \int (z - \mu_Z)^2\,p(z)\,dz = \iint [Z(x, y) - \mu_Z]^2 f(x, y)\,dx\,dy, \qquad (26.67)
$$

where $\mu_Z = E[Z]$. We may use this expression to derive a useful general result. If
X and Y are two independent random variables, so that f(x, y) = g(x)h(y), and
a, b and c are constants then by setting Z = aX + bY + c in (26.67) we obtain

$$
V[aX + bY + c] = a^2 V[X] + b^2 V[Y]. \qquad (26.68)
$$

From (26.68) we also obtain the important special case

$$
V[X + Y] = V[X - Y] = V[X] + V[Y].
$$

Provided X and Y are indeed independent random variables, we may obtain
an approximate expression for V[Z(X, Y)], for any arbitrary function Z(X, Y),
in a similar manner to that used in approximating E[Z(X, Y)] above. Taking the
variance of both sides of (26.66), and using (26.68), we find

$$
V[Z(X, Y)] \approx \left(\frac{\partial Z}{\partial X}\right)^2 V[X]
           + \left(\frac{\partial Z}{\partial Y}\right)^2 V[Y], \qquad (26.69)
$$

the partial derivatives being evaluated at $X = \mu_X$ and $Y = \mu_Y$.
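
The approximation (26.69) is the familiar rule for the propagation of errors, and a minimal Python sketch of it is given below. This is an added illustration: the function name `propagate_variance`, the finite-difference step and the numerical values are assumptions. For a linear function the estimate reproduces (26.68) exactly, whereas for the product Z = XY it gives only the first-order part of the variance.

```python
def propagate_variance(Z, mu_x, mu_y, var_x, var_y, h=1e-5):
    """First-order estimate (26.69) of V[Z(X, Y)] for independent X and Y.
    The partial derivatives at (mu_x, mu_y) are approximated by central
    differences of step h."""
    dZ_dx = (Z(mu_x + h, mu_y) - Z(mu_x - h, mu_y)) / (2.0 * h)
    dZ_dy = (Z(mu_x, mu_y + h) - Z(mu_x, mu_y - h)) / (2.0 * h)
    return dZ_dx**2 * var_x + dZ_dy**2 * var_y


# Linear case Z = 3X - 2Y + 7: (26.68) gives 9 V[X] + 4 V[Y] exactly.
v_linear = propagate_variance(lambda x, y: 3 * x - 2 * y + 7, 1.0, 4.0, 0.5, 0.25)

# Product Z = XY: the estimate is mu_y^2 V[X] + mu_x^2 V[Y].
v_product = propagate_variance(lambda x, y: x * y, 10.0, 5.0, 0.04, 0.01)

print(v_linear, v_product)
```

For the product, the exact variance of independent X and Y contains the additional term V[X]V[Y], which the linearised estimate omits; with the numbers used here that term is small compared with the other two.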


26.7 Generating functions 

As we saw in chapter 16, when dealing with particular sets of functions $f_n$,
each member of the set being characterised by a different non-negative integer
n, it is sometimes possible to summarise the whole set by a single function of a
dummy variable (say t), called a generating function. The relationship between
the generating function and the nth member $f_n$ of the set is that if the generating
function is expanded as a power series in t then $f_n$ is the coefficient of $t^n$. For
example, in the expansion of the generating function $G(z, t) = (1 - 2zt + t^2)^{-1/2}$,
the coefficient of $t^n$ is the nth Legendre polynomial $P_n(z)$, i.e.

$$
G(z, t) = (1 - 2zt + t^2)^{-1/2} = \sum_{n=0}^{\infty} P_n(z)\,t^n.
$$

We found that many useful properties of, and relationships between, the members 
of a set of functions could be established using the generating function and other 
functions obtained from it, e.g. its derivatives. 

Similar ideas can be used in the area of probability theory, and two types of 
generating function can be usefully defined, one more generally applicable than 
the other. The more restricted of the two, applicable only to discrete integral 
distributions, is called a probability generating function; this is discussed in the 
next section. The second type, a moment generating function, can be used with 
both discrete and continuous distributions and is considered in subsection 26.7.2. 
From the moment generating function, we may also construct the closely re- 
lated characteristic and cumulant generating functions; these are discussed in 
subsections 26.7.3 and 26.7.4 respectively. 


26.7.1 Probability generating functions 

As already indicated, probability generating functions are restricted in applicabil- 
ity to integer distributions, of which the most common (the binomial, the Poisson 
and the geometric) are considered in this and later subsections. In such distribu- 
tions a random variable may take only non-negative integer values. The actual 
possible values may be finite or infinite in number, but, for formal purposes, 
all integers, 0,1,2,... are considered possible. If only a finite number of integer 
values can occur in any particular case then those that cannot occur are included 
but are assigned zero probability.

If, as previously, the probability that the random variable X takes the value $x_n$
is $f(x_n)$, then

$$
\sum_n f(x_n) = 1.
$$

In the present case, however, only non-negative integer values of $x_n$ are possible,
and we can, without ambiguity, write the probability that X takes the value n as
$f_n$, with

$$
\sum_{n=0}^{\infty} f_n = 1. \qquad (26.70)
$$

We may now define the probability generating function $\Phi_X(t)$ by

$$
\Phi_X(t) \equiv \sum_{n=0}^{\infty} f_n t^n. \qquad (26.71)
$$

It is immediately apparent that $\Phi_X(t) = E[t^X]$ and that, by virtue of (26.70),
$\Phi_X(1) = 1$.

Probably the simplest example of a probability generating function (PGF) is
provided by the random variable X defined by

$$
X =
\begin{cases}
1 & \text{if the outcome of a single trial is a ‘success’,} \\
0 & \text{if the trial ends in ‘failure’.}
\end{cases}
$$

If the probability of success is p and that of failure q (= 1 − p) then

$$
\Phi_X(t) = q t^0 + p t^1 + 0 + 0 + \cdots = q + pt. \qquad (26.72)
$$

This type of random variable is discussed much more fully in subsection 26.8.1.
In a similar but slightly more complicated way, a Poisson-distributed integer
variable with mean $\lambda$ (see subsection 26.8.4) has a PGF

$$
\Phi_X(t) = \sum_{n=0}^{\infty} \frac{e^{-\lambda}\lambda^n}{n!}\,t^n = e^{-\lambda}e^{\lambda t}. \qquad (26.73)
$$

We note that, as required, $\Phi_X(1) = 1$ in both cases.

Useful results will be obtained from this kind of approach only if the summation
(26.71) can be carried out explicitly in particular cases and the functions derived
from $\Phi_X(t)$ can be shown to be related to meaningful parameters. Two such
relationships can be obtained by differentiating (26.71) with respect to t. Taking
the first derivative we find

$$
\frac{d\Phi_X(t)}{dt} = \sum_{n=0}^{\infty} n f_n t^{n-1}
\quad \Rightarrow \quad \Phi_X'(1) = \sum_{n=0}^{\infty} n f_n = E[X], \qquad (26.74)
$$



and differentiating once more we obtain

$$
\frac{d^2\Phi_X(t)}{dt^2} = \sum_{n=0}^{\infty} n(n-1) f_n t^{n-2}
\quad \Rightarrow \quad \Phi_X''(1) = \sum_{n=0}^{\infty} n(n-1) f_n = E[X(X-1)]. \qquad (26.75)
$$

Equation (26.74) shows that $\Phi_X'(1)$ gives the mean of X. Using both (26.75) and
(26.51) allows us to write

$$
\begin{aligned}
\Phi_X''(1) + \Phi_X'(1) - \left[\Phi_X'(1)\right]^2
  &= E[X(X-1)] + E[X] - (E[X])^2 \\
  &= E[X^2] - E[X] + E[X] - (E[X])^2 \\
  &= E[X^2] - (E[X])^2 \\
  &= V[X], \qquad (26.76)
\end{aligned}
$$

and so express the variance of X in terms of the derivatives of its probability 
generating function. 


►A random variable X is given by the number of trials needed to obtain a first success
when the chance of success at each trial is constant and equal to p. Find the probability
generating function for X and use it to determine the mean and variance of X.


Clearly, at least one trial is needed, and so $f_0 = 0$. If n (≥ 1) trials are needed for the first
success, the first n − 1 trials must have resulted in failure. Thus

$$
\Pr(X = n) = q^{n-1}p, \qquad n \ge 1, \qquad (26.77)
$$

where q = 1 − p is the probability of failure in each individual trial.

The corresponding probability generating function is thus

$$
\Phi_X(t) = \sum_{n=0}^{\infty} f_n t^n = \sum_{n=1}^{\infty} (q^{n-1}p)\,t^n
          = \frac{p}{q}\sum_{n=1}^{\infty} (qt)^n
          = \frac{p}{q} \times \frac{qt}{1 - qt} = \frac{pt}{1 - qt}, \qquad (26.78)
$$

where we have used the result for the sum of a geometric series, given in chapter 4, to
obtain a closed-form expression for $\Phi_X(t)$. Again, as must be the case, $\Phi_X(1) = 1$.

To find the mean and variance of X we need to evaluate $\Phi_X'(1)$ and $\Phi_X''(1)$. Differentiating
(26.78) gives

$$
\Phi_X'(t) = \frac{p}{(1 - qt)^2} \quad \Rightarrow \quad \Phi_X'(1) = \frac{p}{p^2} = \frac{1}{p},
$$

$$
\Phi_X''(t) = \frac{2pq}{(1 - qt)^3} \quad \Rightarrow \quad \Phi_X''(1) = \frac{2pq}{p^3} = \frac{2q}{p^2}.
$$

Thus, using (26.74) and (26.76),

$$
E[X] = \Phi_X'(1) = \frac{1}{p},
$$

$$
V[X] = \Phi_X''(1) + \Phi_X'(1) - \left[\Phi_X'(1)\right]^2
     = \frac{2q}{p^2} + \frac{1}{p} - \frac{1}{p^2} = \frac{q}{p^2}.
$$

A distribution with probabilities of the general form (26.77) is known as a geometric
distribution and is discussed in subsection 26.8.2. This form of distribution is common in
‘waiting time’ problems (subsection 26.9.3). ◄
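
The results of this example can be confirmed numerically by summing the series in (26.74) and (26.75) directly. The sketch below is an added illustration: the value p = 0.3 and the truncation of the series at 400 terms are assumptions, the latter being harmless here because the neglected terms are of order $q^{400}$.

```python
import math


def geometric_probabilities(p, n_max):
    """f_0 = 0 and f_n = q^(n-1) p for 1 <= n <= n_max, as in (26.77)."""
    q = 1.0 - p
    return [0.0] + [q ** (n - 1) * p for n in range(1, n_max + 1)]


def pgf_values_at_one(f):
    """Phi(1), Phi'(1) and Phi''(1) from the series (26.71), (26.74), (26.75)."""
    phi = sum(f)
    d1 = sum(n * fn for n, fn in enumerate(f))
    d2 = sum(n * (n - 1) * fn for n, fn in enumerate(f))
    return phi, d1, d2


p = 0.3
phi, d1, d2 = pgf_values_at_one(geometric_probabilities(p, 400))
mean = d1
variance = d2 + d1 - d1**2  # equation (26.76)

print(phi, mean, variance)
```

The printed values may be compared with 1, 1/p and $q/p^2$ respectively.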
Figure 26.10 The pairs of values of n and r used in the evaluation of $\Phi_{X+Y}(t)$.

Sums of random variables 

We now turn to considering the sum of two or more independent random
variables, say X and Y, and denote by $S_2$ the random variable

$$
S_2 = X + Y.
$$

If $\Phi_{S_2}(t)$ is the PGF for $S_2$, the coefficient of $t^n$ in its expansion is given by the
probability that X + Y = n and is thus equal to the sum of the probabilities that
X = r and Y = n − r for all values of r in 0 ≤ r ≤ n. Since such outcomes for
different values of r are mutually exclusive, we have

$$
\Pr(X + Y = n) = \sum_{r=0}^{n} \Pr(X = r)\Pr(Y = n - r). \qquad (26.79)
$$

Multiplying both sides of (26.79) by $t^n$ and summing over all values of n enables
us to express this relationship in terms of probability generating functions as
follows:

$$
\begin{aligned}
\Phi_{X+Y}(t) = \sum_{n=0}^{\infty} \Pr(X + Y = n)\,t^n
  &= \sum_{n=0}^{\infty}\sum_{r=0}^{n} \Pr(X = r)\,t^r \Pr(Y = n - r)\,t^{n-r} \\
  &= \sum_{r=0}^{\infty}\sum_{n=r}^{\infty} \Pr(X = r)\,t^r \Pr(Y = n - r)\,t^{n-r}.
\end{aligned}
$$

The change in summation order is justified by reference to figure 26.10, which
illustrates that the summations are over exactly the same pairs of values of n and
r, but with the first (inner) summation over the points in a column rather than
over the points in a row. Now, setting n = r + s gives the final result,

$$
\begin{aligned}
\Phi_{X+Y}(t) &= \sum_{r=0}^{\infty} \Pr(X = r)\,t^r \sum_{s=0}^{\infty} \Pr(Y = s)\,t^s \\
              &= \Phi_X(t)\,\Phi_Y(t), \qquad (26.80)
\end{aligned}
$$

i.e. the PGF of the sum of two independent random variables is equal to the
product of their individual PGFs. The same result can be deduced in a less formal
way by noting that if X and Y are independent then

$$
E\left[t^{X+Y}\right] = E\left[t^X\right] E\left[t^Y\right].
$$

Clearly result (26.80) can be extended to more than two random variables by
writing $S_3 = S_2 + Z$ etc., to give

$$
\Phi_{\left(\sum_{i=1}^{n} X_i\right)}(t) = \prod_{i=1}^{n} \Phi_{X_i}(t), \qquad (26.81)
$$

and, further, if all the $X_i$ have the same probability distribution,

$$
\Phi_{\left(\sum_{i=1}^{n} X_i\right)}(t) = [\Phi_X(t)]^n. \qquad (26.82)
$$

This latter result has immediate application in the deduction of the PGF for the
binomial distribution from that for a single trial, equation (26.72).
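
Since a PGF with finitely many non-zero probabilities is a polynomial in t, the product rule (26.80) amounts to polynomial multiplication of coefficient lists. The Python sketch below, added here as an illustration with hypothetical function names, raises the single-trial PGF q + pt of (26.72) to the nth power as in (26.82) and compares the coefficients with the binomial probabilities ${}^nC_k\,p^k q^{n-k}$.

```python
from fractions import Fraction
from math import comb


def multiply_pgfs(a, b):
    """Coefficient list of the product of two PGFs; entry n is the sum over
    r of a[r] * b[n - r], which is the convolution (26.79)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for r, a_r in enumerate(a):
        for s, b_s in enumerate(b):
            out[r + s] += a_r * b_s
    return out


def pgf_of_sum(pgf, n):
    """Coefficients of [Phi_X(t)]^n, equation (26.82)."""
    result = [Fraction(1)]  # PGF of a variable that is always zero
    for _ in range(n):
        result = multiply_pgfs(result, pgf)
    return result


p = Fraction(1, 3)
single_trial = [1 - p, p]          # q + p t, equation (26.72)
coefficients = pgf_of_sum(single_trial, 5)
binomial = [comb(5, k) * p**k * (1 - p) ** (5 - k) for k in range(6)]

print(coefficients)
print(binomial)
```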


Variable-length sums of random variables 

As a final result in the theory of probability generating functions we show how to
calculate the PGF for a sum of N random variables, all with the same probability
distribution, when the value of N is itself a random variable but one with a
known probability distribution. In symbols, we wish to find the distribution
of

$$
S_N = X_1 + X_2 + \cdots + X_N, \qquad (26.83)
$$

where N is a random variable with $\Pr(N = n) = h_n$ and PGF $\chi_N(t) = \sum_n h_n t^n$.

The probability $\xi_k$ that $S_N = k$ is given by a sum of conditional probabilities,
namely†

$$
\begin{aligned}
\xi_k &= \sum_{n=0}^{\infty} \Pr(N = n)\Pr(X_0 + X_1 + X_2 + \cdots + X_n = k) \\
      &= \sum_{n=0}^{\infty} h_n \times \text{coefficient of } t^k \text{ in } [\Phi_X(t)]^n.
\end{aligned}
$$

Multiplying both sides of this equation by $t^k$ and summing over all k, we obtain
an expression for the PGF $\Xi_S(t)$ of $S_N$:

$$
\begin{aligned}
\Xi_S(t) = \sum_{k=0}^{\infty} \xi_k t^k
  &= \sum_{k=0}^{\infty} t^k \sum_{n=0}^{\infty} h_n \times \text{coefficient of } t^k \text{ in } [\Phi_X(t)]^n \\
  &= \sum_{n=0}^{\infty} h_n \sum_{k=0}^{\infty} t^k \times \text{coefficient of } t^k \text{ in } [\Phi_X(t)]^n \\
  &= \sum_{n=0}^{\infty} h_n [\Phi_X(t)]^n \\
  &= \chi_N(\Phi_X(t)). \qquad (26.84)
\end{aligned}
$$

In words, the PGF of the sum $S_N$ is given by the compound function $\chi_N(\Phi_X(t))$
obtained by substituting $\Phi_X(t)$ for t in the PGF for the number of terms N in
the sum. We illustrate this with the following example.

† Formally $X_0 = 0$ has to be included, since Pr(N = 0) may be non-zero.


►The probability distribution for the number of eggs in a clutch is Poisson distributed with
mean $\lambda$, and the probability that each egg will hatch is p (and is independent of the size of
the clutch). Use the results stated in (26.72) and (26.73) to show that the PGF (and hence
the probability distribution) for the number of chicks that hatch corresponds to a Poisson
distribution having mean $\lambda p$.


The number of chicks that hatch is given by a sum of the form (26.83) in which $X_i = 1$ if
the ith chick hatches and $X_i = 0$ if it does not. As given by (26.72), $\Phi_X(t)$ is thus (1 − p) + pt.
The value of N is given by a Poisson distribution with mean $\lambda$; thus, from (26.73), in the
terminology of our previous discussion,

$$
\chi_N(t) = e^{-\lambda}e^{\lambda t}.
$$

We now substitute these forms into (26.84) to obtain

$$
\begin{aligned}
\Xi_S(t) &= \exp(-\lambda)\exp[\lambda\Phi_X(t)] \\
         &= \exp(-\lambda)\exp\{\lambda[(1 - p) + pt]\} \\
         &= \exp(-\lambda p)\exp(\lambda p t).
\end{aligned}
$$

But this is exactly the PGF of a Poisson distribution with mean $\lambda p$.

That this implies that the probability is Poisson distributed is intuitively obvious since,
in the expansion of the PGF as a power series in t, every coefficient will be precisely
that implied by such a distribution. A solution of the same problem by direct calculation
appears in the answer to exercise 26.29. ◄
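
The conclusion of this example can also be checked by brute force, without generating functions. The sketch below is an added illustration and is not the solution to the exercise cited above. It sums Pr(N = n) times the binomial probability that k of n eggs hatch, and compares the result with a Poisson probability of mean $\lambda p$. The values $\lambda = 4$ and p = 0.25, and the truncation of the sum at 150 eggs, are assumptions made for the illustration.

```python
import math


def poisson_probability(mean, k):
    """Pr(K = k) for a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def chicks_probability(lam, p, k, n_max=150):
    """Pr(k chicks hatch): sum over clutch sizes n of Pr(N = n) times the
    probability that exactly k of the n eggs hatch."""
    return sum(
        poisson_probability(lam, n) * math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for n in range(k, n_max + 1)
    )


lam, p = 4.0, 0.25
for k in range(6):
    print(k, chicks_probability(lam, p, k), poisson_probability(lam * p, k))
```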


26.7.2 Moment generating functions 


As we saw in section 26.5, a probability function is often expressed in terms of
its moments. This leads naturally to the second type of generating function, a
moment generating function. For a random variable X, and a real number t, the
moment generating function (MGF) is defined by

$$
M_X(t) = E\left[e^{tX}\right] =
\begin{cases}
\sum_i e^{t x_i} f(x_i) & \text{for a discrete distribution,} \\
\int e^{tx} f(x)\,dx & \text{for a continuous distribution.}
\end{cases}
\qquad (26.85)
$$

The MGF will exist for all values of t provided that X is bounded and always
exists at the point t = 0 where M(0) = E[1] = 1.

It will be apparent that the PGF and the MGF for a random variable X
are closely related. The former is the expectation of $t^X$ whilst the latter is the
expectation of $e^{tX}$:

$$
\Phi_X(t) = E\left[t^X\right], \qquad M_X(t) = E\left[e^{tX}\right].
$$

The MGF can thus be obtained from the PGF by replacing t by $e^t$, and vice
versa. The MGF has more general applicability, however, since it can be used
with both continuous and discrete distributions whilst the PGF is restricted to
non-negative integer distributions.

As its name suggests, the MGF is particularly useful for obtaining the moments
of a distribution, as is easily seen by noting that

$$
E\left[e^{tX}\right] = E\left[1 + tX + \frac{t^2X^2}{2!} + \cdots\right]
                     = 1 + E[X]\,t + E\left[X^2\right]\frac{t^2}{2!} + \cdots.
$$

Assuming that the MGF exists for all t around the point t = 0, we can deduce
that the moments of a distribution are given in terms of its MGF by

$$
E[X^n] = \left.\frac{d^n M_X(t)}{dt^n}\right|_{t=0}. \qquad (26.86)
$$

Similarly, by substitution in (26.51), the variance of the distribution is given by

$$
V[X] = M_X''(0) - \left[M_X'(0)\right]^2, \qquad (26.87)
$$

where the prime denotes differentiation with respect to t.


►The MGF for the Gaussian distribution (see the end of subsection 26.9.1) is given by

$$
M_X(t) = \exp\left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right).
$$

Find the expectation and variance of this distribution.

Using (26.86),

$$
M_X'(t) = (\mu + \sigma^2 t)\exp\left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right)
\quad \Rightarrow \quad E[X] = M_X'(0) = \mu,
$$

$$
M_X''(t) = \left[\sigma^2 + (\mu + \sigma^2 t)^2\right]\exp\left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right)
\quad \Rightarrow \quad M_X''(0) = \sigma^2 + \mu^2.
$$

Thus, using (26.87),

$$
V[X] = \sigma^2 + \mu^2 - \mu^2 = \sigma^2.
$$

That the mean is found to be $\mu$ and the variance $\sigma^2$ justifies the use of these symbols in
the Gaussian distribution. ◄
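
When an MGF is known only as a function that can be evaluated, (26.86) and (26.87) can still be applied by differentiating numerically. The sketch below is an added illustration in which the parameter values, the function names and the finite-difference step are all assumptions. It recovers the mean and variance of the Gaussian distribution from its MGF using central differences about t = 0.

```python
import math

mu, sigma = 1.5, 0.8  # hypothetical Gaussian parameters


def mgf(t):
    """MGF of the Gaussian distribution, exp(mu t + sigma^2 t^2 / 2)."""
    return math.exp(mu * t + 0.5 * sigma**2 * t**2)


def derivative_at(func, t, order, h=1e-3):
    """Central finite-difference estimate of the first or second derivative."""
    if order == 1:
        return (func(t + h) - func(t - h)) / (2.0 * h)
    if order == 2:
        return (func(t + h) - 2.0 * func(t) + func(t - h)) / h**2
    raise ValueError("only first and second derivatives are implemented")


mean = derivative_at(mgf, 0.0, 1)                  # E[X] = M'(0), from (26.86)
variance = derivative_at(mgf, 0.0, 2) - mean**2    # equation (26.87)

print(mean, variance)
```

The step h = 1e-3 is a compromise between truncation error, which grows with h, and rounding error, which grows as h is reduced; the results agree with $\mu$ and $\sigma^2$ to several decimal places.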

The moment generating function has several useful properties that follow from
its definition and can be employed in simplifying calculations.

Scaling and shifting

If Y = aX + b, where a and b are arbitrary constants, then

$$
M_Y(t) = E\left[e^{tY}\right] = E\left[e^{t(aX+b)}\right] = e^{bt}E\left[e^{atX}\right] = e^{bt}M_X(at). \qquad (26.88)
$$

This result is often useful for obtaining the central moments of a distribution. If the
MGF of X is $M_X(t)$ then the variable $Y = X - \mu$ has the MGF $M_Y(t) = e^{-\mu t}M_X(t)$,
which clearly generates the central moments of X, i.e.

$$
E[(X - \mu)^n] = E[Y^n] = M_Y^{(n)}(0) = \left(\frac{d^n}{dt^n}\left[e^{-\mu t}M_X(t)\right]\right)_{t=0}.
$$


Sums of random variables 

If $X_1, X_2, \ldots, X_N$ are independent random variables and $S_N = X_1 + X_2 + \cdots + X_N$
then

$$
M_{S_N}(t) = E\left[e^{tS_N}\right] = E\left[e^{t(X_1 + X_2 + \cdots + X_N)}\right]
           = E\left[\prod_{i=1}^{N} e^{tX_i}\right].
$$

Since the $X_i$ are independent,

$$
M_{S_N}(t) = \prod_{i=1}^{N} E\left[e^{tX_i}\right] = \prod_{i=1}^{N} M_{X_i}(t). \qquad (26.89)
$$

In words, the MGF of the sum of N independent random variables is the product
of their individual MGFs. By combining (26.89) with (26.88), we obtain the more
general result that the MGF of $S_N = c_1X_1 + c_2X_2 + \cdots + c_NX_N$ (where the $c_i$ are
constants) is given by

$$
M_{S_N}(t) = \prod_{i=1}^{N} M_{X_i}(c_i t). \qquad (26.90)
$$

Variable-length sums of random variables 

Let us consider the sum of $N$ independent random variables $X_i$ ($i = 1, 2, \ldots, N$), all
with the same probability distribution, and let us suppose that $N$ is itself a random
variable with a known distribution. Following the notation of section 26.7.1,

$$S_N = X_1 + X_2 + \cdots + X_N,$$

where $N$ is a random variable with $\Pr(N = n) = h_n$ and probability generating
function $\chi_N(t) = \sum h_n t^n$. For definiteness, let us assume that the $X_i$ are continuous
RVs (an analogous discussion can be given in the discrete case). Thus, the
probability that the value of $S_N$ lies in the interval $s$ to $s + ds$ is given by†

$$\Pr(s < S_N < s + ds) = \sum_{n=0}^{\infty} \Pr(N = n)\,\Pr(s < X_0 + X_1 + X_2 + \cdots + X_n < s + ds).$$

Let us denote $\Pr(s < S_N < s + ds)$ by $f_N(s)\,ds$ and $\Pr(s < X_0 + X_1 + X_2 + \cdots + X_n < s + ds)$
by $f_n(s)\,ds$. Thus, the $k$th moment of the PDF $f_N(s)$ is given by

$$\begin{aligned}
\mu_k = \int s^k f_N(s)\,ds &= \int s^k \sum_{n=0}^{\infty} \Pr(N = n)\,f_n(s)\,ds \\
&= \sum_{n=0}^{\infty} \Pr(N = n) \int s^k f_n(s)\,ds \\
&= \sum_{n=0}^{\infty} h_n \times \left(k! \times \text{coefficient of } t^k \text{ in } [M_X(t)]^n\right).
\end{aligned}$$

Thus the MGF of $S_N$ is given by

$$\begin{aligned}
M_{S_N}(t) = \sum_{k=0}^{\infty} \frac{\mu_k}{k!}\,t^k &= \sum_{n=0}^{\infty} h_n \sum_{k=0}^{\infty} t^k \times \text{coefficient of } t^k \text{ in } [M_X(t)]^n \\
&= \sum_{n=0}^{\infty} h_n [M_X(t)]^n \\
&= \chi_N(M_X(t)).
\end{aligned}$$

In words, the MGF of the sum $S_N$ is given by the compound function $\chi_N(M_X(t))$
obtained by substituting $M_X(t)$ for $t$ in the PGF for the number of terms $N$ in
the sum.
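The compound-function result can be checked numerically. The sketch below assumes, purely for illustration, that the number of terms $N$ is Poisson distributed with mean `lam` and that each $X_i$ takes the value 1 with probability `p` and 0 otherwise; these choices and all function names are assumptions of ours, not part of the text. With the standard PGF $\exp[\lambda(s-1)]$ for a Poisson variable, substituting $M_X(t) = q + pe^t$ should reproduce the MGF of a Poisson variable of mean $\lambda p$.

```python
import math

def pgf_poisson(s, lam):
    """PGF of the number of terms N, assumed Poisson with mean lam."""
    return math.exp(lam * (s - 1.0))

def mgf_bernoulli(t, p):
    """MGF of one term X_i, assumed to be 1 with probability p and 0 otherwise."""
    return (1.0 - p) + p * math.exp(t)

def mgf_compound(t, lam, p):
    """MGF of S_N: substitute M_X(t) for the argument of the PGF of N."""
    return pgf_poisson(mgf_bernoulli(t, p), lam)

def mgf_poisson(t, mean):
    """Standard MGF of a Poisson variable with the given mean."""
    return math.exp(mean * (math.exp(t) - 1.0))

lam, p = 3.0, 0.25
for t in (-1.0, 0.0, 0.5, 1.0):
    # The two columns should agree to rounding error.
    print(t, mgf_compound(t, lam, p), mgf_poisson(t, lam * p))
```

Under these assumptions the two columns agree, which is consistent with the sum of a Poisson number of 0/1 variables being itself Poisson distributed.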


Uniqueness 

If the MGF of the random variable $X_1$ is identical to that for $X_2$ then the
probability distributions of $X_1$ and $X_2$ are identical. This is intuitively reasonable
although a rigorous proof is complicated,‡ and beyond the scope of this book.

† As in the previous section, $X_0$ has to be formally included, since $\Pr(N = 0)$ may be non-zero.
‡ See, for example, Moran, An Introduction to Probability Theory (Oxford Science Publications).


26.7.3 Characteristic function

The characteristic function (CF) of a random variable $X$ is defined as

$$C_X(t) = E\left[e^{itX}\right] = \begin{cases} \sum_j e^{itx_j} f(x_j) & \text{for a discrete distribution,} \\ \int e^{itx} f(x)\,dx & \text{for a continuous distribution,} \end{cases} \tag{26.91}$$

so that $C_X(t) = M_X(it)$, where $M_X(t)$ is the MGF of $X$. Clearly, the characteristic
function and the MGF are very closely related and can be used interchangeably.
Because of the formal similarity between the definitions of $C_X(t)$ and $M_X(t)$, the
characteristic function possesses analogous properties to those listed in the previous
section for the MGF, with only minor modifications. Indeed, by substituting $it$
for $t$ in any of the relations obeyed by the MGF and noting that $C_X(t) = M_X(it)$,
we obtain the corresponding relationship for the characteristic function. Thus, for
example, the moments of $X$ are given in terms of the derivatives of $C_X(t)$ by

$$E[X^n] = (-i)^n C_X^{(n)}(0).$$

Similarly, if $Y = aX + b$ then $C_Y(t) = e^{ibt}C_X(at)$.

Whether to describe a random variable by its characteristic function or by its
MGF is partly a matter of personal preference. However, the use of the CF does
have some advantages. Most importantly, the replacement of the exponential $e^{tX}$
in the definition of the MGF by the complex oscillatory function $e^{itX}$ in the CF
means that in the latter we avoid any difficulties associated with convergence of
the relevant sum or integral. Furthermore, when $X$ is a continuous RV, we see
from (26.91) that $C_X(t)$ is related to the Fourier transform of the PDF $f(x)$. As
a consequence of Fourier’s inversion theorem, we may obtain $f(x)$ from $C_X(t)$ by
performing the inverse transform

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} C_X(t)\,e^{-itx}\,dt.$$


26.7.4 Cumulant generating function 

As mentioned at the end of subsection 26.5.5, we may also describe a probability
density function $f(x)$ in terms of its cumulants. These quantities may be expressed
in terms of the moments of the distribution and are important in sampling theory,
which we discuss in the next chapter. The cumulants of a distribution are best
defined in terms of its cumulant generating function (CGF), given by $K_X(t) = \ln M_X(t)$
where $M_X(t)$ is the MGF of the distribution. If $K_X(t)$ is expanded as a
power series in $t$ then the $k$th cumulant $\kappa_k$ of $f(x)$ is the coefficient of $t^k/k!$:

$$K_X(t) = \ln M_X(t) = \kappa_1 t + \kappa_2\frac{t^2}{2!} + \kappa_3\frac{t^3}{3!} + \cdots. \tag{26.92}$$

Since $M_X(0) = 1$, $K_X(t)$ contains no constant term.


►Find all the cumulants of the Gaussian distribution discussed in the previous example.

The moment generating function for the Gaussian distribution is $M_X(t) = \exp\left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right)$.
Thus, the cumulant generating function has the simple form

$$K_X(t) = \ln M_X(t) = \mu t + \tfrac{1}{2}\sigma^2 t^2.$$

Comparing this expression with (26.92), we find that $\kappa_1 = \mu$, $\kappa_2 = \sigma^2$ and all other
cumulants are equal to zero. ◄

We may obtain expressions for the cumulants of a distribution in terms of its
moments by differentiating (26.92) with respect to $t$ to give

$$\frac{dK_X}{dt} = \frac{1}{M_X}\frac{dM_X}{dt}.$$

Expanding each term as a power series in $t$ and cross-multiplying, we obtain

$$\left(\kappa_1 + \kappa_2 t + \kappa_3\frac{t^2}{2!} + \cdots\right)\left(1 + \mu_1 t + \mu_2\frac{t^2}{2!} + \cdots\right) = \left(\mu_1 + \mu_2 t + \mu_3\frac{t^2}{2!} + \cdots\right),$$

and, on equating coefficients of like powers of $t$ on each side, we find

$$\begin{aligned}
\mu_1 &= \kappa_1, \\
\mu_2 &= \kappa_2 + \kappa_1\mu_1, \\
\mu_3 &= \kappa_3 + 2\kappa_2\mu_1 + \kappa_1\mu_2, \\
\mu_4 &= \kappa_4 + 3\kappa_3\mu_1 + 3\kappa_2\mu_2 + \kappa_1\mu_3, \\
&\;\;\vdots \\
\mu_k &= \kappa_k + {}^{k-1}C_1\,\kappa_{k-1}\mu_1 + \cdots + {}^{k-1}C_r\,\kappa_{k-r}\mu_r + \cdots + \kappa_1\mu_{k-1}.
\end{aligned}$$

Solving these equations for the $\kappa_k$, we obtain (for the first four cumulants)

$$\begin{aligned}
\kappa_1 &= \mu_1, \\
\kappa_2 &= \mu_2 - \mu_1^2 = \nu_2, \\
\kappa_3 &= \mu_3 - 3\mu_2\mu_1 + 2\mu_1^3 = \nu_3, \\
\kappa_4 &= \mu_4 - 4\mu_3\mu_1 + 12\mu_2\mu_1^2 - 3\mu_2^2 - 6\mu_1^4 = \nu_4 - 3\nu_2^2.
\end{aligned} \tag{26.93}$$

Higher-order cumulants may be calculated in the same way but become increasingly
lengthy to write out in full.
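The relations (26.93) are easy to mistype, so a short numerical check is useful. The following minimal sketch codes the first four cumulants in terms of the raw moments $\mu_k = E[X^k]$ and applies them to a Gaussian, for which the previous example gives $\kappa_1 = \mu$, $\kappa_2 = \sigma^2$ and zero for the rest. The function names and the closed-form Gaussian raw moments used as input are our own additions for illustration.

```python
def cumulants_from_moments(mu1, mu2, mu3, mu4):
    """First four cumulants from the raw moments mu_k = E[X^k], as in (26.93)."""
    k1 = mu1
    k2 = mu2 - mu1**2
    k3 = mu3 - 3 * mu2 * mu1 + 2 * mu1**3
    k4 = mu4 - 4 * mu3 * mu1 + 12 * mu2 * mu1**2 - 3 * mu2**2 - 6 * mu1**4
    return k1, k2, k3, k4

def gaussian_raw_moments(mu, sigma):
    """Raw moments E[X^k], k = 1..4, of a Gaussian (standard closed forms)."""
    s2 = sigma**2
    return (mu,
            mu**2 + s2,
            mu**3 + 3 * mu * s2,
            mu**4 + 6 * mu**2 * s2 + 3 * s2**2)

# For mu = 1.5 and sigma = 2 we expect kappa_1 = 1.5, kappa_2 = 4 and
# vanishing third and fourth cumulants.
print(cumulants_from_moments(*gaussian_raw_moments(1.5, 2.0)))
```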

The principal property of cumulants is their additivity, which may be proved
by combining (26.92) with (26.90). If $X_1, X_2, \ldots, X_N$ are independent random
variables and $K_{X_i}(t)$ for $i = 1, 2, \ldots, N$ is the CGF for $X_i$ then the CGF of
$S_N = c_1X_1 + c_2X_2 + \cdots + c_NX_N$ (where the $c_i$ are constants) is given by

$$K_{S_N}(t) = \sum_{i=1}^{N} K_{X_i}(c_i t).$$

Cumulants also have the useful property that, under a change of origin $X \to X + a$
the first cumulant undergoes the change $\kappa_1 \to \kappa_1 + a$ but all higher-order
cumulants remain unchanged. Under a change of scale $X \to bX$, cumulant $\kappa_r$
undergoes the change $\kappa_r \to b^r\kappa_r$.

26.8 Important discrete distributions 

Having discussed some general properties of distributions, we now consider
the more important discrete distributions encountered in physical applications.
These are discussed in detail below, and summarised for convenience in table 26.1;
we refer the reader to the relevant section below for an explanation of the symbols
used.

| Distribution | Probability law $f(x)$ | MGF | $E[X]$ | $V[X]$ |
|---|---|---|---|---|
| binomial | ${}^{n}C_x\,p^x q^{n-x}$ | $(pe^t + q)^n$ | $np$ | $npq$ |
| negative binomial | ${}^{r+x-1}C_x\,p^r q^x$ | $\left(\dfrac{p}{1-qe^t}\right)^r$ | $\dfrac{rq}{p}$ | $\dfrac{rq}{p^2}$ |
| geometric | $q^{x-1}p$ | $\dfrac{pe^t}{1-qe^t}$ | $\dfrac{1}{p}$ | $\dfrac{q}{p^2}$ |
| hypergeometric | $\dfrac{(Np)!\,(Nq)!\,n!\,(N-n)!}{x!\,(Np-x)!\,(n-x)!\,(Nq-n+x)!\,N!}$ | | $np$ | $\dfrac{N-n}{N-1}\,npq$ |
| Poisson | $\dfrac{\lambda^x}{x!}e^{-\lambda}$ | $e^{\lambda(e^t-1)}$ | $\lambda$ | $\lambda$ |

Table 26.1 Some important discrete probability distributions.


26.8.1 The binomial distribution 

Perhaps the most important discrete probability distribution is the binomial
distribution. This distribution describes processes that consist of a number of
independent identical trials with two possible outcomes, $A$ and $B = \bar{A}$. We may call
these outcomes ‘success’ and ‘failure’ respectively. If the probability of a success
is $\Pr(A) = p$, then the probability of a failure is $\Pr(B) = q = 1 - p$. If we perform
$n$ trials then the discrete random variable

X = number of times A occurs

can take the values $0, 1, 2, \ldots, n$; its distribution amongst these values is described
by the binomial distribution.

We now calculate the probability that in $n$ trials we obtain $x$ successes (and so
$n - x$ failures). One way of obtaining such a result is to have $x$ successes followed
by $n - x$ failures. Since the trials are assumed independent, the probability of this is

$$\underbrace{pp\cdots p}_{x\ \text{times}} \times \underbrace{qq\cdots q}_{n-x\ \text{times}} = p^x q^{n-x}.$$

This is, however, just one permutation of $x$ successes and $n - x$ failures. The total
number of permutations of $n$ objects, of which $x$ are identical and of type 1 and
$n - x$ are identical and of type 2, is given by (26.33) as

$$\frac{n!}{x!\,(n-x)!} \equiv {}^{n}C_x.$$

Figure 26.11 Some typical binomial distributions with various combinations
of parameters $n$ and $p$.


Therefore, the total probability of obtaining $x$ successes from $n$ trials is

$$f(x) = \Pr(X = x) = {}^{n}C_x\,p^x q^{n-x} = {}^{n}C_x\,p^x(1-p)^{n-x}, \tag{26.94}$$

which is the binomial probability distribution formula. When a random variable
$X$ follows the binomial distribution for $n$ trials, with a probability of success $p$,
we write $X \sim \text{Bin}(n, p)$. Then the random variable $X$ is often referred to as a
binomial variate. Some typical binomial distributions are shown in figure 26.11.

►If a single six-sided die is rolled five times, what is the probability that a six is thrown
exactly three times?

Here the number of ‘trials’ $n = 5$, and we are interested in the random variable

X = number of sixes thrown.

Since the probability of a ‘success’ is $p = \tfrac{1}{6}$, the probability of obtaining exactly three sixes
in five throws is given by (26.94) as

$$\Pr(X = 3) = \frac{5!}{3!\,(5-3)!}\left(\frac{1}{6}\right)^3\left(\frac{5}{6}\right)^{5-3} = 0.032.$$ ◄

In evaluating binomial probabilities a useful result is the binomial recurrence
formula

$$\Pr(X = x + 1) = \frac{p}{q}\left(\frac{n-x}{x+1}\right)\Pr(X = x), \tag{26.95}$$

which enables successive probabilities $\Pr(X = x + k)$, $k = 1, 2, \ldots$, to be calculated
once $\Pr(X = x)$ is known; it is often quicker to use than (26.94).


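►The random variable $X$ is distributed as $X \sim \text{Bin}(3, \tfrac{1}{2})$. Evaluate the probability function
$f(x)$ using the binomial recurrence formula.

The probability $\Pr(X = 0)$ may be calculated using (26.94) and is

$$\Pr(X = 0) = {}^{3}C_0\left(\tfrac{1}{2}\right)^0\left(\tfrac{1}{2}\right)^3 = \tfrac{1}{8}.$$

The ratio $p/q = \tfrac{1}{2}/\tfrac{1}{2} = 1$ in this case and so, using the binomial recurrence formula
(26.95), we find

$$\begin{aligned}
\Pr(X = 1) &= 1 \times \frac{3-0}{0+1} \times \frac{1}{8} = \frac{3}{8}, \\
\Pr(X = 2) &= 1 \times \frac{3-1}{1+1} \times \frac{3}{8} = \frac{3}{8}, \\
\Pr(X = 3) &= 1 \times \frac{3-2}{2+1} \times \frac{3}{8} = \frac{1}{8},
\end{aligned}$$

results which may be verified by direct application of (26.94). ◄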
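The recurrence (26.95) translates directly into a short loop. The sketch below is one possible implementation, assuming $0 < p < 1$ so that the ratio $p/q$ is finite; the function names are ours. It reproduces the Bin(3, 1/2) example and may be compared with direct evaluation of (26.94).

```python
from math import comb

def binomial_pmf_recurrence(n, p):
    """Return [Pr(X=0), ..., Pr(X=n)] for X ~ Bin(n, p), assuming 0 < p < 1."""
    q = 1.0 - p
    probs = [q**n]  # Pr(X = 0) from (26.94)
    for x in range(n):
        # (26.95): Pr(X = x+1) = (p/q) * (n-x)/(x+1) * Pr(X = x)
        probs.append(probs[-1] * (p / q) * (n - x) / (x + 1))
    return probs

def binomial_pmf_direct(n, p, x):
    """Direct evaluation of (26.94) for comparison."""
    return comb(n, x) * p**x * (1.0 - p)**(n - x)

# The Bin(3, 1/2) example: expect 1/8, 3/8, 3/8, 1/8.
print(binomial_pmf_recurrence(3, 0.5))

# The die example: Pr(X = 3) for five throws with p = 1/6, by both routes.
print(binomial_pmf_recurrence(5, 1 / 6)[3], binomial_pmf_direct(5, 1 / 6, 3))
```

Each step of the loop costs one multiplication and one division, whereas (26.94) requires a fresh binomial coefficient and two powers for every value of $x$.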


We note that, as required, the binomial distribution satisfies

$$\sum_{x=0}^{n} f(x) = \sum_{x=0}^{n} {}^{n}C_x\,p^x q^{n-x} = (p + q)^n = 1.$$

Furthermore, from the definitions of $E[X]$ and $V[X]$ for a discrete distribution,
we may show that for the binomial distribution $E[X] = np$ and $V[X] = npq$. The
direct summations involved are, however, rather cumbersome and these results
are obtained much more simply using the moment generating function.


The moment generating function for the binomial distribution 


To find the MGF for the binomial distribution we consider the binomial random
variable $X$ to be the sum of the random variables $X_i$, $i = 1, 2, \ldots, n$, which are
defined by

$$X_i = \begin{cases} 1 & \text{if a ‘success’ occurs on the } i\text{th trial,} \\ 0 & \text{if a ‘failure’ occurs on the } i\text{th trial.} \end{cases}$$

Thus

$$\begin{aligned}
M_i(t) = E\left[e^{tX_i}\right] &= e^{0t} \times \Pr(X_i = 0) + e^{1t} \times \Pr(X_i = 1) \\
&= 1 \times q + e^t \times p \\
&= pe^t + q.
\end{aligned}$$

From (26.89), it follows that the MGF for the binomial distribution is given by

$$M(t) = \prod_{i=1}^{n} M_i(t) = (pe^t + q)^n. \tag{26.96}$$

We can now use the moment generating function to derive the mean and
variance of the binomial distribution. From (26.96)

$$M'(t) = npe^t(pe^t + q)^{n-1},$$

and from (26.86)

$$E[X] = M'(0) = np(p + q)^{n-1} = np,$$

where the last equality follows from $p + q = 1$.

Differentiating with respect to $t$ once more gives

$$M''(t) = e^{2t}(n-1)np^2(pe^t + q)^{n-2} + e^t np(pe^t + q)^{n-1},$$

and from (26.86)

$$E[X^2] = M''(0) = n^2p^2 - np^2 + np.$$

Thus, using (26.87)

$$V[X] = M''(0) - [M'(0)]^2 = n^2p^2 - np^2 + np - n^2p^2 = np(1-p) = npq.$$
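Although the direct summations are cumbersome by hand, they are trivial numerically and provide an independent check of $E[X] = np$ and $V[X] = npq$. The following sketch sums over the probability function (26.94); the parameter values and the function name are arbitrary choices made for illustration.

```python
from math import comb

def binomial_moments(n, p):
    """Mean and variance of Bin(n, p) by direct summation over x = 0..n."""
    q = 1.0 - p
    pmf = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]
    mean = sum(x * f for x, f in enumerate(pmf))
    second_moment = sum(x * x * f for x, f in enumerate(pmf))
    return mean, second_moment - mean**2

n, p = 12, 0.3
mean, variance = binomial_moments(n, p)
print(mean, n * p)                # both should be close to np
print(variance, n * p * (1 - p))  # both should be close to npq
```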


Multiple binomial distributions 

Suppose $X$ and $Y$ are two independent random variables, both of which are
described by binomial distributions with a common probability of success $p$, but
with (in general) different numbers of trials $n_1$ and $n_2$, so that $X \sim \text{Bin}(n_1, p)$
and $Y \sim \text{Bin}(n_2, p)$. Now consider the random variable $Z = X + Y$. We could
calculate the probability distribution of $Z$ directly using (26.60), but it is much
easier to use the MGF (26.96).

Since $X$ and $Y$ are independent random variables, the MGF $M_Z(t)$ of the new
variable $Z = X + Y$ is given simply by the product of the individual MGFs
$M_X(t)$ and $M_Y(t)$. Thus, we obtain

$$M_Z(t) = M_X(t)M_Y(t) = (pe^t + q)^{n_1}(pe^t + q)^{n_2} = (pe^t + q)^{n_1+n_2},$$

which we recognise as the MGF of $Z \sim \text{Bin}(n_1 + n_2, p)$. Hence $Z$ is also described
by a binomial distribution.

This result may be extended to any number of binomial distributions. If $X_i$,
$i = 1, 2, \ldots, N$, is distributed as $X_i \sim \text{Bin}(n_i, p)$ then $Z = X_1 + X_2 + \cdots + X_N$ is
distributed as $Z \sim \text{Bin}(n_1 + n_2 + \cdots + n_N, p)$, as would be expected since the result
of $\sum_i n_i$ trials cannot depend on how they are split up. A similar proof is also
possible using either the probability or cumulant generating functions.
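The MGF argument can be cross-checked by carrying out the direct calculation numerically, i.e. by summing over all ways of splitting a given total between $X$ and $Y$. In the sketch below the helper `convolve` and the parameter values are our own illustrative choices; the convolved distribution should coincide with that of $\text{Bin}(n_1 + n_2, p)$.

```python
from math import comb

def binomial_pmf(n, p):
    """Probability function (26.94) as a list indexed by x = 0..n."""
    q = 1.0 - p
    return [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]

def convolve(f, g):
    """PMF of the sum of two independent non-negative integer variables."""
    h = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] += fi * gj
    return h

n1, n2, p = 4, 7, 0.35
z_direct = convolve(binomial_pmf(n1, p), binomial_pmf(n2, p))
z_from_mgf = binomial_pmf(n1 + n2, p)

# Largest discrepancy between the two routes; expected to be at rounding level.
print(max(abs(a - b) for a, b in zip(z_direct, z_from_mgf)))
```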

Unfortunately, no equivalent simple result exists for the probability distribution
of the difference $Z = X - Y$ of two binomially distributed variables.

26.8.2 The geometric and negative binomial distributions

A special case of the binomial distribution occurs when instead of the number of
successes we consider the discrete random variable

X = number of trials required to obtain the first success.

The probability that $x$ trials are required in order to obtain the first success is
simply the probability of obtaining $x - 1$ failures followed by one success. If the
probability of a success on each trial is $p$, then for $x > 0$

$$f(x) = \Pr(X = x) = (1-p)^{x-1}p = q^{x-1}p,$$

where $q = 1 - p$. This distribution is sometimes called the geometric distribution.
The probability generating function for this distribution is given in (26.78). By
replacing $t$ by $e^t$ in (26.78) we immediately obtain the MGF of the geometric
distribution

$$M(t) = \frac{pe^t}{1 - qe^t},$$

from which its mean and variance are found to be

$$E[X] = \frac{1}{p}, \qquad V[X] = \frac{q}{p^2}.$$

Another distribution closely related to the binomial is the negative binomial
distribution. This describes the probability distribution of the random variable

X = number of failures before the rth success.

One way of obtaining $x$ failures before the $r$th success is to have $r - 1$ successes
followed by $x$ failures followed by the $r$th success, for which the probability is

$$\underbrace{pp\cdots p}_{r-1\ \text{times}} \times \underbrace{qq\cdots q}_{x\ \text{times}} \times p = p^r q^x.$$

However, the first $r + x - 1$ factors constitute just one permutation of $r - 1$
successes and $x$ failures. The total number of permutations of these $r + x - 1$
objects, of which $r - 1$ are identical and of type 1 and $x$ are identical and of type
2, is ${}^{r+x-1}C_x$. Therefore, the total probability of obtaining $x$ failures before the
$r$th success is

$$f(x) = \Pr(X = x) = {}^{r+x-1}C_x\,p^r q^x,$$

which is called the negative binomial distribution (see the related discussion on
p. 979). It is straightforward to show that the MGF of this distribution is

$$M(t) = \left(\frac{p}{1 - qe^t}\right)^r,$$

and that its mean and variance are given by

$$E[X] = \frac{rq}{p} \quad\text{and}\quad V[X] = \frac{rq}{p^2}.$$
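The quoted means and variances of the geometric and negative binomial distributions can be verified by summing over the probability functions. Since both distributions have infinite support, the sketch below truncates the sums at a large value of $x$; the cut-off of 400 terms, the parameter values and the function names are assumptions made for this illustration, chosen so that the neglected tail is far below rounding error.

```python
from math import comb

def negative_binomial_pmf(x, r, p):
    """Pr(x failures before the r-th success)."""
    return comb(r + x - 1, x) * p**r * (1.0 - p)**x

def truncated_moments(pmf, x_values):
    """Total probability, mean and variance from a PMF summed over x_values."""
    probs = [pmf(x) for x in x_values]
    mean = sum(x * f for x, f in zip(x_values, probs))
    second_moment = sum(x * x * f for x, f in zip(x_values, probs))
    return sum(probs), mean, second_moment - mean**2

r, p = 3, 0.4
q = 1.0 - p

# Negative binomial: x = 0, 1, 2, ... failures.
total, mean, var = truncated_moments(lambda x: negative_binomial_pmf(x, r, p), range(400))
print(total, mean, r * q / p, var, r * q / p**2)

# Geometric: x = 1, 2, ... trials up to and including the first success.
total_g, mean_g, var_g = truncated_moments(lambda x: q**(x - 1) * p, range(1, 400))
print(total_g, mean_g, 1 / p, var_g, q / p**2)
```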


26.8.3 The hypergeometric distribution 

In subsection 26.8.1 we saw that the probability of obtaining $x$ successes in $n$
independent trials was given by the binomial distribution. Suppose that these $n$
‘trials’ actually consist of drawing at random $n$ balls, from a set of $N$ such balls
of which $M$ are red and the rest white. Let us consider the random variable
X = number of red balls drawn.

On the one hand, if the balls are drawn with replacement then the trials are
independent and the probability of drawing a red ball is $p = M/N$ each time.
Therefore, the probability of drawing $x$ red balls in $n$ trials is given by the
binomial distribution as

$$\Pr(X = x) = \frac{n!}{x!\,(n-x)!}\,p^x(1-p)^{n-x}.$$

On the other hand, if the balls are drawn without replacement the trials are not 
independent and the probability of drawing a red ball depends on how many red 
balls have already been drawn. We can, however, still derive a general formula 
for the probability of drawing x red balls in n trials, as follows. 

The number of ways of drawing $x$ red balls from $M$ is ${}^{M}C_x$, and the number
of ways of drawing $n - x$ white balls from $N - M$ is ${}^{N-M}C_{n-x}$. Therefore, the
total number of ways to obtain $x$ red balls in $n$ trials is ${}^{M}C_x\,{}^{N-M}C_{n-x}$. However,
the total number of ways of drawing $n$ objects from $N$ is simply ${}^{N}C_n$. Hence the
probability of obtaining $x$ red balls in $n$ trials is

\begin{align}
\Pr(X = x) &= \frac{{}^{M}C_x\,{}^{N-M}C_{n-x}}{{}^{N}C_n} \tag{26.97} \\
&= \frac{M!\,(N-M)!\,n!\,(N-n)!}{x!\,(M-x)!\,(n-x)!\,(N-M-n+x)!\,N!} \notag \\
&= \frac{(Np)!\,(Nq)!\,n!\,(N-n)!}{x!\,(Np-x)!\,(n-x)!\,(Nq-n+x)!\,N!}, \tag{26.98}
\end{align}

where in the last line $p = M/N$ and $q = 1 - p$. This is called the hypergeometric
distribution.

By performing the relevant summations directly, it may be shown that the
hypergeometric distribution has mean

$$E[X] = n\frac{M}{N} = np$$

and variance

$$V[X] = \frac{nM(N-M)(N-n)}{N^2(N-1)} = \frac{N-n}{N-1}\,npq.$$

►In the UK National Lottery each participant chooses six different numbers between 1
and 49. In each weekly draw six numbered winning balls are subsequently drawn. Find the
probabilities that a participant chooses 0, 1, 2, 3, 4, 5, 6 winning numbers correctly.

The probabilities are given by a hypergeometric distribution with $N$ (the total number of
balls) = 49, $M$ (the number of winning balls drawn) = 6, and $n$ (the number of numbers
chosen by each participant) = 6. Thus, substituting in (26.97), we find

$$\begin{aligned}
\Pr(0) &= \frac{{}^{6}C_0\,{}^{43}C_6}{{}^{49}C_6} = \frac{1}{2.29}, &\qquad \Pr(1) &= \frac{{}^{6}C_1\,{}^{43}C_5}{{}^{49}C_6} = \frac{1}{2.42}, \\
\Pr(2) &= \frac{{}^{6}C_2\,{}^{43}C_4}{{}^{49}C_6} = \frac{1}{7.55}, &\qquad \Pr(3) &= \frac{{}^{6}C_3\,{}^{43}C_3}{{}^{49}C_6} = \frac{1}{56.6}, \\
\Pr(4) &= \frac{{}^{6}C_4\,{}^{43}C_2}{{}^{49}C_6} = \frac{1}{1032}, &\qquad \Pr(5) &= \frac{{}^{6}C_5\,{}^{43}C_1}{{}^{49}C_6} = \frac{1}{54\,200}, \\
\Pr(6) &= \frac{{}^{6}C_6\,{}^{43}C_0}{{}^{49}C_6} = \frac{1}{13.98 \times 10^6}.
\end{aligned}$$

It can easily be seen that

$$\sum_{i=0}^{6} \Pr(i) = 0.44 + 0.41 + 0.13 + 0.02 + O(10^{-3}) = 1,$$

as expected. ◄
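The lottery probabilities follow directly from (26.97) and can be reproduced in a few lines. This is a minimal sketch using exact integer binomial coefficients; the function name and the output format are our own choices.

```python
from math import comb

def hypergeometric_pmf(x, N, M, n):
    """Pr(x red balls among n drawn without replacement from N, of which M are red)."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

N, M, n = 49, 6, 6  # total balls, winning balls, numbers chosen
probs = [hypergeometric_pmf(x, N, M, n) for x in range(n + 1)]

for x, pr in enumerate(probs):
    print(f"Pr({x}) = {pr:.3e}  (about 1 in {1 / pr:,.1f})")
print("sum =", sum(probs))
```

Working with the integer coefficients until the final division avoids the loss of precision that would result from evaluating the large factorials in (26.98) in floating point.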


Note that if the number of trials (balls drawn) is small compared with $N$, $M$
and $N - M$ then not replacing the balls is of little consequence, and we may
approximate the hypergeometric distribution by the binomial distribution (with
$p = M/N$); this is much easier to evaluate.


26.8.4 The Poisson distribution 

We have seen that the binomial distribution describes the number of successful
outcomes in a certain number of trials $n$. The Poisson distribution also describes
the probability of obtaining a given number of successes but for situations
in which the number of ‘trials’ cannot be enumerated; rather it describes the
situation in which discrete events occur in a continuum. Typical examples of
discrete random variables $X$ described by a Poisson distribution are the number
of telephone calls received by a switchboard in a given interval, or the number
of stars above a certain brightness in a particular area of the sky. Given a mean
rate of occurrence $\lambda$ of these events in the relevant interval or area, the Poisson
distribution gives the probability $\Pr(X = x)$ that exactly $x$ events will occur.

We may derive the form of the Poisson distribution as the limit of the binomial
distribution when the number of trials $n \to \infty$ and the probability of ‘success’
$p \to 0$, in such a way that $np = \lambda$ remains finite. Thus, in our example of a
telephone switchboard, suppose we wish to find the probability that exactly $x$
calls are received during some time interval, given that the mean number of calls
in such an interval is $\lambda$. Let us begin by dividing the time interval into a large
number, $n$, of equal shorter intervals, in each of which the probability of receiving
a call is $p$. As we let $n \to \infty$ then $p \to 0$, but since we require the mean number
of calls in the interval to equal $\lambda$, we must have $np = \lambda$. The probability of $x$
successes in $n$ trials is given by the binomial formula as

$$\Pr(X = x) = \frac{n!}{x!\,(n-x)!}\,p^x(1-p)^{n-x}. \tag{26.99}$$

Now as $n \to \infty$, with $x$ finite, the ratio of the $n$-dependent factorials in (26.99)
behaves asymptotically as a power of $n$, i.e.

$$\lim_{n\to\infty}\frac{n!}{(n-x)!} = \lim_{n\to\infty} n(n-1)(n-2)\cdots(n-x+1) \sim n^x.$$

Also

$$\lim_{n\to\infty}\lim_{p\to 0}(1-p)^{n-x} = \lim_{p\to 0}\frac{(1-p)^{\lambda/p}}{(1-p)^x} = \frac{e^{-\lambda}}{1}.$$

Thus, using $\lambda = np$, (26.99) tends to the Poisson distribution

$$f(x) = \Pr(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}, \tag{26.100}$$

which gives the probability of obtaining exactly $x$ calls in the given time interval.
As we shall show below, $\lambda$ is the mean of the distribution. Events following a
Poisson distribution are usually said to occur randomly in time.

Alternatively we may derive the Poisson distribution directly, without considering
a limit of the binomial distribution. Let us again consider our example
of a telephone switchboard. Suppose that the probability that $x$ calls have been
received in a time interval $t$ is $P_x(t)$. If the average number of calls received in a
unit time is $\lambda$ then in a further small time interval $\Delta t$ the probability of receiving
a call is $\lambda\Delta t$, provided $\Delta t$ is short enough that the probability of receiving two or
more calls in this small interval is negligible. Similarly the probability of receiving
no call during the same small interval is simply $1 - \lambda\Delta t$.

Thus, for $x > 0$, the probability of receiving exactly $x$ calls in the total interval
$t + \Delta t$ is given by

$$P_x(t + \Delta t) = P_x(t)(1 - \lambda\Delta t) + P_{x-1}(t)\lambda\Delta t.$$

Rearranging the equation, dividing through by $\Delta t$ and letting $\Delta t \to 0$, we obtain
the differential recurrence equation

$$\frac{dP_x(t)}{dt} = \lambda P_{x-1}(t) - \lambda P_x(t). \tag{26.101}$$

For $x = 0$ (i.e. no calls received), however, (26.101) simplifies to

$$\frac{dP_0(t)}{dt} = -\lambda P_0(t),$$

which may be integrated to give $P_0(t) = P_0(0)e^{-\lambda t}$. But since the probability $P_0(0)$
of receiving no calls in a zero time interval must equal unity, we have $P_0(t) = e^{-\lambda t}$.
This expression for $P_0(t)$ may then be substituted back into (26.101) with $x = 1$
to obtain a differential equation for $P_1(t)$ that has the solution $P_1(t) = \lambda te^{-\lambda t}$.
We may repeat this process to obtain expressions for $P_2(t), P_3(t), \ldots, P_x(t)$, and we
find

$$P_x(t) = \frac{(\lambda t)^x}{x!}e^{-\lambda t}. \tag{26.102}$$

By setting $t = 1$ in (26.102), we again obtain the Poisson distribution (26.100) for
obtaining exactly $x$ calls in a unit time interval.

If a discrete random variable is described by a Poisson distribution of mean $\lambda$
then we write $X \sim \text{Po}(\lambda)$. As it must be, the sum of the probabilities is unity:

$$\sum_{x=0}^{\infty} \Pr(X = x) = e^{-\lambda}\sum_{x=0}^{\infty}\frac{\lambda^x}{x!} = e^{-\lambda}e^{\lambda} = 1.$$

From (26.100) we may also derive the Poisson recurrence formula,

$$\Pr(X = x + 1) = \frac{\lambda}{x+1}\Pr(X = x) \quad\text{for } x = 0, 1, 2, \ldots, \tag{26.103}$$

which enables successive probabilities to be calculated easily once one is known.



►A person receives on average one e-mail message per half-hour interval. Assuming that
the e-mails are received randomly in time, find the probabilities that in any particular hour
0, 1, 2, 3, 4, 5 messages are received.

Let $X$ = number of e-mails received per hour. Clearly the mean number of e-mails per
hour is two, and so $X$ follows a Poisson distribution with $\lambda = 2$, i.e.

$$\Pr(X = x) = \frac{2^x}{x!}e^{-2}.$$

Thus $\Pr(X = 0) = e^{-2} = 0.135$, $\Pr(X = 1) = 2e^{-2} = 0.271$, $\Pr(X = 2) = 2^2e^{-2}/2! = 0.271$,
$\Pr(X = 3) = 2^3e^{-2}/3! = 0.180$, $\Pr(X = 4) = 2^4e^{-2}/4! = 0.090$, $\Pr(X = 5) = 2^5e^{-2}/5! = 0.036$.
These results may also be calculated using the recurrence formula (26.103). ◄
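As the example notes, the same numbers follow from the recurrence formula (26.103). A minimal sketch of that calculation is given below; the function name is ours, and the starting value $\Pr(X = 0) = e^{-\lambda}$ is taken from (26.100).

```python
import math

def poisson_pmf_recurrence(lam, x_max):
    """Return [Pr(X=0), ..., Pr(X=x_max)] for X ~ Po(lam)."""
    probs = [math.exp(-lam)]  # Pr(X = 0) from (26.100)
    for x in range(x_max):
        # (26.103): Pr(X = x+1) = lam / (x+1) * Pr(X = x)
        probs.append(probs[-1] * lam / (x + 1))
    return probs

# E-mail example: lam = 2 messages per hour, x = 0..5.
for x, pr in enumerate(poisson_pmf_recurrence(2.0, 5)):
    print(x, round(pr, 3))
```

Only one exponential is evaluated; every later probability costs a single multiplication and division, and no factorials are needed.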

The above example illustrates the point that a Poisson distribution typically
rises and then falls. It either has a maximum when $x$ is equal to the integer part
of $\lambda$ or, if $\lambda$ happens to be an integer, has equal maximal values at $x = \lambda - 1$ and
$x = \lambda$. The Poisson distribution always has a long ‘tail’ towards higher values of $X$
but the higher the value of the mean the more symmetric the distribution becomes.
Typical Poisson distributions are shown in figure 26.12. Using the definitions of
mean and variance, we may show that, for the Poisson distribution, $E[X] = \lambda$ and
$V[X] = \lambda$. Nevertheless, as in the case of the binomial distribution, performing
the relevant summations directly is rather tiresome, and these results are much
more easily proved using the MGF.

Figure 26.12 Three Poisson distributions for different values of the parameter $\lambda$.

The moment generating function for the Poisson distribution 
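The MGF of the Poisson distribution is given by

$$M_X(t) = E\left[e^{tX}\right] = \sum_{x=0}^{\infty}\frac{e^{tx}e^{-\lambda}\lambda^x}{x!} = e^{-\lambda}\sum_{x=0}^{\infty}\frac{(\lambda e^t)^x}{x!} = e^{-\lambda}e^{\lambda e^t} = e^{\lambda(e^t-1)}, \tag{26.104}$$

from which we obtain

$$\begin{aligned}
M_X'(t) &= \lambda e^t e^{\lambda(e^t-1)}, \\
M_X''(t) &= \left(\lambda^2 e^{2t} + \lambda e^t\right)e^{\lambda(e^t-1)}.
\end{aligned}$$

Thus, the mean and variance of the Poisson distribution are given by

$$E[X] = M_X'(0) = \lambda \quad\text{and}\quad V[X] = M_X''(0) - [M_X'(0)]^2 = \lambda.$$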

The Poisson approximation to the binomial distribution

Earlier we derived the Poisson distribution as the limit of the binomial distribution
when $n \to \infty$ and $p \to 0$ in such a way that $np = \lambda$ remains finite, where $\lambda$ is the
mean of the Poisson distribution. It is not surprising, therefore, that the Poisson
distribution is a very good approximation to the binomial distribution for large
$n$ ($> 50$, say) and small $p$ ($< 0.1$, say). Moreover, it is easier to calculate as it
involves fewer factorials.


►In a large batch of light bulbs, the probability that a bulb is defective is 0.5%. For a
sample of 200 bulbs taken at random, find the approximate probabilities that 0, 1 and 2 of
the bulbs respectively are defective.

Let the random variable $X$ = number of defective bulbs in a sample. This is distributed
as $X \sim \text{Bin}(200, 0.005)$, implying that $\lambda = np = 1.0$. Since $n$ is large and $p$ small, we may
approximate the distribution as $X \sim \text{Po}(1)$, giving

$$\Pr(X = x) \approx e^{-1}\frac{1^x}{x!},$$

from which we find $\Pr(X = 0) \approx 0.37$, $\Pr(X = 1) \approx 0.37$, $\Pr(X = 2) \approx 0.18$. For comparison,
it may be noted that the exact values calculated from the binomial distribution are identical
to those found here to two decimal places. ◄
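The quality of the approximation in the light-bulb example can be seen by evaluating both distributions side by side. The sketch below compares the exact binomial probabilities from (26.94) with the Poisson values from (26.100); the function names are ours.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """Exact binomial probability (26.94)."""
    return comb(n, x) * p**x * (1.0 - p)**(n - x)

def poisson_pmf(x, lam):
    """Poisson probability (26.100)."""
    return exp(-lam) * lam**x / factorial(x)

n, p = 200, 0.005
lam = n * p  # mean of the approximating Poisson distribution

for x in range(3):
    exact = binomial_pmf(x, n, p)
    approx = poisson_pmf(x, lam)
    print(x, round(exact, 4), round(approx, 4))
```

The two columns differ only in the third decimal place, in line with the statement that they agree to two decimal places.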

Multiple Poisson distributions 

Mirroring our discussion of multiple binomial distributions in subsection 26.8.1,
let us suppose $X$ and $Y$ are two independent random variables, both of which
are described by Poisson distributions with (in general) different means, so that
$X \sim \text{Po}(\lambda_1)$ and $Y \sim \text{Po}(\lambda_2)$. Now consider the random variable $Z = X + Y$. We
may calculate the probability distribution of $Z$ directly using (26.60), but we may
derive the result much more easily by using the moment generating function (or
indeed the probability or cumulant generating functions).

Since $X$ and $Y$ are independent RVs, the MGF for $Z$ is simply the product of
the individual MGFs for $X$ and $Y$. Thus, from (26.104),

$$M_Z(t) = M_X(t)M_Y(t) = e^{\lambda_1(e^t-1)}e^{\lambda_2(e^t-1)} = e^{(\lambda_1+\lambda_2)(e^t-1)},$$

which we recognise as the MGF of $Z \sim \text{Po}(\lambda_1 + \lambda_2)$. Hence $Z$ is also Poisson
distributed and has mean $\lambda_1 + \lambda_2$. Unfortunately, no such simple result holds for
the difference $Z = X - Y$ of two independent Poisson variates. A closed-form
expression for the PDF of this $Z$ does exist, but it is a rather complicated
combination of exponentials and a modified Bessel function.†


►Two types of e-mail arrive independently and at random: external e-mails at a mean rate
of one every five minutes and internal e-mails at a rate of two every five minutes. Calculate
the probability of receiving two or more e-mails in any two-minute interval.

Let

X = number of external e-mails per two-minute interval,

Y = number of internal e-mails per two-minute interval.

Since we expect on average one external e-mail and two internal e-mails every five minutes
we have $X \sim \text{Po}(0.4)$ and $Y \sim \text{Po}(0.8)$. Letting $Z = X + Y$ we have $Z \sim \text{Po}(0.4 + 0.8) = \text{Po}(1.2)$. Now

$$\Pr(Z \ge 2) = 1 - \Pr(Z < 2) = 1 - \Pr(Z = 0) - \Pr(Z = 1)$$

and

$$\begin{aligned}
\Pr(Z = 0) &= e^{-1.2} = 0.301, \\
\Pr(Z = 1) &= e^{-1.2}\,\frac{1.2}{1} = 0.361.
\end{aligned}$$

Hence $\Pr(Z \ge 2) = 1 - 0.301 - 0.361 = 0.338$. ◄

The above result can be extended, of course, to any number of Poisson processes,
so that if $X_i \sim \text{Po}(\lambda_i)$, $i = 1, 2, \ldots, n$ then the random variable $Z = X_1 + X_2 + \cdots + X_n$
is distributed as $Z \sim \text{Po}(\lambda_1 + \lambda_2 + \cdots + \lambda_n)$.

† For a derivation see, for example, Hobson & Lasenby, Monthly Notices of the Royal Astronomical
Society, 298, 905 (1998).


26.9 Important continuous distributions

Having discussed the most commonly encountered discrete probability distributions,
we now consider some of the more important continuous probability
distributions. These are summarised for convenience in table 26.2; we refer the
reader to the relevant subsection below for an explanation of the symbols used.

| Distribution | Probability law $f(x)$ | MGF | $E[X]$ | $V[X]$ |
|---|---|---|---|---|
| Gaussian | $\dfrac{1}{\sigma\sqrt{2\pi}}\exp\left[-\dfrac{(x-\mu)^2}{2\sigma^2}\right]$ | $\exp\left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right)$ | $\mu$ | $\sigma^2$ |
| exponential | $\lambda e^{-\lambda x}$ | $\dfrac{\lambda}{\lambda - t}$ | $\dfrac{1}{\lambda}$ | $\dfrac{1}{\lambda^2}$ |
| gamma | $\dfrac{\lambda}{\Gamma(r)}(\lambda x)^{r-1}e^{-\lambda x}$ | $\left(\dfrac{\lambda}{\lambda - t}\right)^r$ | $\dfrac{r}{\lambda}$ | $\dfrac{r}{\lambda^2}$ |
| chi-squared | $\dfrac{1}{2^{n/2}\Gamma(n/2)}\,x^{(n/2)-1}e^{-x/2}$ | $\left(\dfrac{1}{1-2t}\right)^{n/2}$ | $n$ | $2n$ |
| uniform | $\dfrac{1}{b-a}$ | $\dfrac{e^{bt} - e^{at}}{(b-a)t}$ | $\dfrac{a+b}{2}$ | $\dfrac{(b-a)^2}{12}$ |

Table 26.2 Some important continuous probability distributions.


26.9.1 The Gaussian distribution 

By far the most important continuous probability distribution is the Gaussian 
or normal distribution. The reason for its importance is that a great many 
random variables of interest, in all areas of the physical sciences and beyond, are 
described either exactly or approximately by a Gaussian distribution. Moreover, 
the Gaussian distribution can be used to approximate other, more complicated, 
probability distributions.

Figure 26.13 The Gaussian or normal distribution for mean $\mu = 3$ and
various values of the standard deviation $\sigma$.


The probability density function for a Gaussian distribution of a random
variable $X$, with mean $E[X] = \mu$ and variance $V[X] = \sigma^2$, takes the form

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]. \tag{26.105}$$

The factor $1/\sqrt{2\pi}$ arises from the normalisation of the distribution,

$$\int_{-\infty}^{\infty} f(x)\,dx = 1;$$

the evaluation of this integral is discussed in subsection 6.4.2. The Gaussian
distribution is symmetric about the point $x = \mu$ and has the characteristic ‘bell’
shape shown in figure 26.13. The width of the curve is described by the standard
deviation $\sigma$: if $\sigma$ is large then the curve is broad, and if $\sigma$ is small then the curve
is narrow (see the figure). At $x = \mu \pm \sigma$, $f(x)$ falls to $e^{-1/2} \approx 0.61$ of its peak
value; these points are points of inflection, where $d^2f/dx^2 = 0$. When a random
variable $X$ follows a Gaussian distribution with mean $\mu$ and variance $\sigma^2$, we write
$X \sim N(\mu, \sigma^2)$.

The effects of changing $\mu$ and $\sigma$ are only to shift the curve along the $x$-axis or
to broaden or narrow it, respectively. Thus all Gaussians are equivalent in that
a change of origin and scale can reduce them to a standard form. We therefore
consider the random variable $Z = (X - \mu)/\sigma$, for which the PDF takes the form

$$\phi(z) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{z^2}{2}\right), \tag{26.106}$$

which is called the standard Gaussian distribution and has mean $\mu = 0$ and
variance $\sigma^2 = 1$. The random variable $Z$ is called the standard variable.

Figure 26.14 On the left, the standard Gaussian distribution $\phi(z)$; the shaded
area gives $\Pr(Z < a) = \Phi(a)$. On the right, the cumulative probability function
$\Phi(z)$ for a standard Gaussian distribution $\phi(z)$.


From (26.105) we can define the cumulative probability function for a Gaussian
distribution as

\[ F(x) = \Pr(X < x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left[-\frac{1}{2}\left(\frac{u-\mu}{\sigma}\right)^2\right] du, \qquad (26.107) \]

where $u$ is a (dummy) integration variable. Unfortunately, this (indefinite) integral
cannot be evaluated analytically. It is therefore standard practice to tabulate values
of the cumulative probability function for the standard Gaussian distribution
(see figure 26.14), i.e.

\[ \Phi(z) = \Pr(Z < z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} \exp\left(-\frac{u^2}{2}\right) du. \qquad (26.108) \]


It is usual only to tabulate $\Phi(z)$ for $z > 0$, since it can be seen easily, from
figure 26.14 and the symmetry of the Gaussian distribution, that $\Phi(-z) = 1 - \Phi(z)$;
see table 26.3. Using such a table it is then straightforward to evaluate the
probability that $Z$ lies in a given range of $z$-values. For example, for $a$ and $b$
constant,

\[ \Pr(Z < a) = \Phi(a), \]
\[ \Pr(Z > a) = 1 - \Phi(a), \]
\[ \Pr(a < Z < b) = \Phi(b) - \Phi(a). \]

Remembering that $Z = (X - \mu)/\sigma$ and comparing (26.107) and (26.108), we see
that

\[ F(x) = \Phi\left(\frac{x-\mu}{\sigma}\right), \]

and so we may also calculate the probability that the original random variable

Φ(z)   .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0   .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
0.1   .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
0.2   .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
0.3   .6179  .6217  .6255  .6293  .6331  .6368  .6406  .6443  .6480  .6517
0.4   .6554  .6591  .6628  .6664  .6700  .6736  .6772  .6808  .6844  .6879
0.5   .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
0.6   .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
0.7   .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
0.8   .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
0.9   .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
1.0   .8413  .8438  .8461  .8485  .8508  .8531  .8554  .8577  .8599  .8621
1.1   .8643  .8665  .8686  .8708  .8729  .8749  .8770  .8790  .8810  .8830
1.2   .8849  .8869  .8888  .8907  .8925  .8944  .8962  .8980  .8997  .9015
1.3   .9032  .9049  .9066  .9082  .9099  .9115  .9131  .9147  .9162  .9177
1.4   .9192  .9207  .9222  .9236  .9251  .9265  .9279  .9292  .9306  .9319
1.5   .9332  .9345  .9357  .9370  .9382  .9394  .9406  .9418  .9429  .9441
1.6   .9452  .9463  .9474  .9484  .9495  .9505  .9515  .9525  .9535  .9545
1.7   .9554  .9564  .9573  .9582  .9591  .9599  .9608  .9616  .9625  .9633
1.8   .9641  .9649  .9656  .9664  .9671  .9678  .9686  .9693  .9699  .9706
1.9   .9713  .9719  .9726  .9732  .9738  .9744  .9750  .9756  .9761  .9767
2.0   .9772  .9778  .9783  .9788  .9793  .9798  .9803  .9808  .9812  .9817
2.1   .9821  .9826  .9830  .9834  .9838  .9842  .9846  .9850  .9854  .9857
2.2   .9861  .9864  .9868  .9871  .9875  .9878  .9881  .9884  .9887  .9890
2.3   .9893  .9896  .9898  .9901  .9904  .9906  .9909  .9911  .9913  .9916
2.4   .9918  .9920  .9922  .9925  .9927  .9929  .9931  .9932  .9934  .9936
2.5   .9938  .9940  .9941  .9943  .9945  .9946  .9948  .9949  .9951  .9952
2.6   .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964
2.7   .9965  .9966  .9967  .9968  .9969  .9970  .9971  .9972  .9973  .9974
2.8   .9974  .9975  .9976  .9977  .9977  .9978  .9979  .9979  .9980  .9981
2.9   .9981  .9982  .9982  .9983  .9984  .9984  .9985  .9985  .9986  .9986
3.0   .9987  .9987  .9987  .9988  .9988  .9989  .9989  .9989  .9990  .9990
3.1   .9990  .9991  .9991  .9991  .9992  .9992  .9992  .9992  .9993  .9993
3.2   .9993  .9993  .9994  .9994  .9994  .9994  .9994  .9995  .9995  .9995
3.3   .9995  .9995  .9995  .9996  .9996  .9996  .9996  .9996  .9996  .9997
3.4   .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9998

Table 26.3 The cumulative probability function $\Phi(z)$ for the standard Gaussian
distribution, as given by (26.108). The units and the first decimal place
of $z$ are specified in the column under $\Phi(z)$ and the second decimal place is
specified by the column headings. Thus, for example, $\Phi(1.23) = 0.8907$.
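
$X$ lies in a given $x$-range. For example,

\[ \Pr(a < X < b) = \frac{1}{\sigma\sqrt{2\pi}} \int_a^b \exp\left[-\frac{1}{2}\left(\frac{u-\mu}{\sigma}\right)^2\right] du \qquad (26.109) \]
\[ = F(b) - F(a) \qquad (26.110) \]
\[ = \Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right). \qquad (26.111) \]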


► If $X$ is described by a Gaussian distribution of mean $\mu$ and variance $\sigma^2$, calculate the
probabilities that $X$ lies within $1\sigma$, $2\sigma$ and $3\sigma$ of the mean.


From (26.111)

\[ \Pr(\mu - n\sigma < X < \mu + n\sigma) = \Phi(n) - \Phi(-n) = \Phi(n) - [1 - \Phi(n)], \]

and so from table 26.3

\[ \Pr(\mu - \sigma < X < \mu + \sigma) = 2\Phi(1) - 1 = 0.6826 \approx 68.3\%, \]
\[ \Pr(\mu - 2\sigma < X < \mu + 2\sigma) = 2\Phi(2) - 1 = 0.9544 \approx 95.4\%, \]
\[ \Pr(\mu - 3\sigma < X < \mu + 3\sigma) = 2\Phi(3) - 1 = 0.9974 \approx 99.7\%. \]

Thus we expect $X$ to be distributed in such a way that about two thirds of the values will
lie between $\mu - \sigma$ and $\mu + \sigma$, 95% will lie within $2\sigma$ of the mean and 99.7% will lie within
$3\sigma$ of the mean. These limits are called the one-, two- and three-sigma limits respectively;
it is particularly important to note that they are independent of the actual values of the
mean and variance. ◄
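
As a numerical check on the table look-up above, the following short Python sketch evaluates $\Phi(z)$ from the error function, using the identity $\Phi(z) = \frac{1}{2}[1 + \mathrm{erf}(z/\sqrt{2})]$. It is not part of the original text, and the function names `phi_cdf` and `prob_within` are our own choices for illustration.

```python
import math

def phi_cdf(z: float) -> float:
    """Standard Gaussian CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_within(n_sigma: float) -> float:
    """Pr(mu - n*sigma < X < mu + n*sigma) = 2*Phi(n) - 1, for any mu and sigma."""
    return 2.0 * phi_cdf(n_sigma) - 1.0

# One-, two- and three-sigma probabilities.
for n in (1, 2, 3):
    print(f"within {n} sigma: {prob_within(n):.4f}")
```

The printed values may differ from those in the worked example in the fourth decimal place, because table 26.3 is itself rounded to four figures.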

There are many other ways in which the Gaussian distribution may be used. 
We now illustrate some of the uses in more complicated examples. 


► Sawmill A produces boards whose lengths are Gaussian distributed with mean 209.4 cm 
and standard deviation 5.0 cm. A board is accepted if it is longer than 200 cm but is 
rejected otherwise. Show that 3% of boards are rejected. 

Sawmill B produces boards of the same standard deviation but of mean length 210.1 cm. 
Find the proportion of boards rejected if they are drawn at random from the outputs of A 
and B in the ratio 3:1. 


Let $X$ = length of boards from A, so that $X \sim N(209.4, (5.0)^2)$ and

\[ \Pr(X < 200) = \Phi\left(\frac{200-\mu}{\sigma}\right) = \Phi\left(\frac{200-209.4}{5.0}\right) = \Phi(-1.88). \]

But, since $\Phi(-z) = 1 - \Phi(z)$ we have, using table 26.3,

\[ \Pr(X < 200) = 1 - \Phi(1.88) = 1 - 0.9699 = 0.0301, \]

i.e. 3.0% of boards are rejected.

Now let $Y$ = length of boards from B, so that $Y \sim N(210.1, (5.0)^2)$ and

\[ \Pr(Y < 200) = \Phi\left(\frac{200-210.1}{5.0}\right) = \Phi(-2.02) = 1 - \Phi(2.02) = 1 - 0.9783 = 0.0217. \]

Therefore, when taken alone, only 2.2% of boards from B are rejected. If, however, boards
are drawn at random from A and B in the ratio 3 : 1 then the proportion rejected is

\[ \tfrac{1}{4}(3 \times 0.030 + 1 \times 0.022) = 0.028 = 2.8\%. \]
◄
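
The sawmill calculation can be reproduced without tables. The sketch below is our own illustration, using the `NormalDist` class from Python's standard `statistics` module; the variable names are hypothetical. It evaluates the two rejection probabilities and the 3 : 1 weighted mixture.

```python
from statistics import NormalDist

# Board-length distributions (cm) for the two sawmills.
board_a = NormalDist(mu=209.4, sigma=5.0)
board_b = NormalDist(mu=210.1, sigma=5.0)

# A board is rejected if it is shorter than 200 cm.
reject_a = board_a.cdf(200.0)
reject_b = board_b.cdf(200.0)

# Boards drawn from A and B in the ratio 3:1.
reject_mix = (3 * reject_a + 1 * reject_b) / 4

print(f"A: {reject_a:.4f}  B: {reject_b:.4f}  mixture: {reject_mix:.4f}")
```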

We may sometimes work backwards to derive the mean and standard deviation 
of a population that is known to be Gaussian distributed. 


► The time taken for a computer ‘packet’ to travel from Cambridge UK to Cambridge MA 
is Gaussian distributed. 6.8% of the packets take over 200 ms to make the journey, and 
3.0% take under 140 ms. Find the mean and standard deviation of the distribution. 


Let $X$ = journey time in ms; we are told that $X \sim N(\mu, \sigma^2)$ where $\mu$ and $\sigma$ are unknown.
Since 6.8% of journey times are longer than 200 ms,

\[ \Pr(X > 200) = 1 - \Phi\left(\frac{200-\mu}{\sigma}\right) = 0.068, \]

from which we find

\[ \Phi\left(\frac{200-\mu}{\sigma}\right) = 1 - 0.068 = 0.932. \]

Using table 26.3, we have therefore

\[ \frac{200-\mu}{\sigma} = 1.49. \qquad (26.112) \]

Also, 3.0% of journey times are under 140 ms, so

\[ \Pr(X < 140) = \Phi\left(\frac{140-\mu}{\sigma}\right) = 0.030. \]

Now using $\Phi(-z) = 1 - \Phi(z)$ gives

\[ \Phi\left(\frac{\mu-140}{\sigma}\right) = 1 - 0.030 = 0.970. \]

Using table 26.3 again, we find

\[ \frac{\mu-140}{\sigma} = 1.88. \qquad (26.113) \]

Solving the simultaneous equations (26.112) and (26.113) gives $\mu = 173.5$, $\sigma = 17.8$. ◄
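
Working backwards from tail probabilities amounts to inverting $\Phi$ and solving two linear equations. A minimal sketch of this, assuming Python's standard `statistics.NormalDist.inv_cdf` in place of the table look-up (variable names are ours), is:

```python
from statistics import NormalDist

std = NormalDist()  # standard Gaussian, mean 0 and variance 1

# Pr(X > 200) = 0.068  =>  (200 - mu)/sigma = z_upper
z_upper = std.inv_cdf(1.0 - 0.068)
# Pr(X < 140) = 0.030  =>  (140 - mu)/sigma = z_lower  (negative)
z_lower = std.inv_cdf(0.030)

# Subtracting the two linear equations eliminates mu.
sigma = (200.0 - 140.0) / (z_upper - z_lower)
mu = 200.0 - z_upper * sigma

print(f"mu = {mu:.1f} ms, sigma = {sigma:.1f} ms")
```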


The moment generating function for the Gaussian distribution

Using the definition of the MGF (26.85),

\[ M_X(t) = E[e^{tX}] = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[tx - \frac{(x-\mu)^2}{2\sigma^2}\right] dx = c \exp\left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right), \]

where the final equality is established by completing the square in the argument
of the exponential and writing

\[ c = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{[x - (\mu + \sigma^2 t)]^2}{2\sigma^2}\right\} dx. \]

However, the final integral is simply the normalisation integral for the Gaussian
distribution, and so $c = 1$ and the MGF is given by

\[ M_X(t) = \exp\left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right). \qquad (26.114) \]

We showed in subsection 26.7.2 that this MGF leads to $E[X] = \mu$ and $V[X] = \sigma^2$,
as required.


Gaussian approximation to the binomial distribution

We may consider the Gaussian distribution as the limit of the binomial distribution
when the number of trials $n \to \infty$ but the probability of a success $p$ remains
finite, so that $np \to \infty$ also. (This contrasts with the Poisson distribution, which
corresponds to the limit $n \to \infty$ and $p \to 0$ with $np = \lambda$ remaining finite.) In
other words, a Gaussian distribution results when an experiment with a finite
probability of success is repeated a large number of times. We now show how
this Gaussian limit arises.

The binomial probability function gives the probability of $x$ successes in $n$ trials
as

\[ f(x) = \frac{n!}{x!(n-x)!}\, p^x (1-p)^{n-x}. \]

Taking the limit as $n \to \infty$ (and $x \to \infty$) we may approximate the factorials by
Stirling's approximation

\[ n! \sim \sqrt{2\pi n} \left(\frac{n}{e}\right)^n \]

to obtain

\[ f(x) \approx \frac{1}{\sqrt{2\pi n}} \left(\frac{x}{n}\right)^{-x-1/2} \left(\frac{n-x}{n}\right)^{-n+x-1/2} p^x (1-p)^{n-x} \]
\[ = \frac{1}{\sqrt{2\pi n}} \exp\left[-\left(x+\tfrac{1}{2}\right)\ln\frac{x}{n} - \left(n-x+\tfrac{1}{2}\right)\ln\frac{n-x}{n} + x\ln p + (n-x)\ln(1-p)\right]. \]

By expanding the argument of the exponential in terms of $y = x - np$, where
$1 \ll y \ll np$, and keeping only the dominant terms, it can be shown that

\[ f(x) \approx \frac{1}{\sqrt{2\pi n}}\,\frac{1}{\sqrt{p(1-p)}} \exp\left[-\frac{1}{2}\,\frac{(x-np)^2}{np(1-p)}\right], \]

which is of Gaussian form with $\mu = np$ and $\sigma = \sqrt{np(1-p)}$.

Thus we see that the value of the Gaussian probability density function $f(x)$ is
a good approximation to the probability of obtaining $x$ successes in $n$ trials. This
approximation is actually very good even for relatively small $n$. For example, if
$n = 10$ and $p = 0.6$ then the Gaussian approximation to the binomial distribution
is (26.105) with $\mu = 10 \times 0.6 = 6$ and $\sigma = \sqrt{10 \times 0.6(1 - 0.6)} = 1.549$. The

 x    f(x) (binomial)   f(x) (Gaussian)
 0        0.0001            0.0001
 1        0.0016            0.0014
 2        0.0106            0.0092
 3        0.0425            0.0395
 4        0.1115            0.1119
 5        0.2007            0.2091
 6        0.2508            0.2575
 7        0.2150            0.2091
 8        0.1209            0.1119
 9        0.0403            0.0395
10        0.0060            0.0092

Table 26.4 Comparison of the binomial distribution for $n = 10$ and $p = 0.6$
with its Gaussian approximation.


probability functions f(x) for the binomial and associated Gaussian distributions 
for these parameters are given in table 26.4, and it can be seen that the Gaussian 
approximation is a good one. 
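
The comparison in table 26.4 is straightforward to regenerate. The sketch below is our own illustration (the function names `binomial_pmf` and `gaussian_pdf` are not from the text); it evaluates the binomial probability function and the Gaussian density (26.105) with $\mu = np$ and $\sigma = \sqrt{np(1-p)}$ at each integer $x$.

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of x successes in n trials, each with success probability p."""
    return math.comb(n, x) * p**x * (1.0 - p) ** (n - x)

def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Gaussian PDF of equation (26.105)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

n, p = 10, 0.6
mu = n * p
sigma = math.sqrt(n * p * (1.0 - p))

# One row per integer x: (x, binomial value, Gaussian value).
rows = [(x, binomial_pmf(x, n, p), gaussian_pdf(x, mu, sigma)) for x in range(n + 1)]
for x, f_bin, f_gauss in rows:
    print(f"{x:2d}  {f_bin:.4f}  {f_gauss:.4f}")
```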

Strictly speaking, however, since the Gaussian distribution is continuous and
the binomial distribution is discrete, we should use the integral of $f(x)$ for the
Gaussian distribution in the calculation of approximate binomial probabilities.
More specifically, we should apply a continuity correction so that the discrete
integer $x$ in the binomial distribution becomes the interval $[x - 0.5, x + 0.5]$ in
the Gaussian distribution. Explicitly,

\[ \Pr(X = x) \approx \frac{1}{\sigma\sqrt{2\pi}} \int_{x-0.5}^{x+0.5} \exp\left[-\frac{1}{2}\left(\frac{u-\mu}{\sigma}\right)^2\right] du. \]

The Gaussian approximation is particularly useful for estimating the binomial
probability that $X$ lies between the (integer) values $x_1$ and $x_2$ inclusive,

\[ \Pr(x_1 \le X \le x_2) \approx \frac{1}{\sigma\sqrt{2\pi}} \int_{x_1-0.5}^{x_2+0.5} \exp\left[-\frac{1}{2}\left(\frac{u-\mu}{\sigma}\right)^2\right] du. \]


► A manufacturer makes computer chips of which 10% are defective. For a random sample
of 200 chips, find the approximate probability that more than 15 are defective.

We first define the random variable

$X$ = number of defective chips in the sample,

which has a binomial distribution $X \sim \mathrm{Bin}(200, 0.1)$. Therefore, the mean and variance of
this distribution are

\[ E[X] = 200 \times 0.1 = 20 \quad \text{and} \quad V[X] = 200 \times 0.1 \times (1 - 0.1) = 18, \]

and we may approximate the binomial distribution with a Gaussian distribution such that
$X \sim N(20, 18)$. The standard variable is

\[ Z = \frac{X - 20}{\sqrt{18}}, \]

and so, using $X = 15.5$ to allow for the continuity correction,

\[ \Pr(X > 15.5) = \Pr\left(Z > \frac{15.5 - 20}{\sqrt{18}}\right) = \Pr(Z > -1.06) = \Pr(Z < 1.06) = 0.86. \]
◄
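
For a sample of this size the exact binomial sum is also easy to evaluate by computer, which gives a direct check on the continuity-corrected estimate. The following sketch is our own addition (variable names are hypothetical); it computes both quantities for the chip example.

```python
import math
from statistics import NormalDist

n, p = 200, 0.1
mu = n * p
sigma = math.sqrt(n * p * (1.0 - p))

# Gaussian estimate of Pr(X > 15), with the continuity correction at 15.5.
approx = 1.0 - NormalDist(mu, sigma).cdf(15.5)

# Exact binomial value: 1 - Pr(X <= 15).
exact = 1.0 - sum(math.comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(16))

print(f"Gaussian with continuity correction: {approx:.4f}")
print(f"exact binomial sum:                  {exact:.4f}")
```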


Gaussian approximation to the Poisson distribution

We first met the Poisson distribution as the limit of the binomial distribution for
$n \to \infty$ and $p \to 0$, taken in such a way that $np = \lambda$ remains finite. Further, in
the previous subsection, we considered the Gaussian distribution as the limit of
the binomial distribution when $n \to \infty$ but $p$ remains finite, so that $np \to \infty$ also.
It should come as no surprise, therefore, that the Gaussian distribution can also
be used to approximate the Poisson distribution when the mean $\lambda$ becomes large.
The probability function for the Poisson distribution is

\[ f(x) = e^{-\lambda}\,\frac{\lambda^x}{x!}, \]

which, on taking the logarithm of both sides, gives

\[ \ln f(x) = -\lambda + x \ln \lambda - \ln x!. \qquad (26.115) \]

Stirling's approximation for large $x$ gives

\[ x! \approx \sqrt{2\pi x} \left(\frac{x}{e}\right)^x, \]

implying that

\[ \ln x! \approx \ln\sqrt{2\pi x} + x \ln x - x, \]

which, on substituting into (26.115), yields

\[ \ln f(x) \approx -\lambda + x \ln \lambda - (x \ln x - x) - \ln\sqrt{2\pi x}. \]

Since we expect the Poisson distribution to peak around $x = \lambda$, we substitute
$\epsilon = x - \lambda$ to obtain

\[ \ln f(x) \approx -\lambda + (\lambda + \epsilon)\left\{\ln\lambda - \ln\left[\lambda\left(1 + \frac{\epsilon}{\lambda}\right)\right]\right\} + (\lambda + \epsilon) - \ln\sqrt{2\pi(\lambda + \epsilon)}. \]

Using the expansion $\ln(1 + z) = z - z^2/2 + \cdots$, we find

\[ \ln f(x) \approx \epsilon - (\lambda + \epsilon)\left(\frac{\epsilon}{\lambda} - \frac{\epsilon^2}{2\lambda^2}\right) - \ln\sqrt{2\pi\lambda} - \frac{1}{2}\left(\frac{\epsilon}{\lambda} - \frac{\epsilon^2}{2\lambda^2}\right) \]
\[ \approx -\frac{\epsilon^2}{2\lambda} - \ln\sqrt{2\pi\lambda}, \]

when only the dominant terms are retained, after using the fact that $\epsilon$ is of the
order of the standard deviation of $x$, i.e. of order $\lambda^{1/2}$. On exponentiating this
result we obtain

\[ f(x) \approx \frac{1}{\sqrt{2\pi\lambda}} \exp\left[-\frac{(x-\lambda)^2}{2\lambda}\right], \]

which is the Gaussian distribution with $\mu = \lambda$ and $\sigma^2 = \lambda$.

The larger the value of $\lambda$, the better is the Gaussian approximation to the
Poisson distribution; the approximation is reasonable even for $\lambda = 5$, but $\lambda > 10$
is safer. As in the case of the Gaussian approximation to the binomial distribution,
a continuity correction is necessary since the Poisson distribution is discrete.


► E-mail messages are received by an author at an average rate of one per hour. Find the
probability that in a day the author receives 24 messages or more.


We first define the random variable

$X$ = number of messages received in a day.

Thus $E[X] = 1 \times 24 = 24$, and so $X \sim \mathrm{Po}(24)$. Since $\lambda > 10$ we may approximate the
Poisson distribution by $X \sim N(24, 24)$. Now the standard variable is

\[ Z = \frac{X - 24}{\sqrt{24}}, \]

and, using the continuity correction, we find

\[ \Pr(X > 23.5) = \Pr\left(Z > \frac{23.5 - 24}{\sqrt{24}}\right) = \Pr(Z > -0.102) = \Pr(Z < 0.102) = 0.54. \]
◄
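
The quality of the approximation in this example can be examined by summing the Poisson probability function directly. The sketch below is our own illustration, not part of the original solution; it prints the continuity-corrected Gaussian estimate next to the exact Poisson tail probability for $\lambda = 24$.

```python
import math
from statistics import NormalDist

lam = 24  # mean number of messages per day

# Gaussian estimate of Pr(X >= 24), with the continuity correction at 23.5.
approx = 1.0 - NormalDist(lam, math.sqrt(lam)).cdf(23.5)

# Exact Poisson value: 1 - Pr(X <= 23).
exact = 1.0 - sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(24))

print(f"Gaussian with continuity correction: {approx:.4f}")
print(f"exact Poisson sum:                   {exact:.4f}")
```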


In fact, almost all probability distributions tend towards a Gaussian when the 
numbers involved become large - that this should happen is required by the 
central limit theorem, which we discuss in section 26.10. 

Multiple Gaussian distributions 

Suppose $X$ and $Y$ are independent Gaussian-distributed random variables, so
that $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$. Let us now consider the random variable
$Z = X + Y$. The PDF for this random variable may be found directly using
(26.61), but it is easier to use the MGF. From (26.114), the MGFs of $X$ and $Y$
are

\[ M_X(t) = \exp\left(\mu_1 t + \tfrac{1}{2}\sigma_1^2 t^2\right), \qquad M_Y(t) = \exp\left(\mu_2 t + \tfrac{1}{2}\sigma_2^2 t^2\right). \]

Using (26.89), since $X$ and $Y$ are independent RVs, the MGF of $Z = X + Y$ is
simply the product of $M_X(t)$ and $M_Y(t)$. Thus, we have

\[ M_Z(t) = M_X(t) M_Y(t) = \exp\left(\mu_1 t + \tfrac{1}{2}\sigma_1^2 t^2\right) \exp\left(\mu_2 t + \tfrac{1}{2}\sigma_2^2 t^2\right) \]
\[ = \exp\left[(\mu_1 + \mu_2)t + \tfrac{1}{2}(\sigma_1^2 + \sigma_2^2)t^2\right], \]

which we recognise as the MGF for a Gaussian with mean $\mu_1 + \mu_2$ and variance
$\sigma_1^2 + \sigma_2^2$. Thus, $Z$ is also Gaussian distributed: $Z \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$.

A similar calculation may be performed to calculate the PDF of the random
variable $W = X - Y$. If we introduce the variable $\tilde{Y} = -Y$ then $W = X + \tilde{Y}$,
where $\tilde{Y} \sim N(-\mu_2, \sigma_2^2)$. Thus, using the result above, we find
$W \sim N(\mu_1 - \mu_2, \sigma_1^2 + \sigma_2^2)$.


► An executive travels home from her office every evening. Her journey consists of a train
ride, followed by a bicycle ride. The time spent on the train is Gaussian distributed with
mean 52 minutes and standard deviation 1.8 minutes, while the time for the bicycle journey
is Gaussian distributed with mean 8 minutes and standard deviation 2.6 minutes. Assuming
these two factors are independent, estimate the percentage of occasions on which the whole
journey takes more than 65 minutes.


We first define the random variables

$X$ = time spent on train, $Y$ = time spent on bicycle,

so that $X \sim N(52, (1.8)^2)$ and $Y \sim N(8, (2.6)^2)$. Since $X$ and $Y$ are independent, the total
journey time $T = X + Y$ is distributed as

\[ T \sim N(52 + 8, (1.8)^2 + (2.6)^2) = N(60, (3.16)^2). \]

The standard variable is thus

\[ Z = \frac{T - 60}{3.16}, \]

and the required probability is given by

\[ \Pr(T > 65) = \Pr\left(Z > \frac{65 - 60}{3.16}\right) = \Pr(Z > 1.58) = 1 - 0.943 = 0.057. \]

Thus the total journey time exceeds 65 minutes on 5.7% of occasions. ◄

The above results may be extended. For example, if the independent random variables
$X_i$, $i = 1, 2, \ldots, n$, are distributed as $X_i \sim N(\mu_i, \sigma_i^2)$ then the random variable
$Z = \sum_i c_i X_i$ (where the $c_i$ are constants) is distributed as $Z \sim N\left(\sum_i c_i \mu_i, \sum_i c_i^2 \sigma_i^2\right)$.
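
The addition rule for independent Gaussians is built into Python's standard `statistics.NormalDist`, whose `+` operator adds the means and the variances. The sketch below, which is our own illustration with hypothetical variable names, uses it to repeat the journey-time calculation.

```python
from statistics import NormalDist

train = NormalDist(mu=52.0, sigma=1.8)    # minutes on the train
bicycle = NormalDist(mu=8.0, sigma=2.6)   # minutes on the bicycle

# Sum of independent Gaussians: means add and variances add.
total = train + bicycle

# Probability that the whole journey takes more than 65 minutes.
p_late = 1.0 - total.cdf(65.0)

print(f"T ~ N({total.mean:.0f}, ({total.stdev:.2f})^2), Pr(T > 65) = {p_late:.3f}")
```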


26.9.2 The log-normal distribution 


If the random variable $X$ follows a Gaussian distribution then the variable
$Y = e^X$ is described by a log-normal distribution. Clearly, if $X$ can take values
in the range $-\infty$ to $\infty$, then $Y$ will lie between 0 and $\infty$. The probability density
function for $Y$ is found using the result (26.58). It is

\[ g(y) = f(x(y)) \left|\frac{dx}{dy}\right| = \frac{1}{\sigma\sqrt{2\pi}}\,\frac{1}{y} \exp\left[-\frac{(\ln y - \mu)^2}{2\sigma^2}\right]. \]

We note that $\mu$ and $\sigma^2$ are not the mean and variance of the log-normal
distribution, but rather the parameters of the corresponding Gaussian distribution
for $X$. The mean and variance of $Y$, however, can be found straightforwardly
using the MGF of $X$, which reads $M_X(t) = E[e^{tX}] = \exp(\mu t + \tfrac{1}{2}\sigma^2 t^2)$. Thus, the
mean of $Y$ is given by

\[ E[Y] = E[e^X] = M_X(1) = \exp(\mu + \tfrac{1}{2}\sigma^2), \]

and the variance of $Y$ reads

\[ V[Y] = E[Y^2] - (E[Y])^2 = E[e^{2X}] - (E[e^X])^2 \]
\[ = M_X(2) - [M_X(1)]^2 = \exp(2\mu + \sigma^2)[\exp(\sigma^2) - 1]. \]

Figure 26.15 The PDF $g(y)$ for the log-normal distribution for various values
of the parameters $\mu$ and $\sigma$.

In figure 26.15, we plot some examples of the log-normal distribution for various
values of the parameters $\mu$ and $\sigma^2$.
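
The closed-form mean and variance of the log-normal distribution can be checked numerically. The sketch below is our own illustration: the function names, the midpoint integration rule and the parameter values $\mu = 0$, $\sigma = 0.5$ are arbitrary choices, not taken from the text. It evaluates $E[Y^k] = E[e^{kX}]$ by integrating over the Gaussian variable $X$.

```python
import math

def lognormal_mean(mu: float, sigma: float) -> float:
    """E[Y] = M_X(1) = exp(mu + sigma^2 / 2)."""
    return math.exp(mu + 0.5 * sigma**2)

def lognormal_variance(mu: float, sigma: float) -> float:
    """V[Y] = exp(2*mu + sigma^2) * (exp(sigma^2) - 1)."""
    return math.exp(2.0 * mu + sigma**2) * (math.exp(sigma**2) - 1.0)

def moment_numeric(k: int, mu: float, sigma: float, steps: int = 20000, span: float = 10.0) -> float:
    """Midpoint-rule estimate of E[exp(k*X)] over mu - span*sigma < x < mu + span*sigma."""
    a = mu - span * sigma
    h = 2.0 * span * sigma / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        total += math.exp(k * x - 0.5 * ((x - mu) / sigma) ** 2)
    return total * h / (sigma * math.sqrt(2.0 * math.pi))

mu, sigma = 0.0, 0.5
m1 = moment_numeric(1, mu, sigma)
m2 = moment_numeric(2, mu, sigma)
print(f"mean:     numeric {m1:.6f}, formula {lognormal_mean(mu, sigma):.6f}")
print(f"variance: numeric {m2 - m1**2:.6f}, formula {lognormal_variance(mu, sigma):.6f}")
```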


26.9.3 The exponential and gamma distributions 


The exponential distribution with positive parameter $\lambda$ is given by

\[ f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{for } x > 0, \\ 0 & \text{for } x \le 0 \end{cases} \qquad (26.116) \]

and satisfies $\int_{-\infty}^{\infty} f(x)\,dx = 1$ as required. The exponential distribution occurs
naturally if we consider the distribution of the length of intervals between successive
events in a Poisson process or, equivalently, the distribution of the interval (i.e.
the waiting time) before the first event. If the average number of events per unit
interval is $\lambda$ then on average there are $\lambda x$ events in interval $x$, so that from the
Poisson distribution the probability that there will be no events in this interval is
given by

\[ \Pr(\text{no events in interval } x) = e^{-\lambda x}. \]

The probability that an event occurs in the next infinitesimal interval $[x, x + dx]$
is given by $\lambda\,dx$, so that

\[ \Pr(\text{the first event occurs in interval } [x, x + dx]) = e^{-\lambda x} \lambda\,dx. \]

Hence the required probability density function is given by

\[ f(x) = \lambda e^{-\lambda x}. \]

The expectation and variance of the exponential distribution can be evaluated as
$1/\lambda$ and $(1/\lambda)^2$ respectively. The MGF is given by

\[ M(t) = \frac{\lambda}{\lambda - t}. \qquad (26.117) \]


We may generalise the above discussion to obtain the PDF for the interval
between every $r$th event in a Poisson process or, equivalently, the interval (waiting
time) before the $r$th event. We begin by using the Poisson distribution to give

\[ \Pr(r - 1 \text{ events occur in interval } x) = e^{-\lambda x}\,\frac{(\lambda x)^{r-1}}{(r-1)!}, \]

from which we obtain

\[ \Pr(r\text{th event occurs in the interval } [x, x + dx]) = e^{-\lambda x}\,\frac{(\lambda x)^{r-1}}{(r-1)!}\,\lambda\,dx. \]

Thus the required PDF is

\[ f(x) = \frac{\lambda}{(r-1)!}\,(\lambda x)^{r-1} e^{-\lambda x}, \qquad (26.118) \]

which is known as the gamma distribution of order $r$ with parameter $\lambda$. Although
our derivation applies only when $r$ is a positive integer, the gamma distribution is
defined for all positive $r$ by replacing $(r - 1)!$ by $\Gamma(r)$ in (26.118); see the appendix
for a discussion of the gamma function $\Gamma(x)$. If a random variable $X$ is described
by a gamma distribution of order $r$ with parameter $\lambda$, we write $X \sim \gamma(\lambda, r)$;
we note that the exponential distribution is the special case $\gamma(\lambda, 1)$. The gamma
distribution $\gamma(\lambda, r)$ is plotted in figure 26.16 for $\lambda = 1$ and $r = 1, 2, 5, 10$. For
large $r$, the gamma distribution tends to the Gaussian distribution whose mean
and variance are specified by (26.120) below.

Figure 26.16 The PDF $f(x)$ for the gamma distributions $\gamma(\lambda, r)$ with $\lambda = 1$
and $r = 1, 2, 5, 10$.

The MGF for the gamma distribution is obtained from that for the exponential
distribution, by noting that we may consider the interval between every $r$th event
in a Poisson process as the sum of $r$ intervals between successive events. Thus the
$r$th-order gamma variate is the sum of $r$ independent exponentially distributed
random variables. From (26.117) and (26.90), the MGF of the gamma distribution
is therefore given by

\[ M(t) = \left(\frac{\lambda}{\lambda - t}\right)^r, \qquad (26.119) \]

from which the mean and variance are found to be

\[ E[X] = \frac{r}{\lambda}, \qquad V[X] = \frac{r}{\lambda^2}. \qquad (26.120) \]

We may also use the above MGF to prove another useful theorem regarding
multiple gamma distributions. If $X_i \sim \gamma(\lambda, r_i)$, $i = 1, 2, \ldots, n$, are independent
gamma variates then the random variable $Y = X_1 + X_2 + \cdots + X_n$ has MGF

\[ M(t) = \prod_{i=1}^{n} \left(\frac{\lambda}{\lambda - t}\right)^{r_i} = \left(\frac{\lambda}{\lambda - t}\right)^{r_1 + r_2 + \cdots + r_n}. \qquad (26.121) \]

Thus $Y$ is also a gamma variate, distributed as $Y \sim \gamma(\lambda, r_1 + r_2 + \cdots + r_n)$.
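
The normalisation, mean and variance of the gamma distribution can be confirmed by direct numerical integration of (26.118). The following sketch is our own illustration; the helper `integrate`, the parameter values $\lambda = 2$, $r = 3$ and the upper integration limit are arbitrary assumptions, chosen so that the neglected tail is negligible.

```python
import math

def gamma_pdf(x: float, lam: float, r: float) -> float:
    """Gamma PDF (26.118), with (r - 1)! replaced by Gamma(r); zero for x <= 0."""
    if x <= 0.0:
        return 0.0
    return lam * (lam * x) ** (r - 1) * math.exp(-lam * x) / math.gamma(r)

def integrate(func, a: float, b: float, steps: int = 100000) -> float:
    """Midpoint-rule estimate of the integral of func from a to b."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

lam, r = 2.0, 3.0
upper = 40.0  # the integrand is utterly negligible beyond this point

norm = integrate(lambda x: gamma_pdf(x, lam, r), 0.0, upper)
mean = integrate(lambda x: x * gamma_pdf(x, lam, r), 0.0, upper)
second = integrate(lambda x: x * x * gamma_pdf(x, lam, r), 0.0, upper)
variance = second - mean**2

print(f"normalisation {norm:.6f}")
print(f"mean {mean:.6f} (expected r/lambda = {r / lam})")
print(f"variance {variance:.6f} (expected r/lambda^2 = {r / lam**2})")
```

Setting `r = 1.0` in `gamma_pdf` recovers the exponential distribution (26.116), as noted in the text.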


26.9.4 The chi-squared distribution 

In subsection 26.6.2, we showed that if $X$ is Gaussian distributed with mean $\mu$ and
variance $\sigma^2$, such that $X \sim N(\mu, \sigma^2)$, then the random variable $Y = (X - \mu)^2/\sigma^2$
is distributed as the gamma distribution $Y \sim \gamma(\tfrac{1}{2}, \tfrac{1}{2})$. Let us now consider $n$
independent Gaussian random variables $X_i \sim N(\mu_i, \sigma_i^2)$, $i = 1, 2, \ldots, n$, and define
the new variable

\[ \chi_n^2 = \sum_{i=1}^{n} \frac{(X_i - \mu_i)^2}{\sigma_i^2}. \qquad (26.122) \]

Using the result (26.121) for multiple gamma distributions, $\chi_n^2$ must be distributed
as the gamma variate $\chi_n^2 \sim \gamma(\tfrac{1}{2}, \tfrac{1}{2}n)$, which from (26.118) has the PDF

\[ f(\chi_n^2) = \frac{\tfrac{1}{2}}{\Gamma(\tfrac{1}{2}n)} \left(\tfrac{1}{2}\chi_n^2\right)^{(n/2)-1} \exp\left(-\tfrac{1}{2}\chi_n^2\right) \]
\[ = \frac{1}{2^{n/2}\,\Gamma(\tfrac{1}{2}n)} \left(\chi_n^2\right)^{(n/2)-1} \exp\left(-\tfrac{1}{2}\chi_n^2\right). \qquad (26.123) \]

This is known as the chi-squared distribution of order $n$ and has numerous
applications in statistics (see chapter 27). Setting $\lambda = \tfrac{1}{2}$ and $r = \tfrac{1}{2}n$ in (26.120),
we find that

\[ E[\chi_n^2] = n, \qquad V[\chi_n^2] = 2n. \]

An important generalisation occurs when the $n$ Gaussian variables $X_i$ are not
linearly independent but are instead required to satisfy a linear constraint of the
form

\[ c_1 X_1 + c_2 X_2 + \cdots + c_n X_n = 0, \qquad (26.124) \]

in which the constants $c_i$ are not all zero. In this case, it may be shown (see
exercise 26.40) that the variable $\chi_n^2$ defined in (26.122) is still described by a
chi-squared distribution, but one of order $n - 1$. Indeed, this result may be trivially
extended to show that if the $n$ Gaussian variables $X_i$ satisfy $m$ linear constraints
of the form (26.124) then the variable $\chi_n^2$ defined in (26.122) is described by a
chi-squared distribution of order $n - m$.


26.9.5 The Cauchy and Breit-Wigner distributions 


A random variable $X$ (in the range $-\infty$ to $\infty$) that obeys the Cauchy distribution
is described by the PDF

\[ f(x) = \frac{1}{\pi}\,\frac{1}{1 + x^2}. \]

This is a special case of the Breit-Wigner distribution

\[ f(x) = \frac{1}{\pi}\,\frac{\tfrac{1}{2}\Gamma}{\tfrac{1}{4}\Gamma^2 + (x - x_0)^2}, \]

which is encountered in the study of nuclear and particle physics. In figure 26.17,
we plot some examples of the Breit-Wigner distribution for several values of the
parameters $x_0$ and $\Gamma$.

Figure 26.17 The PDF $f(x)$ for the Breit-Wigner distribution for different
values of the parameters $x_0$ and $\Gamma$.

We see from the figure that the peak (or mode) of the distribution occurs
at $x = x_0$. It is also straightforward to show that the parameter $\Gamma$ is equal to
the width of the peak at half the maximum height. Although the Breit-Wigner
distribution is symmetric about its peak, it does not formally possess a mean since
the integrals $\int_{-\infty}^{0} x f(x)\,dx$ and $\int_{0}^{\infty} x f(x)\,dx$ both diverge. Similar divergences occur
for all higher moments of the distribution.
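
The divergence is logarithmic, which can be seen explicitly for the Cauchy case: $\int_0^L x f(x)\,dx = \ln(1 + L^2)/(2\pi)$, which grows without bound as $L \to \infty$. The sketch below is our own illustration (function names are hypothetical); it compares this closed form with a direct numerical integration and prints its value for increasing $L$.

```python
import math

def cauchy_pdf(x: float) -> float:
    """Cauchy PDF f(x) = 1 / (pi * (1 + x^2))."""
    return 1.0 / (math.pi * (1.0 + x * x))

def half_first_moment_exact(L: float) -> float:
    """Closed form of the integral of x f(x) from 0 to L: ln(1 + L^2) / (2 pi)."""
    return math.log1p(L * L) / (2.0 * math.pi)

def half_first_moment_numeric(L: float, steps: int = 200000) -> float:
    """Midpoint-rule estimate of the same integral."""
    h = L / steps
    return h * sum((i + 0.5) * h * cauchy_pdf((i + 0.5) * h) for i in range(steps))

# The truncated integral keeps growing as the cut-off L increases.
for L in (10.0, 1e3, 1e6):
    print(f"L = {L:g}: integral = {half_first_moment_exact(L):.4f}")
```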


26.9.6 The uniform distribution 

Finally we mention the very simple, but common, uniform distribution, which
describes a continuous random variable that has a constant PDF over its allowed
range of values. If the limits on $X$ are $a$ and $b$ then

\[ f(x) = \begin{cases} 1/(b - a) & \text{for } a \le x \le b, \\ 0 & \text{otherwise.} \end{cases} \]

The MGF of the uniform distribution is found to be

\[ M(t) = \frac{e^{bt} - e^{at}}{(b - a)t}, \]

and its mean and variance are given by

\[ E[X] = \frac{a + b}{2}, \qquad V[X] = \frac{(b - a)^2}{12}. \]


26.10 The central limit theorem 

In subsection 26.9.1 we discussed approximating the binomial and Poisson distri- 
butions by the Gaussian distribution when the number of trials is large. We now 
discuss why the Gaussian distribution is so common and therefore so important. 
The central limit theorem may be stated as follows. 

Central limit theorem. Suppose that $X_i$, $i = 1, 2, \ldots, n$, are independent random
variables, each of which is described by a probability density function $f_i(x)$ (these
may all be different) with a mean $\mu_i$ and a variance $\sigma_i^2$. The random variable
$Z = \left(\sum_i X_i\right)/n$, i.e. the 'mean' of the $X_i$, has the following properties:

(i) its expectation value is given by $E[Z] = \left(\sum_i \mu_i\right)/n$;

(ii) its variance is given by $V[Z] = \left(\sum_i \sigma_i^2\right)/n^2$;

(iii) as $n \to \infty$ the probability function of $Z$ tends to a Gaussian with
corresponding mean and variance.


We note that for the theorem to hold, the probability density functions $f_i(x)$
must possess formal means and variances. Thus, for example, if each $X_i$ were
described by a Cauchy distribution then the theorem would not apply.

Properties (i) and (ii) of the theorem are easily proved, as follows. Firstly

\[ E[Z] = \frac{1}{n}\left(E[X_1] + E[X_2] + \cdots + E[X_n]\right) = \frac{1}{n}(\mu_1 + \mu_2 + \cdots + \mu_n) = \frac{\sum_i \mu_i}{n}, \]

a result which does not require that the $X_i$ are independent random variables. If
$\mu_i = \mu$ for all $i$ then this becomes

\[ E[Z] = \frac{n\mu}{n} = \mu. \]

Secondly, if the $X_i$ are independent, it follows from an obvious extension of
(26.68) that

\[ V[Z] = V\left[\frac{1}{n}(X_1 + X_2 + \cdots + X_n)\right] = \frac{1}{n^2}\left(V[X_1] + V[X_2] + \cdots + V[X_n]\right) = \frac{\sum_i \sigma_i^2}{n^2}. \]

Let us now consider property (iii), which is the reason for the ubiquity of
the Gaussian distribution and is most easily proved by considering the moment
generating function $M_Z(t)$ of $Z$. From (26.90), this MGF is given by

\[ M_Z(t) = \prod_{i=1}^{n} M_{X_i}\left(\frac{t}{n}\right), \]

where $M_{X_i}(t)$ is the MGF of $f_i(x)$. Now

\[ M_{X_i}\left(\frac{t}{n}\right) = 1 + \frac{t}{n}E[X_i] + \frac{1}{2}\frac{t^2}{n^2}E[X_i^2] + \cdots = 1 + \mu_i\frac{t}{n} + \frac{1}{2}(\sigma_i^2 + \mu_i^2)\frac{t^2}{n^2} + \cdots, \]

and as $n$ becomes large

\[ M_{X_i}\left(\frac{t}{n}\right) \approx \exp\left(\frac{\mu_i t}{n} + \frac{1}{2}\sigma_i^2\frac{t^2}{n^2}\right), \]

as may be verified by expanding the exponential up to terms including $(t/n)^2$.
Therefore

\[ M_Z(t) \approx \prod_{i=1}^{n} \exp\left(\frac{\mu_i t}{n} + \frac{1}{2}\sigma_i^2\frac{t^2}{n^2}\right) = \exp\left(\frac{\sum_i \mu_i}{n}\,t + \frac{1}{2}\,\frac{\sum_i \sigma_i^2}{n^2}\,t^2\right). \]

Comparing this with the form of the MGF for a Gaussian distribution, (26.114),
we can see that the probability density function $g(z)$ of $Z$ tends to a Gaussian
distribution with mean $\sum_i \mu_i/n$ and variance $\sum_i \sigma_i^2/n^2$. In particular, if we consider
$Z$ to be the mean of $n$ independent measurements of the same random variable $X$
(so that $X_i = X$ for $i = 1, 2, \ldots, n$) then, as $n \to \infty$, $Z$ has a Gaussian distribution
with mean $\mu$ and variance $\sigma^2/n$.
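
The theorem is easily illustrated by simulation. The sketch below is our own addition, under stated assumptions: the $X_i$ are taken to be uniform on $[0, 1)$ (so that $\mu = \frac{1}{2}$ and $\sigma^2 = \frac{1}{12}$), and the values of $n$, the number of trials and the seed are arbitrary. It checks that the sample mean $Z$ has mean $\mu$, variance $\sigma^2/n$, and roughly the Gaussian one-sigma fraction of 68%.

```python
import random
import statistics

def sample_means(n: int, trials: int, rng: random.Random) -> list:
    """Return `trials` values of Z = (X_1 + ... + X_n)/n with X_i uniform on [0, 1)."""
    return [sum(rng.random() for _ in range(n)) / n for _ in range(trials)]

rng = random.Random(12345)       # fixed (arbitrary) seed so the run is repeatable
n, trials = 30, 20000
means = sample_means(n, trials, rng)

mu_z = 0.5                       # E[Z] = mu for identically distributed X_i
var_z = (1.0 / 12.0) / n         # V[Z] = sigma^2 / n, with sigma^2 = 1/12
sigma_z = var_z ** 0.5

observed_mean = statistics.fmean(means)
observed_var = statistics.pvariance(means)
frac_within_1sigma = sum(abs(m - mu_z) < sigma_z for m in means) / trials

print(f"mean {observed_mean:.4f} (expected {mu_z})")
print(f"variance {observed_var:.6f} (expected {var_z:.6f})")
print(f"fraction within one sigma {frac_within_1sigma:.3f} (Gaussian: about 0.683)")
```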

We may use the central limit theorem to derive an analogous result to (iii)
above for the product $W = X_1 X_2 \cdots X_n$ of the $n$ independent random variables
$X_i$. Provided the $X_i$ only take values between zero and infinity, we may write

\[ \ln W = \ln X_1 + \ln X_2 + \cdots + \ln X_n, \]

which is simply the sum of $n$ new random variables $\ln X_i$. Thus, provided these
new variables each possess a formal mean and variance, the PDF of $\ln W$ will
tend to a Gaussian in the limit $n \to \infty$, and so the product $W$ will be described
by a log-normal distribution (see subsection 26.9.2).


26.11 Joint distributions 

As mentioned briefly in subsection 26.4.3, it is common in the physical sciences to 
consider simultaneously two or more random variables that are not independent, 
in general, and are thus described by joint probability density functions. We will 
return to the subject of the interdependence of random variables after first 
presenting some of the general ways of characterising joint distributions. We 
will concentrate mainly on bivariate distributions, i.e. distributions of only two 
random variables, though the results may be extended readily to multivariate 
distributions. The subject of multivariate distributions is large and a detailed 
study is beyond the scope of this book; the interested reader should therefore 
consult one of the many specialised texts. However, we do discuss the multinomial
and multivariate Gaussian distributions, in section 26.15.

The first thing to note when dealing with bivariate distributions is that the 
distinction between discrete and continuous distributions may not be as clear as 
for the single variable case; the random variables can both be discrete, or both 
continuous, or one discrete and the other continuous. In general, for the random 
variables X and Y, the joint distribution will take an infinite number of values 
unless both X and Y have only a finite number of values. In this chapter we 
will consider only the cases where X and Y are either both discrete or both 
continuous random variables. 

26.11.1 Discrete bivariate distributions

In direct analogy with the one-variable (univariate) case, if $X$ is a discrete random
variable that takes the values $\{x_i\}$ and $Y$ one that takes the values $\{y_j\}$ then the
probability function of the joint distribution is defined as

\[ f(x, y) = \begin{cases} \Pr(X = x_i, Y = y_j) & \text{for } x = x_i,\ y = y_j, \\ 0 & \text{otherwise.} \end{cases} \]

We may therefore think of $f(x, y)$ as a set of spikes at valid points in the $xy$-plane,
whose heights represent the probability of obtaining $X = x_i$ and $Y = y_j$. The
normalisation of $f(x, y)$ implies

\[ \sum_i \sum_j f(x_i, y_j) = 1, \qquad (26.125) \]

where the sums over $i$ and $j$ take all valid pairs of values. We can also define the
cumulative probability function

\[ F(x, y) = \sum_{x_i \le x} \sum_{y_j \le y} f(x_i, y_j), \qquad (26.126) \]

from which it follows that the probability that $X$ lies in the range $(a_1, a_2]$ and $Y$
lies in the range $(b_1, b_2]$ is given by

\[ \Pr(a_1 < X \le a_2,\ b_1 < Y \le b_2) = F(a_2, b_2) - F(a_1, b_2) - F(a_2, b_1) + F(a_1, b_1). \]

Finally, we define $X$ and $Y$ to be independent if we can write their joint distribution
in the form

\[ f(x, y) = f_X(x) f_Y(y), \qquad (26.127) \]

i.e. as the product of two univariate distributions.
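
The definitions above translate directly into a few lines of code. The sketch below is our own illustration: the joint distribution (a fair coin $X \in \{0, 1\}$ paired with an independent fair die $Y \in \{1, \ldots, 6\}$) and the function names are hypothetical. It stores $f(x_i, y_j)$ in a dictionary and evaluates the cumulative function (26.126) and the four-term rectangle formula.

```python
from itertools import product

# Hypothetical joint probability function: fair coin X and independent fair die Y.
joint = {(x, y): (1 / 2) * (1 / 6) for x, y in product((0, 1), range(1, 7))}

def cumulative(f: dict, x: float, y: float) -> float:
    """F(x, y): sum of f(x_i, y_j) over all x_i <= x and y_j <= y, as in (26.126)."""
    return sum(p for (xi, yj), p in f.items() if xi <= x and yj <= y)

def rectangle_prob(f: dict, a1: float, a2: float, b1: float, b2: float) -> float:
    """Pr(a1 < X <= a2, b1 < Y <= b2) from the four-term combination of F."""
    return (cumulative(f, a2, b2) - cumulative(f, a1, b2)
            - cumulative(f, a2, b1) + cumulative(f, a1, b1))

total = sum(joint.values())                    # normalisation (26.125)
via_F = rectangle_prob(joint, 0, 1, 2, 4)      # X = 1 and Y in {3, 4}
direct = sum(p for (xi, yj), p in joint.items() if 0 < xi <= 1 and 2 < yj <= 4)

print(f"normalisation {total:.6f}, via F {via_F:.6f}, direct sum {direct:.6f}")
```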


26.11.2 Continuous bivariate distributions 

In the case where both $X$ and $Y$ are continuous random variables, the PDF of
the joint distribution is defined by

\[ f(x, y)\,dx\,dy = \Pr(x < X \le x + dx,\ y < Y \le y + dy), \qquad (26.128) \]

so $f(x, y)\,dx\,dy$ is the probability that $X$ lies in the range $[x, x + dx]$ and $Y$ lies in
the range $[y, y + dy]$. It is clear that the two-dimensional function $f(x, y)$ must be
everywhere non-negative and that normalisation requires

\[ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1. \]

It follows further that

\[ \Pr(a_1 < X \le a_2,\ b_1 < Y \le b_2) = \int_{b_1}^{b_2} \int_{a_1}^{a_2} f(x, y)\,dx\,dy. \]

We can also define the cumulative probability function by

\[ F(x, y) = \Pr(X \le x,\ Y \le y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(u, v)\,du\,dv, \qquad (26.129) \]

from which we see that (as for the discrete case),

\[ \Pr(a_1 < X \le a_2,\ b_1 < Y \le b_2) = F(a_2, b_2) - F(a_1, b_2) - F(a_2, b_1) + F(a_1, b_1). \]

Finally we note that the definition of independence (26.127) for discrete bivariate
distributions also applies to continuous bivariate distributions.


►A flat table is ruled with parallel straight lines a distance D apart, and a thin needle of
length l < D is tossed onto the table at random. What is the probability that the needle
will cross a line?

Let θ be the angle that the needle makes with the lines, and let x be the distance from
the centre of the needle to the nearest line. Since the needle is tossed ‘at random’ onto
the table, the angle θ is uniformly distributed in the interval [0, π], and the distance x
is uniformly distributed in the interval [0, D/2]. Assuming that θ and x are independent,
their joint distribution is just the product of their individual distributions, and is given by

\[ f(\theta, x) = \frac{1}{\pi}\,\frac{1}{D/2} = \frac{2}{\pi D}. \]

The needle will cross a line if the distance x of its centre from that line is less than $\tfrac{1}{2} l \sin\theta$.
Thus the required probability is

\[ \frac{2}{\pi D} \int_0^{\pi} \int_0^{\frac{1}{2} l \sin\theta} dx\,d\theta = \frac{2}{\pi D}\,\frac{l}{2} \int_0^{\pi} \sin\theta\,d\theta = \frac{2l}{\pi D}. \]

This gives an experimental (but cumbersome) method of determining π. ◄
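
The needle-tossing experiment can be imitated numerically. The following Python sketch is an illustration only: the function name, the parameter values and the fixed seed are assumptions made here, and the angle is sampled using the known value of π, so the final line is a consistency check rather than an independent determination of π.

```python
import math
import random

def buffon_crossing_fraction(needle_len, spacing, n_throws, seed=0):
    """Fraction of simulated throws in which a needle of length needle_len
    crosses one of a set of parallel lines a distance spacing apart."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(n_throws):
        theta = rng.uniform(0.0, math.pi)      # angle between needle and lines
        x = rng.uniform(0.0, spacing / 2.0)    # centre-to-nearest-line distance
        if x < 0.5 * needle_len * math.sin(theta):
            crossings += 1
    return crossings / n_throws

l, D = 1.0, 2.0                                # illustrative values with l < D
estimate = buffon_crossing_fraction(l, D, 200_000)
exact = 2.0 * l / (math.pi * D)                # the result 2l/(pi D) derived above
pi_estimate = 2.0 * l / (D * estimate)         # invert the formula for pi
print(estimate, exact, pi_estimate)
```

With a few hundred thousand throws the sampled fraction typically agrees with 2l/(πD) to two or three decimal places, which illustrates why the method is described as cumbersome.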


26.11.3 Marginal and conditional distributions 

Given a bivariate distribution f(x,y), we may only be interested in the proba- 
bility function for X irrespective of the value of Y (or vice versa). This marginal 
distribution of X is obtained by summing or integrating, as appropriate, the 
joint probability distribution over all allowed values of Y. Thus, the marginal 
distribution of X (for example) is given by 

\[ f_X(x) = \begin{cases} \sum_j f(x, y_j) & \text{for a discrete distribution,} \\[4pt] \int f(x, y)\,dy & \text{for a continuous distribution.} \end{cases} \tag{26.130} \]

It is clear that an analogous definition exists for the marginal distribution of Y. 
Alternatively, one might be interested in the probability function of X given
that Y takes some specific value $Y = y_0$, i.e. $\Pr(X = x \mid Y = y_0)$. This conditional
distribution of X is given by

\[ g(x) = \frac{f(x, y_0)}{f_Y(y_0)}, \]

where $f_Y(y)$ is the marginal distribution of Y. The division by $f_Y(y_0)$ is necessary
in order that g(x) is properly normalised.
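
For a discrete distribution, the marginal and conditional distributions can be read off from a table of joint probabilities. The sketch below uses a small hypothetical joint probability function (the four values are invented for illustration) and exact rational arithmetic; the helper names are likewise assumptions.

```python
from fractions import Fraction as F

# Hypothetical joint probability function f(x, y); values chosen for illustration.
joint = {
    (0, 0): F(1, 8), (0, 1): F(1, 8),
    (1, 0): F(1, 4), (1, 1): F(1, 2),
}
assert sum(joint.values()) == 1                # normalisation

def marginal(joint, axis):
    """Sum the joint probabilities over the other variable (axis 0 -> X, 1 -> Y)."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], F(0)) + p
    return out

def conditional_x_given_y(joint, y0):
    """g(x) = f(x, y0) / f_Y(y0)."""
    f_y0 = marginal(joint, 1)[y0]
    return {x: p / f_y0 for (x, y), p in joint.items() if y == y0}

f_X = marginal(joint, 0)
f_Y = marginal(joint, 1)
g = conditional_x_given_y(joint, 1)
print(f_X, f_Y, g)
```

Dividing by the marginal value of Y makes the conditional probabilities sum to unity, as required.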


26.12 Properties of joint distributions 

The probability density function f(x, y) contains all the information on the joint
probability distribution of two random variables X and Y. In a similar manner
to that presented for univariate distributions, however, it is conventional to
characterise f(x, y) by certain of its properties, which we now discuss. Once
again, most of these properties are based on the concept of expectation values,
which are defined for joint distributions in an analogous way to those for
single-variable distributions (26.46). Thus, the expectation value of any function g(X, Y)
of the random variables X and Y is given by

\[ E[g(X, Y)] = \begin{cases} \sum_i \sum_j g(x_i, y_j) f(x_i, y_j) & \text{for the discrete case,} \\[4pt] \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y) f(x, y)\,dx\,dy & \text{for the continuous case.} \end{cases} \]


26.12.1 Means 

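The means of X and Y are defined respectively as the expectation values of the
variables X and Y. Thus, the mean of X is given by

\[ E[X] = \mu_X = \begin{cases} \sum_i \sum_j x_i f(x_i, y_j) & \text{for the discrete case,} \\[4pt] \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x f(x, y)\,dx\,dy & \text{for the continuous case.} \end{cases} \tag{26.131} \]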


E[Y] is obtained in a similar manner. 


►Show that if X and Y are independent random variables then E[XY] = E[X]E[Y].

Let us consider the case where X and Y are continuous random variables. Since X and
Y are independent, $f(x, y) = f_X(x) f_Y(y)$, so that

\[ E[XY] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x y f_X(x) f_Y(y)\,dx\,dy = \int_{-\infty}^{\infty} x f_X(x)\,dx \int_{-\infty}^{\infty} y f_Y(y)\,dy = E[X]E[Y]. \]

An analogous proof exists for the discrete case. ◄


26.12.2 Variances

The definitions of the variances of X and Y are analogous to those for the 
single-variable case (26.48), i.e. the variance of X is given by 


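\[ V[X] = \sigma_X^2 = \begin{cases} \sum_i \sum_j (x_i - \mu_X)^2 f(x_i, y_j) & \text{for the discrete case,} \\[4pt] \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)^2 f(x, y)\,dx\,dy & \text{for the continuous case.} \end{cases} \tag{26.132} \]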


Equivalent definitions exist for the variance of Y . 


26.12.3 Covariance and correlation 

Means and variances of joint distributions provide useful information about 
their marginal distributions, but we have not yet given any indication of how to 
measure the relationship between the two random variables. Of course, it may 
be that the two random variables are independent, but often this is not so. For 
example, if we measure the heights and weights of a sample of people we would 
not be surprised to find a tendency for tall people to be heavier than short people 
and vice versa. We will show in this section that two functions, the covariance 
and the correlation , can be defined for a bivariate distribution and that these are 
useful in characterising the relationship between the two random variables. 

The covariance of two random variables X and Y is defined by 

\[ \operatorname{Cov}[X, Y] = E[(X - \mu_X)(Y - \mu_Y)], \tag{26.133} \]

where $\mu_X$ and $\mu_Y$ are the expectation values of X and Y respectively. Clearly
related to the covariance is the correlation of the two random variables, defined
by

\[ \operatorname{Corr}[X, Y] = \frac{\operatorname{Cov}[X, Y]}{\sigma_X \sigma_Y}, \tag{26.134} \]

where $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y respectively. It can be
shown that the correlation function lies between —1 and +1. If the value assumed 
is negative, X and Y are said to be negatively correlated, if it is positive they are 
said to be positively correlated and if it is zero they are said to be uncorrelated. 
We will now justify the use of these terms. 

One particularly useful consequence of its definition is that the covariance
of two independent variables, X and Y, is zero. It immediately follows from
(26.134) that their correlation is also zero, and this justifies the use of the term
‘uncorrelated’ for two such variables. To show this extremely important property
we first note that

\[ \begin{aligned} \operatorname{Cov}[X, Y] &= E[(X - \mu_X)(Y - \mu_Y)] \\ &= E[XY - \mu_X Y - \mu_Y X + \mu_X \mu_Y] \\ &= E[XY] - \mu_X E[Y] - \mu_Y E[X] + \mu_X \mu_Y \\ &= E[XY] - \mu_X \mu_Y. \end{aligned} \tag{26.135} \]

Now, if X and Y are independent then $E[XY] = E[X]E[Y] = \mu_X \mu_Y$ and so
Cov[X, Y] = 0. It is important to note that the converse of this result is not
necessarily true; two variables dependent on each other can still be uncorrelated.
In other words, it is possible (and not uncommon) for two variables X and Y
to be described by a joint distribution f(x, y) that cannot be factorised into a
product of the form g(x)h(y), but for which Corr[X, Y] = 0. Indeed, from the
definition (26.133), we see that for any joint distribution f(x, y) that is symmetric
in x about $\mu_X$ (or similarly in y) we have Corr[X, Y] = 0.
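
A simple concrete case of variables that are dependent but uncorrelated may help. In the sketch below, which is an illustrative assumption rather than an example from the text, X takes the values −1, 0, 1 with equal probability and Y = X², so that Y is completely determined by X and yet the covariance vanishes.

```python
from fractions import Fraction as F

# X is uniform on {-1, 0, 1}; Y = X^2 is a deterministic function of X.
prob_x = {x: F(1, 3) for x in (-1, 0, 1)}
joint = {(x, x * x): w for x, w in prob_x.items()}

def expect(g):
    """Expectation of g(X, Y) over the joint distribution."""
    return sum(g(x, y) * w for (x, y), w in joint.items())

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
cov_xy = expect(lambda x, y: x * y) - mean_x * mean_y   # uses (26.135)

# Independence would require f(0, 0) = f_X(0) f_Y(0).
marg_x0 = sum(w for (x, y), w in joint.items() if x == 0)
marg_y0 = sum(w for (x, y), w in joint.items() if y == 0)
factorises = joint[(0, 0)] == marg_x0 * marg_y0
print(cov_xy, factorises)
```

Here the joint distribution is symmetric in x about its mean, so the covariance is zero, while the joint probability at (0, 0) is 1/3 rather than the product 1/9 of the marginals.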

We have already asserted that if the correlation of two random variables is 
positive (negative) they are said to be positively (negatively) correlated. We have 
also stated that the correlation lies between —1 and +1. The terminology suggests 
that if the two RVs are identical (i.e. X = Y ) then they are completely correlated 
and that their correlation should be +1. Likewise, if X = −Y then the functions
are completely anticorrelated and their correlation should be −1. Values of the
correlation function between these extremes show the existence of some degree
of correlation. In fact it is not necessary that X = Y for Corr[X, Y] = 1; it is
sufficient that Y is a linear function of X, i.e. Y = aX + b (with a positive). If a
is negative then Corr[X, Y] = −1. To show this we first note that $\mu_Y = a\mu_X + b$.
Now

\[ Y = aX + b = aX + \mu_Y - a\mu_X \quad\Rightarrow\quad Y - \mu_Y = a(X - \mu_X), \]

and so using the definition of the covariance (26.133)

\[ \operatorname{Cov}[X, Y] = aE[(X - \mu_X)^2] = a\sigma_X^2. \]

It follows from the properties of the variance (subsection 26.5.3) that $\sigma_Y = |a|\sigma_X$
and so, using the definition (26.134) of the correlation,

\[ \operatorname{Corr}[X, Y] = \frac{a\sigma_X^2}{|a|\sigma_X^2} = \frac{a}{|a|}, \]

which is the stated result.

It should be noted that, even if the possibilities of X and Y being non-zero are
mutually exclusive, Corr[X, Y] need not have value ±1.

►A biased die gives probabilities $\tfrac{1}{2}p$, p, p, p, p, 2p of throwing 1, 2, 3, 4, 5, 6 respectively.
If the random variable X is the number shown on the die and the random variable Y is
defined as $X^2$, calculate the covariance and correlation of X and Y.

We have already calculated in subsections 26.2.1 and 26.5.4 that $p = \tfrac{2}{13}$, $E[X] = \tfrac{53}{13}$,
$E[X^2] = \tfrac{253}{13}$ and $V[X] = \tfrac{480}{169}$. Using (26.135),

\[ \operatorname{Cov}[X, Y] = \operatorname{Cov}[X, X^2] = E[X^3] - E[X]E[X^2]. \]

Now $E[X^3]$ is given by

\[ E[X^3] = 1^3 \times \tfrac{1}{2}p + (2^3 + 3^3 + 4^3 + 5^3)p + 6^3 \times 2p = \tfrac{1313}{2}p = 101, \]

and the covariance of X and Y is given by

\[ \operatorname{Cov}[X, Y] = 101 - \frac{53}{13} \times \frac{253}{13} = \frac{3660}{169}. \]

The correlation is defined by $\operatorname{Corr}[X, Y] = \operatorname{Cov}[X, Y]/\sigma_X\sigma_Y$. The standard deviation of
Y may be calculated from the definition of the variance. Letting $\mu_Y = E[X^2] = \tfrac{253}{13}$ gives

\[ \begin{aligned} \sigma_Y^2 &= \tfrac{1}{2}p(1^2 - \mu_Y)^2 + p(2^2 - \mu_Y)^2 + p(3^2 - \mu_Y)^2 + p(4^2 - \mu_Y)^2 \\ &\quad + p(5^2 - \mu_Y)^2 + 2p(6^2 - \mu_Y)^2 \\ &= \frac{187\,356}{169}\,p = \frac{28\,824}{169}. \end{aligned} \]

We deduce that

\[ \operatorname{Corr}[X, Y] = \frac{3660}{169} \sqrt{\frac{169}{28\,824}} \sqrt{\frac{169}{480}} = 0.984. \]

Thus the random variables X and Y display a strong degree of positive correlation, as we
would expect. ◄
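
The arithmetic of this example is easily checked with exact fractions. The short Python sketch below is an illustration (the helper name `expect` is an assumption) and evaluates the required moments directly from the stated probabilities.

```python
from fractions import Fraction as F
import math

p = F(2, 13)
prob = {1: p / 2, 2: p, 3: p, 4: p, 5: p, 6: 2 * p}
assert sum(prob.values()) == 1                 # fixes p = 2/13

def expect(g):
    """Expectation of g(X) for the biased die."""
    return sum(g(x) * w for x, w in prob.items())

mean_x = expect(lambda x: x)                   # E[X]
mean_y = expect(lambda x: x**2)                # E[Y] with Y = X^2
var_x = expect(lambda x: x**2) - mean_x**2
var_y = expect(lambda x: x**4) - mean_y**2
cov_xy = expect(lambda x: x**3) - mean_x * mean_y
corr = float(cov_xy) / math.sqrt(float(var_x) * float(var_y))
print(cov_xy, var_y, round(corr, 3))
```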


We note that the covariance of X and Y occurs in various expressions. For 
example, if X and Y are not independent then 


\[ \begin{aligned} V[X + Y] &= E[(X + Y)^2] - (E[X + Y])^2 \\ &= E[X^2] + 2E[XY] + E[Y^2] - \{(E[X])^2 + 2E[X]E[Y] + (E[Y])^2\} \\ &= V[X] + V[Y] + 2(E[XY] - E[X]E[Y]) \\ &= V[X] + V[Y] + 2\operatorname{Cov}[X, Y]. \end{aligned} \]

More generally, we find (for a, b and c constant)

\[ V[aX + bY + c] = a^2 V[X] + b^2 V[Y] + 2ab\operatorname{Cov}[X, Y]. \tag{26.136} \]

Note that if X and Y are in fact independent then Cov[X, Y] = 0 and we recover
the expression (26.68) in subsection 26.6.4.

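We may use (26.136) to obtain an approximate expression for V[f(X, Y)]
for any arbitrary function f, even when the random variables X and Y are
correlated. Approximating f(X, Y) by the linear terms of its Taylor expansion
about the point $(\mu_X, \mu_Y)$, we have

\[ f(X, Y) \approx f(\mu_X, \mu_Y) + \left(\frac{\partial f}{\partial X}\right)(X - \mu_X) + \left(\frac{\partial f}{\partial Y}\right)(Y - \mu_Y), \tag{26.137} \]

where the partial derivatives are evaluated at $X = \mu_X$ and $Y = \mu_Y$. Taking the
variance of both sides, and using (26.136), we find

\[ V[f(X, Y)] \approx \left(\frac{\partial f}{\partial X}\right)^2 V[X] + \left(\frac{\partial f}{\partial Y}\right)^2 V[Y] + 2\left(\frac{\partial f}{\partial X}\right)\left(\frac{\partial f}{\partial Y}\right)\operatorname{Cov}[X, Y]. \tag{26.138} \]

Clearly, if Cov[X, Y] = 0, we recover the result (26.69) derived in subsection 26.6.4.
We note that (26.138) is exact if f(X, Y) is linear in X and Y.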
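
The quality of the approximation (26.138) can be examined by simulation. The sketch below is a rough illustration under stated assumptions: it takes f(X, Y) = XY with X and Y drawn from a correlated bivariate Gaussian, and the means, standard deviations, correlation and seed are arbitrary choices made here.

```python
import math
import random

def first_order_variance(dfdx, dfdy, var_x, var_y, cov_xy):
    """Right-hand side of (26.138), derivatives evaluated at the means."""
    return dfdx**2 * var_x + dfdy**2 * var_y + 2.0 * dfdx * dfdy * cov_xy

mu_x, mu_y = 10.0, 20.0                        # assumed means
sig_x, sig_y, rho = 0.1, 0.2, 0.5              # assumed spreads and correlation
cov_xy = rho * sig_x * sig_y

# For f = X * Y the derivatives at the means are df/dX = mu_y, df/dY = mu_x.
approx = first_order_variance(mu_y, mu_x, sig_x**2, sig_y**2, cov_xy)

rng = random.Random(0)
n = 200_000
total = total_sq = 0.0
for _ in range(n):
    g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x = mu_x + sig_x * g1
    y = mu_y + sig_y * (rho * g1 + math.sqrt(1.0 - rho**2) * g2)
    f = x * y
    total += f
    total_sq += f * f
sampled = total_sq / n - (total / n) ** 2      # sample variance of f
print(approx, sampled)
```

Because the spreads are small compared with the means, the neglected second-order terms are tiny and the two values should agree closely; increasing the spreads makes the linear approximation progressively worse.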

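For several variables $X_i$, i = 1, 2, ..., n, we can define the symmetric (positive
definite) covariance matrix whose elements are

\[ V_{ij} = \operatorname{Cov}[X_i, X_j], \tag{26.139} \]

and the symmetric (positive definite) correlation matrix

\[ \rho_{ij} = \operatorname{Corr}[X_i, X_j]. \]

The diagonal elements of the covariance matrix are the variances of the variables,
whilst those of the correlation matrix are unity. For several variables, (26.138)
generalises to

\[ V[f(X_1, X_2, \ldots, X_n)] \approx \sum_i \left(\frac{\partial f}{\partial X_i}\right)^2 V[X_i] + \sum_i \sum_{j \ne i} \left(\frac{\partial f}{\partial X_i}\right)\left(\frac{\partial f}{\partial X_j}\right)\operatorname{Cov}[X_i, X_j], \]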

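where the partial derivatives are evaluated at $X_i = \mu_{X_i}$.

►A card is drawn at random from a normal 52-card pack and its identity noted. The card
is replaced, the pack shuffled and the process repeated. Random variables W, X, Y, Z are
defined as follows:

W = 2 if the drawn card is a heart; W = 0 otherwise.

X = 4 if the drawn card is an ace, king, or queen; X = 2 if the card is a jack or ten; X = 0 otherwise.

Y = 1 if the drawn card is red; Y = 0 otherwise.

Z = 2 if the drawn card is black and an ace, king or queen; Z = 0 otherwise.

Establish the correlation matrix for W, X, Y, Z.

The means of the variables are given by

\[ \mu_W = 2 \times \tfrac{1}{4} = \tfrac{1}{2}, \qquad \mu_X = \left(4 \times \tfrac{3}{13}\right) + \left(2 \times \tfrac{2}{13}\right) = \tfrac{16}{13}, \]
\[ \mu_Y = 1 \times \tfrac{1}{2} = \tfrac{1}{2}, \qquad \mu_Z = 2 \times \tfrac{6}{52} = \tfrac{3}{13}. \]

The variances, calculated from $\sigma_U^2 = V[U] = E[U^2] - (E[U])^2$, where U = W, X, Y or
Z, are

\[ \sigma_W^2 = \left(4 \times \tfrac{1}{4}\right) - \left(\tfrac{1}{2}\right)^2 = \tfrac{3}{4}, \qquad \sigma_X^2 = \left(16 \times \tfrac{3}{13}\right) + \left(4 \times \tfrac{2}{13}\right) - \left(\tfrac{16}{13}\right)^2 = \tfrac{472}{169}, \]
\[ \sigma_Y^2 = \left(1 \times \tfrac{1}{2}\right) - \left(\tfrac{1}{2}\right)^2 = \tfrac{1}{4}, \qquad \sigma_Z^2 = \left(4 \times \tfrac{6}{52}\right) - \left(\tfrac{3}{13}\right)^2 = \tfrac{69}{169}. \]

The covariances are found by first calculating E[WX] etc. and then forming $E[WX] - \mu_W\mu_X$
etc.

\[ \begin{aligned} E[WX] &= 2(4)\left(\tfrac{3}{52}\right) + 2(2)\left(\tfrac{2}{52}\right) = \tfrac{8}{13}, & \operatorname{Cov}[W, X] &= \tfrac{8}{13} - \tfrac{1}{2}\left(\tfrac{16}{13}\right) = 0, \\ E[WY] &= 2(1)\left(\tfrac{1}{4}\right) = \tfrac{1}{2}, & \operatorname{Cov}[W, Y] &= \tfrac{1}{2} - \tfrac{1}{2}\left(\tfrac{1}{2}\right) = \tfrac{1}{4}, \\ E[WZ] &= 0, & \operatorname{Cov}[W, Z] &= 0 - \tfrac{1}{2}\left(\tfrac{3}{13}\right) = -\tfrac{3}{26}, \\ E[XY] &= 4(1)\left(\tfrac{6}{52}\right) + 2(1)\left(\tfrac{4}{52}\right) = \tfrac{8}{13}, & \operatorname{Cov}[X, Y] &= \tfrac{8}{13} - \tfrac{16}{13}\left(\tfrac{1}{2}\right) = 0, \\ E[XZ] &= 4(2)\left(\tfrac{6}{52}\right) = \tfrac{12}{13}, & \operatorname{Cov}[X, Z] &= \tfrac{12}{13} - \tfrac{16}{13}\left(\tfrac{3}{13}\right) = \tfrac{108}{169}, \\ E[YZ] &= 0, & \operatorname{Cov}[Y, Z] &= 0 - \tfrac{1}{2}\left(\tfrac{3}{13}\right) = -\tfrac{3}{26}. \end{aligned} \]

The correlations Corr[W, X] and Corr[X, Y] are clearly zero; the remainder are given by

\[ \begin{aligned} \operatorname{Corr}[W, Y] &= \tfrac{1}{4}\left(\tfrac{3}{4} \times \tfrac{1}{4}\right)^{-1/2} = 0.577, \\ \operatorname{Corr}[W, Z] &= -\tfrac{3}{26}\left(\tfrac{3}{4} \times \tfrac{69}{169}\right)^{-1/2} = -0.209, \\ \operatorname{Corr}[X, Z] &= \tfrac{108}{169}\left(\tfrac{472}{169} \times \tfrac{69}{169}\right)^{-1/2} = 0.598, \\ \operatorname{Corr}[Y, Z] &= -\tfrac{3}{26}\left(\tfrac{1}{4} \times \tfrac{69}{169}\right)^{-1/2} = -0.361. \end{aligned} \]

Finally, then, we can write down the correlation matrix:

\[ \rho = \begin{pmatrix} 1 & 0 & 0.58 & -0.21 \\ 0 & 1 & 0 & 0.60 \\ 0.58 & 0 & 1 & -0.36 \\ -0.21 & 0.60 & -0.36 & 1 \end{pmatrix}. \]

As would be expected, X is uncorrelated with either W or Y, colour and face-value being
two independent characteristics. Positive correlations are to be expected between W and
Y and between X and Z; both correlations are fairly strong. Moderate anticorrelations
exist between Z and both W and Y, reflecting the fact that it is impossible for W and Y
to be positive if Z is positive. ◄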
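
Since all 52 cards are equally likely, the correlation matrix of this example can also be obtained by direct enumeration. The Python sketch below is illustrative; the function and variable names are assumptions, and the variables are stored in the order (W, X, Y, Z).

```python
from fractions import Fraction as F
from itertools import product
import math

suits = ["hearts", "diamonds", "clubs", "spades"]
ranks = ["A", "K", "Q", "J", "10", "9", "8", "7", "6", "5", "4", "3", "2"]

def card_variables(rank, suit):
    """Return (W, X, Y, Z) for one card, following the definitions above."""
    red = suit in ("hearts", "diamonds")
    high = rank in ("A", "K", "Q")
    w = 2 if suit == "hearts" else 0
    x = 4 if high else (2 if rank in ("J", "10") else 0)
    y = 1 if red else 0
    z = 2 if (high and not red) else 0
    return (w, x, y, z)

deck = [card_variables(r, s) for r, s in product(ranks, suits)]
weight = F(1, len(deck))                       # each card has probability 1/52

def mean(i):
    return sum(c[i] for c in deck) * weight

def cov(i, j):
    return sum(c[i] * c[j] for c in deck) * weight - mean(i) * mean(j)

corr = [[float(cov(i, j)) / math.sqrt(float(cov(i, i)) * float(cov(j, j)))
         for j in range(4)] for i in range(4)]
for row in corr:
    print([round(v, 2) for v in row])
```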


Finally, let us suppose that the random variables $X_i$, i = 1, 2, ..., n, are related
to a second set of random variables $Y_k = Y_k(X_1, X_2, \ldots, X_n)$, k = 1, 2, ..., m. By
expanding each $Y_k$ as a Taylor series as in (26.137) and inserting the resulting
expressions into the definition of the covariance (26.133), we find that the elements
of the covariance matrix for the $Y_k$ variables are given by

\[ \operatorname{Cov}[Y_k, Y_l] \approx \sum_i \sum_j \left(\frac{\partial Y_k}{\partial X_i}\right)\left(\frac{\partial Y_l}{\partial X_j}\right)\operatorname{Cov}[X_i, X_j]. \tag{26.140} \]

It is straightforward to show that this relation is exact if the $Y_k$ are linear
combinations of the $X_i$. Equation (26.140) can then be written in matrix form as

\[ \mathsf{V}_Y = \mathsf{S}\mathsf{V}_X\mathsf{S}^{\mathrm{T}}, \tag{26.141} \]

where $\mathsf{V}_Y$ and $\mathsf{V}_X$ are the covariance matrices of the $Y_k$ and $X_i$ variables
respectively and $\mathsf{S}$ is the rectangular m × n matrix with elements $S_{ki} = \partial Y_k/\partial X_i$.
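
As a small illustration of the matrix form (26.141), the sketch below takes an assumed 2 × 2 covariance matrix for (X_1, X_2) and three linear combinations Y_1 = X_1 + X_2, Y_2 = X_1 − X_2, Y_3 = 2X_1; the numerical values and helper functions are hypothetical choices, written in plain Python to keep the example self-contained.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

# Assumed covariance matrix of (X1, X2): V[X1] = 4, V[X2] = 9, Cov[X1, X2] = 1.
V_X = [[4.0, 1.0],
       [1.0, 9.0]]

# S_ki = dY_k/dX_i for Y1 = X1 + X2, Y2 = X1 - X2, Y3 = 2 X1 (a 3 x 2 matrix).
S = [[1.0, 1.0],
     [1.0, -1.0],
     [2.0, 0.0]]

V_Y = matmul(matmul(S, V_X), transpose(S))
for row in V_Y:
    print(row)
```

The diagonal entries reproduce (26.136); for instance V[Y_1] = 4 + 9 + 2(1) = 15, and the off-diagonal entries give the covariances between the new variables.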


26.13 Generating functions for joint distributions 


It is straightforward to generalise the discussion of generating functions in section
26.7 to joint distributions. For a multivariate distribution $f(X_1, X_2, \ldots, X_n)$ of
non-negative integer random variables $X_i$, i = 1, 2, ..., n, we define the probability
generating function to be

\[ \Phi(t_1, t_2, \ldots, t_n) = E[t_1^{X_1} t_2^{X_2} \cdots t_n^{X_n}]. \]

As in the single-variable case, we may also define the closely related moment
generating function, which has wider applicability since it is not restricted to
non-negative integer random variables but can be used with any set of discrete
or continuous random variables $X_i$ (i = 1, 2, ..., n). The MGF of the multivariate
distribution $f(X_1, X_2, \ldots, X_n)$ is defined as

\[ M(t_1, t_2, \ldots, t_n) = E[e^{t_1X_1} e^{t_2X_2} \cdots e^{t_nX_n}] = E[e^{t_1X_1 + t_2X_2 + \cdots + t_nX_n}] \tag{26.142} \]

and may be used to evaluate (joint) moments of $f(X_1, X_2, \ldots, X_n)$. By performing
a derivation analogous to that presented for the single-variable case in subsection
26.7.2, it can be shown that


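\[ E[X_1^{m_1} X_2^{m_2} \cdots X_n^{m_n}] = \frac{\partial^{\,m_1 + m_2 + \cdots + m_n} M(0, 0, \ldots, 0)}{\partial t_1^{m_1}\,\partial t_2^{m_2} \cdots \partial t_n^{m_n}}. \tag{26.143} \]

Finally we note that, by analogy with the single-variable case, the characteristic
function and the cumulant generating function of a multivariate distribution are
defined respectively as

\[ C(t_1, t_2, \ldots, t_n) = M(it_1, it_2, \ldots, it_n) \quad\text{and}\quad K(t_1, t_2, \ldots, t_n) = \ln M(t_1, t_2, \ldots, t_n). \]

►Suppose that the random variables $X_i$, i = 1, 2, ..., n, are described by the PDF

\[ f(\mathbf{x}) = f(x_1, x_2, \ldots, x_n) = N \exp(-\tfrac{1}{2}\mathbf{x}^{\mathrm{T}}\mathsf{A}\mathbf{x}), \]

where the column vector $\mathbf{x} = (x_1\ x_2\ \cdots\ x_n)^{\mathrm{T}}$, $\mathsf{A}$ is an n × n symmetric matrix and N
is a normalisation constant such that

\[ \int_{\infty} f(\mathbf{x})\,d^n\mathbf{x} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n = 1. \]

Find the MGF of $f(\mathbf{x})$.

From (26.142), the MGF is given by

\[ M(t_1, t_2, \ldots, t_n) = N \int_{\infty} \exp(-\tfrac{1}{2}\mathbf{x}^{\mathrm{T}}\mathsf{A}\mathbf{x} + \mathbf{t}^{\mathrm{T}}\mathbf{x})\,d^n\mathbf{x}, \tag{26.144} \]

where the column vector $\mathbf{t} = (t_1\ t_2\ \cdots\ t_n)^{\mathrm{T}}$. In order to evaluate this multiple integral,
we begin by noting that

\[ \mathbf{x}^{\mathrm{T}}\mathsf{A}\mathbf{x} - 2\mathbf{t}^{\mathrm{T}}\mathbf{x} = (\mathbf{x} - \mathsf{A}^{-1}\mathbf{t})^{\mathrm{T}}\mathsf{A}(\mathbf{x} - \mathsf{A}^{-1}\mathbf{t}) - \mathbf{t}^{\mathrm{T}}\mathsf{A}^{-1}\mathbf{t}, \]

which is the matrix equivalent of ‘completing the square’. Using this expression in (26.144)
and making the substitution $\mathbf{y} = \mathbf{x} - \mathsf{A}^{-1}\mathbf{t}$, we obtain

\[ M(t_1, t_2, \ldots, t_n) = c \exp(\tfrac{1}{2}\mathbf{t}^{\mathrm{T}}\mathsf{A}^{-1}\mathbf{t}), \tag{26.145} \]

where the constant c is given by

\[ c = N \int_{\infty} \exp(-\tfrac{1}{2}\mathbf{y}^{\mathrm{T}}\mathsf{A}\mathbf{y})\,d^n\mathbf{y}. \]

From the normalisation condition for N, we see that c = 1, as indeed it must be in order
that M(0, 0, ..., 0) = 1. ◄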


26.14 Transformation of variables in joint distributions 

Suppose the random variables $X_i$, i = 1, 2, ..., n, are described by the multivariate
PDF $f(x_1, x_2, \ldots, x_n)$. If we wish to consider random variables $Y_j$, j = 1, 2, ..., m,
related to the $X_i$ by $Y_j = Y_j(X_1, X_2, \ldots, X_n)$ then we may calculate $g(y_1, y_2, \ldots, y_m)$,
the PDF for the $Y_j$, in a similar way to that in the univariate case by demanding
that

\[ |f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n| = |g(y_1, y_2, \ldots, y_m)\,dy_1\,dy_2 \cdots dy_m|. \]

From the discussion of changing the variables in multiple integrals given in
chapter 6 it follows that, in the special case where n = m,

\[ g(y_1, y_2, \ldots, y_m) = f(x_1, x_2, \ldots, x_n)\,|J|, \]

where

\[ J \equiv \frac{\partial(x_1, x_2, \ldots, x_n)}{\partial(y_1, y_2, \ldots, y_n)} = \begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \cdots & \dfrac{\partial x_n}{\partial y_1} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial x_1}{\partial y_n} & \cdots & \dfrac{\partial x_n}{\partial y_n} \end{vmatrix} \]

is the Jacobian of the $x_i$ with respect to the $y_j$.


►Suppose that the random variables $X_i$, i = 1, 2, ..., n, are independent and Gaussian
distributed with means $\mu_i$ and variances $\sigma_i^2$ respectively. Find the PDF for the new variables
$Z_i = (X_i - \mu_i)/\sigma_i$, i = 1, 2, ..., n. By considering an elemental spherical shell in Z-space,
find the PDF of the chi-squared random variable $\chi_n^2 = \sum_{i=1}^n Z_i^2$.

Since the $X_i$ are independent random variables,

\[ f(x_1, x_2, \ldots, x_n) = f(x_1)f(x_2) \cdots f(x_n) = \frac{1}{(2\pi)^{n/2}\sigma_1\sigma_2 \cdots \sigma_n} \exp\left[-\sum_{i=1}^n \frac{(x_i - \mu_i)^2}{2\sigma_i^2}\right]. \]

To derive the PDF for the variables $Z_i$, we require

\[ |f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n| = |g(z_1, z_2, \ldots, z_n)\,dz_1\,dz_2 \cdots dz_n|, \]

and, noting that $dz_i = dx_i/\sigma_i$, we obtain

\[ g(z_1, z_2, \ldots, z_n) = \frac{1}{(2\pi)^{n/2}} \exp\left(-\frac{1}{2}\sum_{i=1}^n z_i^2\right). \]

Let us now consider the random variable $\chi_n^2 = \sum_{i=1}^n Z_i^2$, which we may regard as the
square of the distance from the origin in the n-dimensional Z-space. We now require that

\[ g(z_1, z_2, \ldots, z_n)\,dz_1\,dz_2 \cdots dz_n = h(\chi_n^2)\,d\chi_n^2. \]

If we consider the infinitesimal volume $dV = dz_1\,dz_2 \cdots dz_n$ to be that enclosed by the
n-dimensional spherical shell of radius $\chi_n$ and thickness $d\chi_n$ then we may write
$dV = A\chi_n^{n-1}\,d\chi_n$, for some constant A. We thus obtain

\[ h(\chi_n^2)\,d\chi_n^2 \propto \exp(-\tfrac{1}{2}\chi_n^2)\,\chi_n^{n-1}\,d\chi_n \propto \exp(-\tfrac{1}{2}\chi_n^2)\,\chi_n^{n-2}\,d\chi_n^2, \]

where we have used the fact that $d\chi_n^2 = 2\chi_n\,d\chi_n$. Thus we see that the PDF for $\chi_n^2$ is given
by

\[ h(\chi_n^2) = B \exp(-\tfrac{1}{2}\chi_n^2)\,\chi_n^{n-2}, \]

for some constant B. This constant may be determined from the normalisation condition

\[ \int_0^{\infty} h(\chi_n^2)\,d\chi_n^2 = 1 \]

and is found to be $B = [2^{n/2}\,\Gamma(\tfrac{1}{2}n)]^{-1}$. This is the nth-order chi-squared distribution
discussed in subsection 26.9.4. ◄

26.15 Important joint distributions 

In this section we will examine two important multivariate distributions, the
multinomial distribution, which is an extension of the binomial distribution, and
the multivariate Gaussian distribution.

26.15.1 The multinomial distribution

The binomial distribution describes the probability of obtaining x ‘successes’ from
n independent trials, where each trial has only two possible outcomes. This may
be generalised to the case where each trial has k possible outcomes with respective
probabilities $p_1, p_2, \ldots, p_k$. If we consider the random variables $X_i$, i = 1, 2, ..., k,
to be the number of outcomes of type i in n trials then we may calculate their
joint probability function

\[ f(x_1, x_2, \ldots, x_k) = \Pr(X_1 = x_1,\ X_2 = x_2,\ \ldots,\ X_k = x_k), \]

where we must have $\sum_{i=1}^k x_i = n$. In n trials the probability of obtaining $x_1$
outcomes of type 1, followed by $x_2$ outcomes of type 2 etc. is given by

\[ p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}. \]

However, the number of distinguishable permutations of this result is

\[ \frac{n!}{x_1!\,x_2! \cdots x_k!}, \]

and thus

\[ f(x_1, x_2, \ldots, x_k) = \frac{n!}{x_1!\,x_2! \cdots x_k!}\,p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}. \tag{26.146} \]

This is the multinomial probability distribution.

If k = 2 then the multinomial distribution reduces to the familiar binomial
distribution. Although in this form the binomial distribution appears to be a
function of two random variables, it must be remembered that, in fact, since
$p_2 = 1 - p_1$ and $x_2 = n - x_1$, the distribution of $X_1$ is entirely determined by the
parameters p and n. That $X_1$ has a binomial distribution is shown by remembering
that it represents the number of objects of a particular type obtained from
sampling with replacement, which led to the original definition of the binomial
distribution. In fact, any of the random variables $X_i$ has a binomial distribution,
i.e. the marginal distribution of each $X_i$ is binomial with parameters n and $p_i$. It
immediately follows that

\[ E[X_i] = np_i \quad\text{and}\quad V[X_i] = np_i(1 - p_i). \tag{26.147} \]


►At a village fete patrons were invited, for a 10 p entry fee, to pick without looking six
tickets from a drum containing equal large numbers of red, blue and green tickets. If five
or more of the tickets were of the same colour a prize of 100 p was awarded. A consolation
award of 40 p was made if two tickets of each colour were picked. Was a good time had by
all?

In this case, all types of outcome (red, blue and green) have the same probabilities. The
probability of obtaining any given combination of tickets is given by the multinomial
distribution with n = 6, k = 3 and $p_i = \tfrac{1}{3}$, i = 1, 2, 3.

(i) The probability of picking six tickets of the same colour is given by

\[ \Pr(\text{six of the same colour}) = 3 \times \frac{6!}{6!\,0!\,0!} \left(\frac{1}{3}\right)^6 \left(\frac{1}{3}\right)^0 \left(\frac{1}{3}\right)^0 = \frac{1}{243}. \]

The factor of 3 is present because there are three different colours.

(ii) The probability of picking five tickets of one colour and one ticket of another
colour is

\[ \Pr(\text{five of one colour; one of another}) = 3 \times 2 \times \frac{6!}{5!\,1!\,0!} \left(\frac{1}{3}\right)^5 \left(\frac{1}{3}\right)^1 \left(\frac{1}{3}\right)^0 = \frac{4}{81}. \]

The factors of 3 and 2 are included because there are three ways to choose the
colour of the five matching tickets, and then two ways to choose the colour of the
remaining ticket.

(iii) Finally, the probability of picking two tickets of each colour is

\[ \Pr(\text{two of each colour}) = \frac{6!}{2!\,2!\,2!} \left(\frac{1}{3}\right)^2 \left(\frac{1}{3}\right)^2 \left(\frac{1}{3}\right)^2 = \frac{10}{81}. \]

Thus the expected return to any patron was, in pence,

\[ 100\left(\frac{1}{243} + \frac{4}{81}\right) + \left(40 \times \frac{10}{81}\right) - 10 = 0.29. \]

A good time was had by all but the stallholder! ◄
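
The three probabilities and the expected return can be reproduced with a direct implementation of (26.146). The sketch below is illustrative; the function name `multinomial_pmf` is an assumption and exact fractions are used so that the results can be compared with those quoted above.

```python
from fractions import Fraction as F
from math import factorial

def multinomial_pmf(counts, probs):
    """Multinomial probability (26.146) for outcome counts x_1, ..., x_k."""
    coeff = factorial(sum(counts))
    for x in counts:
        coeff //= factorial(x)                 # exact at every step
    weight = F(1)
    for x, p in zip(counts, probs):
        weight *= p ** x
    return coeff * weight

thirds = [F(1, 3)] * 3
p_six = 3 * multinomial_pmf((6, 0, 0), thirds)       # any of three colours
p_five = 3 * 2 * multinomial_pmf((5, 1, 0), thirds)  # colour of five, then of one
p_two_each = multinomial_pmf((2, 2, 2), thirds)

expected_return = 100 * (p_six + p_five) + 40 * p_two_each - 10
print(p_six, p_five, p_two_each, expected_return, float(expected_return))
```

The expected return is the exact fraction 70/243, i.e. about 0.29 pence per patron.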


26.15.2 The multivariate Gaussian distribution 


A particularly interesting multivariate distribution is provided by the generalisation
of the Gaussian distribution to multiple random variables $X_i$, i = 1, 2, ..., n.
If the expectation value of $X_i$ is $E(X_i) = \mu_i$ then the general form of the PDF is
given by

\[ f(x_1, x_2, \ldots, x_n) = N \exp\left[-\frac{1}{2}\sum_i \sum_j a_{ij}(x_i - \mu_i)(x_j - \mu_j)\right], \]

where $a_{ij} = a_{ji}$ and N is a normalisation constant that we give below. If we write
the column vectors $\mathbf{x} = (x_1\ x_2\ \cdots\ x_n)^{\mathrm{T}}$ and $\boldsymbol{\mu} = (\mu_1\ \mu_2\ \cdots\ \mu_n)^{\mathrm{T}}$, and
denote the matrix with elements $a_{ij}$ by $\mathsf{A}$ then

\[ f(\mathbf{x}) = f(x_1, x_2, \ldots, x_n) = N \exp\left[-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{\mathrm{T}}\mathsf{A}(\mathbf{x} - \boldsymbol{\mu})\right], \]

where $\mathsf{A}$ is symmetric. Using the same method as that used to derive (26.145) it
is straightforward to show that the MGF of $f(\mathbf{x})$ is given by

\[ M(t_1, t_2, \ldots, t_n) = \exp\left(\boldsymbol{\mu}^{\mathrm{T}}\mathbf{t} + \tfrac{1}{2}\mathbf{t}^{\mathrm{T}}\mathsf{A}^{-1}\mathbf{t}\right), \]

where the column matrix $\mathbf{t} = (t_1\ t_2\ \cdots\ t_n)^{\mathrm{T}}$. From the MGF, we find that

\[ E[X_iX_j] = \frac{\partial^2 M(0, 0, \ldots, 0)}{\partial t_i\,\partial t_j} = \mu_i\mu_j + (\mathsf{A}^{-1})_{ij}, \]

and thus, using (26.135), we obtain

\[ \operatorname{Cov}[X_i, X_j] = E[(X_i - \mu_i)(X_j - \mu_j)] = (\mathsf{A}^{-1})_{ij}. \]

Hence $\mathsf{A}$ is equal to the inverse of the covariance matrix $\mathsf{V}$ of the $X_i$, see (26.139).
Thus, with the correct normalisation, $f(\mathbf{x})$ is given by

\[ f(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}(\det\mathsf{V})^{1/2}} \exp\left[-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{\mathrm{T}}\mathsf{V}^{-1}(\mathbf{x} - \boldsymbol{\mu})\right]. \tag{26.148} \]


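►Evaluate the integral

\[ I = \int_{\infty} \exp\left[-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{\mathrm{T}}\mathsf{V}^{-1}(\mathbf{x} - \boldsymbol{\mu})\right] d^n\mathbf{x}, \]

where $\mathsf{V}$ is a symmetric matrix, and hence verify the normalisation in (26.148).

We begin by making the substitution $\mathbf{y} = \mathbf{x} - \boldsymbol{\mu}$ to obtain

\[ I = \int_{\infty} \exp(-\tfrac{1}{2}\mathbf{y}^{\mathrm{T}}\mathsf{V}^{-1}\mathbf{y})\,d^n\mathbf{y}. \]

Since $\mathsf{V}$ is a symmetric matrix, it may be diagonalised by an orthogonal transformation to
the new set of variables $\mathbf{y}' = \mathsf{S}^{\mathrm{T}}\mathbf{y}$, where $\mathsf{S}$ is the orthogonal matrix with the normalised
eigenvectors of $\mathsf{V}$ as its columns (see section 8.16). In this new basis, the matrix $\mathsf{V}$ becomes

\[ \mathsf{V}' = \mathsf{S}^{\mathrm{T}}\mathsf{V}\mathsf{S} = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n), \]

where the $\lambda_i$ are the eigenvalues of $\mathsf{V}$. Also, since $\mathsf{S}$ is orthogonal, $\det\mathsf{S} = \pm 1$, and so

\[ d^n\mathbf{y} = |\det\mathsf{S}|\,d^n\mathbf{y}' = d^n\mathbf{y}'. \]

Thus we can write I as

\[ I = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp\left(-\sum_{i=1}^n \frac{y_i'^2}{2\lambda_i}\right) dy_1' \cdots dy_n' = \prod_{i=1}^n \int_{-\infty}^{\infty} \exp\left(-\frac{y_i'^2}{2\lambda_i}\right) dy_i' = (2\pi)^{n/2}(\lambda_1\lambda_2 \cdots \lambda_n)^{1/2}, \tag{26.149} \]

where we have used the standard integral $\int_{-\infty}^{\infty} \exp(-\alpha y^2)\,dy = (\pi/\alpha)^{1/2}$ (see subsection
6.4.2). From section 8.16, however, we note that the product of eigenvalues in (26.149) is
equal to $\det\mathsf{V}$. Thus we finally obtain

\[ I = (2\pi)^{n/2}(\det\mathsf{V})^{1/2}, \]

and hence the normalisation in (26.148) ensures that $f(\mathbf{x})$ integrates to unity. ◄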


The above example illustrates some important points concerning the multivariate
Gaussian distribution. In particular, we note that the $y_i'$ are independent
Gaussian variables with mean zero and variance $\lambda_i$. Thus, given a general set of
n Gaussian variables $\mathbf{x}$ with means $\boldsymbol{\mu}$ and covariance matrix $\mathsf{V}$, one can always
perform the above transformation to obtain a new set of variables $\mathbf{y}'$, which are
linear combinations of the old ones and are distributed as independent Gaussians
with zero mean and variances $\lambda_i$.

This result is extremely useful in proving many of the properties of the
multivariate Gaussian. For example, let us consider the quadratic form (multiplied
by 2) appearing in the exponent of (26.148) and write it as $\chi_n^2$, i.e.

\[ \chi_n^2 = (\mathbf{x} - \boldsymbol{\mu})^{\mathrm{T}}\mathsf{V}^{-1}(\mathbf{x} - \boldsymbol{\mu}). \tag{26.150} \]

From (26.149), we see that we may also write it as

\[ \chi_n^2 = \sum_{i=1}^n \frac{y_i'^2}{\lambda_i}, \]

which is the sum of the squares of n independent Gaussian variables $y_i'/\sqrt{\lambda_i}$, each with
mean zero and unit variance. Thus, as our notation implies, the quantity $\chi_n^2$ is distributed as a
chi-squared variable of order n. As illustrated in exercise 26.40, if the variables $X_i$ are
required to satisfy m linear constraints of the form $\sum_{i=1}^n c_iX_i = 0$ then $\chi_n^2$ defined
in (26.150) is distributed as a chi-squared variable of order n − m.
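
The decorrelating transformation may be illustrated for n = 2 by simulation. The following sketch rests on several assumptions made purely for illustration: the covariance matrix has elements 2, 1, 1, 2, the samples of y = x − μ are generated from a Cholesky factor of V, and the eigenvectors of the symmetric 2 × 2 matrix are obtained from a rotation angle.

```python
import math
import random

def eig_sym_2x2(a, b, d):
    """Eigenvalues and orthogonal eigenvector matrix S (columns) of [[a, b], [b, d]]."""
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    lam1 = a * c * c + 2.0 * b * s * c + d * s * s
    lam2 = a * s * s - 2.0 * b * s * c + d * c * c
    return (lam1, lam2), [[c, -s], [s, c]]

a, b, d = 2.0, 1.0, 2.0                        # assumed covariance matrix V
(lam1, lam2), S = eig_sym_2x2(a, b, d)

# Cholesky factor of V, used only to generate correlated samples of y = x - mu.
l11 = math.sqrt(a)
l21 = b / l11
l22 = math.sqrt(d - l21 * l21)

rng = random.Random(1)
n = 100_000
s11 = s22 = s12 = s_chi2 = 0.0
for _ in range(n):
    g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    y1, y2 = l11 * g1, l21 * g1 + l22 * g2
    yp1 = S[0][0] * y1 + S[1][0] * y2          # y' = S^T y
    yp2 = S[0][1] * y1 + S[1][1] * y2
    s11 += yp1 * yp1
    s22 += yp2 * yp2
    s12 += yp1 * yp2
    s_chi2 += yp1 * yp1 / lam1 + yp2 * yp2 / lam2
var1, var2, cov12, mean_chi2 = s11 / n, s22 / n, s12 / n, s_chi2 / n
print(lam1, lam2, var1, var2, cov12, mean_chi2)
```

The sample variances of the transformed variables should be close to the eigenvalues 3 and 1, their sample covariance close to zero, and the mean of the quadratic form close to n = 2, the mean of a chi-squared variable of order 2.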


26.16 Exercises 

26.1 By shading Venn diagrams, determine which of the following are valid rela- 
tionships between events. For those that are, prove them using de Morgan's 
laws. 

(a) (XUY) = XCiY. 

(b) A U Y = (X U Y ). 

(c) (Xuf)nz = (iuZ)nf. 

(d) iu(i nz) = (xuy)nz. 

(e) XU(Y DZ) = (XUY)UZ. 

26.2 Given that events X, Y and Z satisfy 

(X n y ) u (Z n x) u (Xu?) = (zuT) u {[(zuJf) u(inz)]nyj, 

prove that X ⊇ Y and either Y ∩ Z = ∅ or Y ⊇ Z.

26.3 A and B each have two unbiased four-faced dice, the four faces being numbered
1, 2, 3, 4. Without looking, B tries to guess the sum x of the numbers on the
bottom faces of A’s two dice after they have been thrown onto a table. If the
guess is correct B receives $x^2$ euros, but if not he loses x euros.

Determine B’s expected gain per throw of A’s dice when he adopts each of the
following strategies:

(a) he selects x at random in the range 2 ≤ x ≤ 8;

(b) he throws his own two dice and guesses x to be whatever they indicate;

(c) he takes your advice and always chooses the same value for x. Which number
would you advise?

26.4 Use the method of induction to prove equation (26.16), the probability addition 
law for the union of n general events. 

26.5 Two duellists, A and B, take alternate shots at each other, and the duel is over
when a shot (fatal or otherwise!) hits its target. Each shot fired by A has a
probability α of hitting B, and each shot fired by B has a probability β of hitting
A. Calculate the probabilities $P_1$ and $P_2$, defined as follows, that A will win such
a duel: $P_1$, A fires the first shot; $P_2$, B fires the first shot.

If they agree to fire simultaneously, rather than alternately, what is the
probability $P_3$ that A will win? Verify that your results satisfy the intuitive inequality
$P_1 \ge P_3 \ge P_2$.

26.6 $X_1, X_2, \ldots, X_n$ are independent identically distributed random variables drawn
from a uniform distribution on [0, 1]. The random variables A and B are defined
by

\[ A = \min(X_1, X_2, \ldots, X_n), \qquad B = \max(X_1, X_2, \ldots, X_n). \]

For any fixed k such that $0 \le k \le \tfrac{1}{2}$, find the probability $p_n$ that both
A < k and B > 1 − k.

Check your general formula by considering directly the cases (a) k = 0, (b) $k = \tfrac{1}{2}$,
(c) n = 1 and (d) n = 2.

26.7 A tennis tournament is arranged on a straight knockout basis for $2^n$ players and
for each round, except the final, opponents for those still in the competition are
drawn at random. The quality of the field is so even that in any match it is
equally likely that either player will win. Two of the players have surnames that
begin with ‘Q’. Find the probabilities that they play each other

(a) in the final,

(b) at some stage in the tournament.

26.8 (a) Gamblers A and B each roll a fair six-faced die, and B wins if his score is
strictly greater than A’s. Show that the odds are 7 to 5 in A’s favour.

(b) Calculate the probabilities of scoring a total T from two rolls of a fair die
for T = 2, 3, ..., 12. Gamblers C and D each roll a fair die twice and score
respective totals $T_C$ and $T_D$, D winning if $T_D > T_C$. Realising that the odds
are not equal, D insists that C should increase her stake for each game. C
agrees to stake £1.10 per game, as compared to D’s £1.00 stake. Who will
show a profit?

26.9 An electronics assembly firm buys its microchips from three different suppliers;
half of them are bought from firm X, whilst firms Y and Z supply 30% and
20% respectively. The suppliers use different quality-control procedures and the
percentages of defective chips are 2%, 4% and 4% for X, Y and Z respectively.
The probabilities that a defective chip will fail two or more assembly-line tests
are 40%, 60% and 80% respectively, whilst all defective chips have a 10% chance
of escaping detection. An assembler finds a chip that fails only one test. What is
the probability that it came from supplier X?

26.10 As every student of probability theory will know, Bayesylvania is awash with
natives, not all of whom can be trusted to tell the truth, and lost and apparently
somewhat deaf travellers who ask the same question several times in an attempt
to get directions to the nearest village.

One such traveller finds himself at a T-junction in an area populated by the
Asciis and Bisciis in the ratio 11 to 5. As is well known, the Biscii always lie but
the Ascii tell the truth three quarters of the time, giving independent answers to
all questions, even to immediately repeated ones.

(a) The traveller asks one particular native twice whether he should go to the
left or to the right to reach the local village. Each time he is told ‘left’. Should
he take this advice, and, if he does, what are his chances of reaching the
village?

(b) The traveller then asks the same native the same question a third time and
for a third time receives the answer ‘left’. What should the traveller do now?
Have his chances of finding the village been altered by asking the third
question?

26.11 A boy is selected at random from amongst the children belonging to families with
n children. It is known that he has at least two sisters. Show that the probability
that he has k − 1 brothers is

\[ \frac{(n - 1)!}{(2^{n-1} - n)(k - 1)!\,(n - k)!}, \]

for 1 ≤ k ≤ n − 2 and zero for other values of k.

26.12 Villages A, B, C and D are connected by overhead telephone lines joining AB, 
AC, BC, BD and CD. As a result of severe gales, there is a probability p (the 
same for each link) that any particular link is broken. 

(a) Show that the probability that a call can be made from A to B is

\[ 1 - 2p^2 + p^3. \]

(b) Show that the probability that a call can be made from D to A is

\[ 1 - 2p^2 - 2p^3 + 5p^4 - 2p^5. \]

26.13 A set of 2N + 1 rods consists of one of each integer length 1,2,... ,2N,2N + 1. 
Three, of lengths a, b and c, are selected, of which a is the longest. By considering 
the possible values of b and c, determine the number of ways in which a non- 
degenerate triangle (i.e. one of non-zero area) can be formed (i) if a is even, 
and (ii) if a is odd. Combine these results appropriately to determine the total 
number of non-degenerate triangles that can be formed with the $2N + 1$ rods,
and hence show that the probability that such a triangle can be formed from a
random selection (without replacement) of three rods is

\[ \frac{(N-1)(4N+1)}{2(4N^2-1)}. \]

26.14 A certain marksman never misses his target, which consists of a disc of unit
radius with centre O. The probability that any given shot will hit the target
within a distance t of O is $t^2$ for $0 \le t \le 1$. The marksman fires n independent
shots at the target, and the random variable Y is the radius of the smallest circle
with centre O that encloses all the shots. Determine the PDF for Y and hence
find the expected area of the circle.

The shot that is furthest from O is now rejected and the corresponding circle
determined for the remaining $n - 1$ shots. Show that its expected area is

\[ \pi\,\frac{n-1}{n+1}. \]

26.15 The duration of a telephone call made from a public call-box is a random variable
T. The probability density function of T is

\[ f(t) = \begin{cases} 0 & t < 0, \\ \tfrac{1}{2} & 0 \le t < 1, \\ k e^{-2t} & t \ge 1, \end{cases} \]

where k is a constant. To pay for the call, 20 pence has to be inserted at the
beginning, and a further 20 pence after each subsequent half-minute. Determine
by how much the average cost of a call exceeds the cost of a call of average
length charged at 40 pence per minute.

26.16 Kittens from different litters do not get on with each other and fighting breaks out
whenever two kittens from different litters are present together. A cage initially
contains x kittens from one litter and y from another. To quell the fighting,
kittens are removed at random, one at a time, until peace is restored. Show, by
induction, that the expected number of kittens finally remaining is

\[ N(x, y) = \frac{x}{y+1} + \frac{y}{x+1}. \]
26.17 (A more difficult question.)

If the scores in a cup football match are equal at the end of the normal
period of play, a ‘penalty shoot-out’ is held in which each side takes up to five
shots (from the penalty spot) alternately, the shoot-out being stopped if one
side acquires an unassailable lead (i.e. has a lead greater than its opponents
have shots remaining). If the scores are still level after the shoot-out a ‘sudden
death’ competition takes place. In sudden death each side takes one shot and the
competition is over if one side scores and the other does not; if both score, or
both fail to score, a further shot is taken by each side, and so on. Team 1, which
takes the first penalty, has a probability $p_1$, which is independent of the player
involved, of scoring and a probability $q_1$ ($= 1 - p_1$) of missing; $p_2$ and $q_2$ are
defined likewise.

Define Pr(i : x, y) as the probability that team i has scored x goals after y
attempts, and let f(M) be the probability that the shoot-out terminates after a
total of M shots.

(a) Prove that the probability that ‘sudden death’ will be needed is

\[ f(11+) = \sum_{r=0}^{5} \left( {}^5C_r \right)^2 (p_1 p_2)^r (q_1 q_2)^{5-r}. \]

(b) Give reasoned arguments (preferably without first looking at the expressions
involved) which show that

\[ f(M = 2N) = \sum_{r=0}^{2N-6} \left\{ \begin{array}{l} p_2 \Pr(1 : r, N)\,\Pr(2 : 5 - N + r, N - 1) \\ {} + q_2 \Pr(1 : 6 - N + r, N)\,\Pr(2 : r, N - 1) \end{array} \right\} \]

for N = 3, 4, 5 and

\[ f(M = 2N + 1) = \sum_{r=0}^{2N-5} \left\{ \begin{array}{l} p_1 \Pr(1 : 5 - N + r, N)\,\Pr(2 : r, N) \\ {} + q_1 \Pr(1 : r, N)\,\Pr(2 : 5 - N + r, N) \end{array} \right\} \]

for N = 3, 4.

(c) Give an explicit expression for Pr(i : x, y) and hence show that if the teams
are so well matched that $p_1 = p_2 = 1/2$ then

\[ f(2N) = \sum_{r=0}^{2N-6} \left( \frac{1}{2^{2N}} \right) \frac{N!\,(N-1)!\,6}{r!\,(N-r)!\,(6-N+r)!\,(2N-6-r)!}, \]

\[ f(2N+1) = \sum_{r=0}^{2N-5} \left( \frac{1}{2^{2N}} \right) \frac{(N!)^2}{r!\,(N-r)!\,(5-N+r)!\,(2N-5-r)!}. \]

(d) Evaluate these expressions to show that, expressing f(M) in units of $2^{-8}$, we
have

M       6    7    8    9    10   11+

f(M)    8    24   42   56   63   63

Give a simple explanation of why f(10) = f(11+).

26.18 A particle is confined to the one-dimensional space $0 \le x \le a$ and classically
it can be in any small interval dx with equal probability. However, quantum
mechanics gives the result that the probability distribution is proportional to
$\sin^2(n\pi x/a)$, where n is an integer. Find the variance in the particle’s position
in both the classical and quantum mechanical pictures and show that, although
they differ, the latter tends to the former in the limit of large n, in agreement
with the correspondence principle of physics.
26.19 A continuous random variable X has a probability density function f(x); the
corresponding cumulative probability function is F(x). Show that the random
variable Y = F(X) is uniformly distributed between 0 and 1.

26.20 For a non-negative integer random variable X, in addition to the probability
generating function $\Phi_X(t)$ defined in equation (26.71) it is possible to define the
probability generating function

\[ \Psi_X(t) = \sum_{n=0}^{\infty} g_n t^n, \]

where $g_n$ is the probability that $X > n$.

(a) Prove that $\Phi_X$ and $\Psi_X$ are related by

\[ \Psi_X(t) = \frac{1 - \Phi_X(t)}{1 - t}. \]

(b) Show that E[X] is given by $\Psi_X(1)$ and that the variance of X can be
expressed as $2\Psi'_X(1) + \Psi_X(1) - [\Psi_X(1)]^2$.

(c) For a particular random variable X, the probability that $X > n$ is equal to
$\alpha^{n+1}$ with $0 < \alpha < 1$. Use the results in (b) to show that $V[X] = \alpha(1 - \alpha)^{-2}$.

26.21 (a) In two sets of binomial trials T and t the probabilities that a trial has a
successful outcome are P and p respectively, with corresponding probabilities
of failure of Q = 1 − P and q = 1 − p. One ‘game’ consists of a trial T
followed, if T is successful, by a trial t and then a further trial T. The two
trials continue to alternate until one of the T trials fails, at which point the
game ends. The score S for the game is the total number of successes in the
t-trials. Find the PGF for S and use it to show that

\[ E[S] = \frac{Pp}{Q}, \qquad V[S] = \frac{Pp(1 - Pq)}{Q^2}. \]

(b) Two normal unbiased six-faced dice A and B are rolled alternately starting
with A; if A shows a 6 the experiment ends. If B shows an odd number no
points are scored, if it shows a 2 or a 4 then one point is scored, whilst if
it records a 6 then two points are awarded. Find the average and standard
deviation of the score for the experiment and show that the latter is the
greater.

26.22 Use the formula obtained in subsection 26.8.2 for the moment generating function
of the negative binomial distribution to determine the CGF $K_n(t)$ for the number
of trials needed to record n successes. Evaluate the first four cumulants and use
them to confirm the stated results for the mean and variance and to show that
the distribution has skewness and kurtosis given respectively by

\[ \frac{2 - p}{\sqrt{n(1 - p)}} \qquad \text{and} \qquad 3 + \frac{6 - 6p + p^2}{n(1 - p)}. \]

26.23 A point P is chosen at random on the circle $x^2 + y^2 = 1$. The random variable
X denotes the distance of P from (1, 0). Find the mean and variance of X and
the probability that X is greater than its mean.

26.24 As assistant to a celebrated and imperious newspaper proprietor, you are given
the job of running a lottery in which each of his five million readers will have
an equal independent chance p of winning a million pounds; you have the job of
choosing p. However, if nobody wins it will be bad for publicity whilst if more
than two readers do so, the prize cost will more than offset the profit from extra
circulation - in either case you will be sacked! Show that, however you choose
p, there is more than a 40% chance you will soon be clearing your desk.
26.25 The number of errors needing correction on each page of a set of proofs follows
a Poisson distribution of mean $\mu$. The cost of the first correction on any page is
$\alpha$ and that of each subsequent correction on the same page is $\beta$. Prove that the
average cost of correcting a page is

\[ \alpha + \beta(\mu - 1) - (\alpha - \beta)e^{-\mu}. \]

26.26 In the game of Blackball, at each turn Muggins draws a ball at random from a
bag containing five white balls, three red balls and two black balls; after being
recorded, the ball is replaced in the bag. A white ball earns him $1 whilst a red
ball gets him $2; in either case he also has the option of leaving with his current
winnings or of taking a further turn on the same basis. If he draws a black ball
the game ends and he loses all he may have gained previously. Find an expression
for Muggins' expected return if he adopts the strategy of drawing up to n balls
if he has not been eliminated by then.

Show that, as the entry fee to play is $3, Muggins should be dissuaded from
playing Blackball, but if that cannot be done what value of n would you advise
him to adopt?

26.27 Show that for large r the value at the maximum of the PDF for the gamma
distribution of order r with parameter $\lambda$ is approximately $\lambda/\sqrt{2\pi(r - 1)}$.

26.28 A husband and wife decide that their family will be complete when it includes
two boys and two girls - but that this would then be enough! The probability
that a new baby will be a girl is p. Ignoring the possibility of identical twins,
show that the expected size of their family is

\[ 2\left( \frac{1}{pq} - 1 - pq \right), \]

where q = 1 − p.

26.29 The probability distribution for the number of eggs in a clutch is Po($\lambda$), and the
probability that each egg will hatch is p (independently of the size of the clutch).
Show by direct calculation that the probability distribution for the number of
chicks that hatch is Po($\lambda p$) and so justify the assumptions made in the worked
example at the end of subsection 26.7.1.

26.30 A shopper buys 36 items at random in a supermarket where, because of the sales
tax imposed, the final digit (the number of pence) in the price is uniformly and
randomly distributed from 0 to 9. Instead of adding up the bill exactly she rounds
each item to the nearest 10 pence, rounding up or down with equal probability
if the price ends in a ‘5’. Should she suspect a mistake if the cashier asks her for
23 pence more than she estimated?

26.31 Under EU legislation on harmonisation, all kippers are to weigh 0.2000 kg and
vendors who sell underweight kippers must be fined by their government. The
weight of a kipper is normally distributed with a mean of 0.2000 kg and a
standard deviation of 0.0100 kg. They are packed in cartons of 100 and large
quantities of them are sold.

Every day a carton is to be selected at random from each vendor and tested
according to one of the following schemes, which have been approved for the
purpose.

(a) The entire carton is weighed and the vendor is fined 2500 euros if the average
weight of a kipper is less than 0.1975 kg.

(b) Twenty-five kippers are selected at random from the carton; the vendor is
fined 100 euros if the average weight of a kipper is less than 0.1980 kg.

(c) Kippers are removed one at a time, at random, until one has been found
that weighs more than 0.2000 kg; the vendor is fined $4n(n - 1)$ euros, where n
is the number of kippers removed.
Which scheme should the Chancellor of the Exchequer be urging his government
to adopt?

26.32 In a certain parliament the government consists of 75 New Socialites and the 
opposition consists of 25 Preservatives. Preservatives never change their mind, al- 
ways voting against government policy without a second thought; New Socialites 
vote randomly, but with probability p that they will vote for their party leader’s 
policies. 

Following a decision by the New Socialites’ leader to drop certain manifesto 
commitments, N of his party decide to vote consistently with the opposition. The 
leader’s advisors reluctantly admit that an election must be called if N is such 
that, at any vote on government policy, the chance of a simple majority in favour 
would be less than 80%. Given that p = 0.8, estimate the lowest value of N that 
would precipitate an election. 

26.33 A practical-class demonstrator sends his 12 students to the storeroom to collect 
apparatus for an experiment, but forgets to tell each which type of component 
to bring. There are three types, A , B and C, held in the stores (in large numbers) 
in the proportions 20%, 30% and 50% respectively, and each student picks a 
component at random. In order to set up one experiment, one unit each of A and 
B and two units of C are needed. Find an expression for the probability Pr(N)
that at least N experiments can be set up.

(a) Evaluate Pr(3).

(b) Show that Pr(2) can be written in the form

\[ \Pr(2) = (0.5)^{12} \sum_{i=2}^{6} {}^{12}C_i\,(0.4)^i \sum_{j=2}^{8-i} {}^{12-i}C_j\,(0.6)^j. \]


(c) By considering the conditions under which no experiments can be set up, 
show that Pr(l) = 0.9145. 

26.34 The random variables X and Y take integer values $\ge 1$ such that $2x + y \le 2a$,
where a is an integer greater than 1. The joint probability within this region is
given by

\[ \Pr(X = x, Y = y) = c(2x + y), \]

where c is a constant, and it is zero elsewhere.
Show that the marginal probability Pr(X = x) is

\[ \Pr(X = x) = \frac{6(a - x)(2x + 2a + 1)}{a(a - 1)(8a + 5)} \]

and obtain expressions for Pr(Y = y), (a) when y is even and (b) when y is odd.
Show further that

\[ E[Y] = \frac{6a^2 + 4a + 1}{8a + 5}. \]

(You will need the results about series involving the natural numbers given in
subsection 4.2.5.)

26.35 The continuous random variables X and Y have a joint PDF proportional to 
$xy(x - y)^2$ with $0 \le x \le 1$ and $0 \le y \le 1$. Find the marginal distributions
for X and Y and show that they are negatively correlated with correlation
coefficient $-\tfrac{2}{3}$.
26.36 A discrete random variable X takes integer values n = 0, 1, ..., N with probabilities
$p_n$. A second random variable Y is defined as $Y = (X - \mu)^2$, where $\mu$ is the
expectation value of X. Prove that the covariance of X and Y is given by

\[ \mathrm{Cov}[X, Y] = \sum_{n=0}^{N} n^3 p_n - 3\mu \sum_{n=0}^{N} n^2 p_n + 2\mu^3. \]

Now suppose that X takes all its possible values with equal probability and hence
demonstrate that two random variables can be uncorrelated even though one is
defined in terms of the other.

26.37 Two continuous random variables X and Y have a joint probability distribution

\[ f(x, y) = A(x^2 + y^2), \]

where A is a constant and $0 \le x \le a$, $0 \le y \le a$. Show that X and Y are negatively
correlated with correlation coefficient −15/73. By sketching a rough contour
map of f(x, y) and marking off the regions of positive and negative correlation,
convince yourself that this (perhaps counter-intuitive) result is plausible.

26.38 A continuous random variable X is uniformly distributed over the interval [−c, c].
A sample of $2n + 1$ values of X is selected at random and the random variable
Z is defined as the median of that sample. Show that Z is distributed over [−c, c]
with probability density function

\[ f_n(z) = \frac{(2n+1)!}{(n!)^2 (2c)^{2n+1}}\,(c^2 - z^2)^n. \]

Find the variance of Z.

26.39 Show that, as the number of trials n becomes large but $np_i = \lambda_i$, $i = 1, 2, \ldots, k - 1$,
remains finite, the multinomial probability distribution (26.146),

\[ M_n(x_1, x_2, \ldots, x_k) = \frac{n!}{x_1!\,x_2! \cdots x_k!}\; p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}, \]

can be approximated by a multiple Poisson distribution (with $k - 1$ factors)

\[ M'_n(x_1, x_2, \ldots, x_{k-1}) = \prod_{i=1}^{k-1} \frac{e^{-\lambda_i} \lambda_i^{x_i}}{x_i!}. \]

(Write $\sum_{i}^{k-1} p_i = \delta$ and express all terms involving subscript k in terms of n and
$\delta$, either exactly or approximately. You will need to use $n! \approx n^{\epsilon}[(n - \epsilon)!]$ and
$(1 - a/n)^n \approx e^{-a}$ for large n.)

(a) Verify that the terms of $M'_n$ when summed over all values of $x_1, x_2, \ldots, x_{k-1}$
add up to unity.

(b) If k = 7 and $\lambda_i = 9$ for all i = 1, 2, ..., 6, estimate, using the appropriate
Gaussian approximation, the chance that at least three of $x_1, x_2, \ldots, x_6$ will
be 15 or greater.

26.40 The variables $X_i$, i = 1, 2, ..., n, are distributed as a multivariate Gaussian, with
means $\mu_i$ and a covariance matrix V. If the $X_i$ are required to satisfy the linear
constraint $\sum_{i=1}^{n} c_i X_i = 0$, where the $c_i$ are constants (and not all equal to zero),
show that the variable

\[ \chi_n^2 = (\mathbf{x} - \boldsymbol{\mu})^{\mathrm{T}}\, \mathsf{V}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \]

follows a chi-squared distribution of order $n - 1$.
26.17 Hints and answers

26.1 (a) Yes, (b) no, (c) no, (d) no, (e) yes. 

26.2 Reduce the equality to $Y \cap (Y \cup Z) = Y$.

26.3 Show that if $p_x/16$ is the probability that the total will be x then the corresponding
gain is $[p_x(x^2 + x) - 16x]/16$. (a) A loss of 2.5 euros; (b) a gain of p euros; (c)
a gain of 2.5 euros, provided he takes your advice and guesses ‘5’ each time.

26.4 Let B be the union of events $A_1, A_2, \ldots, A_n$ and apply (26.9) with A as $A_{n+1}$.
Evaluate $\Pr(B \cap A_{n+1})$ by applying the assumed result to the set of n events $C_i =
A_i \cap A_{n+1}$ for i = 1, 2, ..., n and noting that $C_i \cap C_j \cap \cdots \cap C_m = A_i \cap A_j \cap \cdots \cap A_m \cap A_{n+1}$.

26.5 $P_1 = \alpha(\alpha + \beta - \alpha\beta)^{-1}$; $P_2 = \alpha(1 - \beta)(\alpha + \beta - \alpha\beta)^{-1}$; $P_3 = \alpha(\alpha + \beta)^{-1}$.

26.6 Find simple expressions for the separate probabilities that $A > k$ and $B < 1 - k$,
and also for the two conditions at the same time. Then applying (26.11) and
identities typified by $\Pr(C) = \Pr(C \text{ and } D) + \Pr(C \text{ and } \bar{D})$, show that
$p_n = 1 - 2(1 - k)^n + (1 - 2k)^n$. (a) 0, (b) 1 - (c) 0, (d) $2k^2$.

26.7 If $p_r$ is the probability that before the rth round both players are still in the
tournament (and have not met each other), show that

\[ p_{r+1} = \frac{1}{4}\, \frac{2^{n+1-r} - 2}{2^{n+1-r} - 1}\, p_r \quad \text{and hence that} \quad p_r = \left( \frac{1}{2} \right)^{r-1} \frac{2^{n+1-r} - 1}{2^n - 1}. \]

(a) The probability that they meet in the final is $p_n = 2^{-(n-1)}(2^n - 1)^{-1}$.

(b) The probability that they meet at some stage in the tournament is given by
the sum $\sum_{r=1}^{n} p_r (2^{n+1-r} - 1)^{-1} = 2^{-(n-1)}$.

26.8 (b) $\Pr(T_D > T_C) = 0.5\{1 - [146/(36)^2]\} = 0.4437$; C’s expected return is equal to
£2.10(1 − 0.4437) ≈ £1.17 for a £1.10 stake.

26.9 The relative probabilities are X : Y : Z = 50 : 36 : 8 (in units of $10^{-4}$); the
probability that the chip came from X is therefore 50/94 = 25/47.
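
The ratio quoted for 26.9 can be checked with exact fractions. The following is only a sketch: the market shares of 50%, 30% and 20% are an assumption (they are the values that reproduce the weights 50 : 36 : 8), and the dictionary names are ours.

```python
from fractions import Fraction as F

# Assumed market shares for suppliers X, Y and Z (chosen to reproduce 50 : 36 : 8).
share = {"X": F(50, 100), "Y": F(30, 100), "Z": F(20, 100)}
# Fraction of each supplier's chips that are defective.
defective = {"X": F(2, 100), "Y": F(4, 100), "Z": F(4, 100)}
# Probability that a defective chip fails two or more tests.
fails_two_or_more = {"X": F(40, 100), "Y": F(60, 100), "Z": F(80, 100)}
# Probability that a defective chip escapes detection altogether.
escapes = F(10, 100)

# A defective chip fails exactly one test with the remaining probability.
weights = {
    s: share[s] * defective[s] * (1 - escapes - fails_two_or_more[s])
    for s in share
}

# Bayes' theorem: normalise the weights to obtain the posterior probabilities.
total = sum(weights.values())
posterior = {s: w / total for s, w in weights.items()}

print({s: w * 10**4 for s, w in weights.items()})  # weights in units of 10^-4
print(posterior["X"])
```

Under these assumed shares the posterior for supplier X comes out as 25/47.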

26.10 (a) Show that the probability that an Ascii gives the same answer twice in
succession to the same question is 5/8 and that if he gives the same answer
twice the probability that he is telling the truth is 9/10. Conclude that the
probability that the native questioned is an Ascii is 55/95 and that the
probability that the traveller is being correctly directed is 99/190. As this is
more than 1/2, he should go left.

(b) For the same answer given three times the corresponding fractions are 28/64,
27/28 and 308/628. The chance that the traveller is being told the truth has
dropped to 297/628, and, as this is less than one half, he should go ‘right’
with a 331/628 chance of success. This is a (very) slight improvement on his
previous situation.
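
The fractions quoted for 26.10 follow from one application of Bayes' theorem, and can be reproduced with a short sketch such as the one below. The function name and its parameters are our own (hypothetical) choices; the population ratio 11 : 5 and the Ascii truth rate 3/4 are taken from the exercise.

```python
from fractions import Fraction as F

def truth_probability(k, n_ascii=11, n_biscii=5, p_true=F(3, 4)):
    """Probability that k identical answers from one native are all true."""
    ascii_all_true = p_true ** k            # an Ascii tells the truth k times
    ascii_all_false = (1 - p_true) ** k     # an Ascii lies k times
    ascii_same = ascii_all_true + ascii_all_false
    biscii_same = F(1)                      # a Biscii always gives the same (false) answer

    # Posterior probability that the native is an Ascii, given k identical answers.
    w_ascii = n_ascii * ascii_same
    w_biscii = n_biscii * biscii_same
    p_ascii = w_ascii / (w_ascii + w_biscii)

    # Only an Ascii can be telling the truth.
    return p_ascii * ascii_all_true / ascii_same

print(truth_probability(2))      # chance that 'left' is correct after two answers
print(truth_probability(3))      # the same after three answers
print(1 - truth_probability(3))  # chance of success on going 'right' instead
```

Exact rational arithmetic is used so that the results can be compared directly with 99/190, 297/628 and 331/628.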

26.11 Take Aj as the event that a family consists of j boys and n — j girls, and B as 
the event that the boy has at least two sisters. Apply Bayes’ theorem. 

26.12 If q = 1 − p, the probability is $q^3 + 3pq^2 + p^2 q$, the separate terms corresponding
to zero, one and a particular set of double breaks; (b) similarly, the probability
is $q^5 + 5q^4 p + (10 - 2)p^2 q^3 + 2p^3 q^2$.
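
The counting argument for part (b) can be checked by brute force over all $2^5$ patterns of broken and intact links. The sketch below assumes the five links listed in exercise 26.12; the helper names `connected` and `call_probability` are ours.

```python
from itertools import product

# The five telephone links of exercise 26.12.
LINKS = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")]

def connected(intact, start, goal):
    """Return True if goal can be reached from start using only intact links."""
    reached, frontier = {start}, [start]
    while frontier:
        node = frontier.pop()
        for u, v in intact:
            for a, b in ((u, v), (v, u)):   # links work in both directions
                if a == node and b not in reached:
                    reached.add(b)
                    frontier.append(b)
    return goal in reached

def call_probability(p, start, goal):
    """Sum the probabilities of all link states in which the call can be made."""
    total = 0.0
    for state in product([True, False], repeat=len(LINKS)):
        intact = [link for link, ok in zip(LINKS, state) if ok]
        broken = len(LINKS) - len(intact)
        if connected(intact, start, goal):
            total += p ** broken * (1 - p) ** len(intact)
    return total

p = 0.3
q = 1 - p
print(call_probability(p, "D", "A"))
print(q**5 + 5 * q**4 * p + 8 * p**2 * q**3 + 2 * p**3 * q**2)
print(1 - 2 * p**2 - 2 * p**3 + 5 * p**4 - 2 * p**5)
```

For any trial value of p the three printed numbers should agree to rounding error.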

26.13 (i) For a even, the number of ways is $1 + 3 + 5 + \cdots + (a - 3)$, and (ii) for a odd
it is $2 + 4 + 6 + \cdots + (a - 3)$. Combine the results for a = 2m and a = 2m + 1,
with m running from 2 to N, to show that the total number of non-degenerate
triangles is given by $N(4N + 1)(N - 1)/6$. The number of possible selections of a
set of three rods is $(2N + 1)(2N)(2N - 1)/6$.
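
For small N the totals quoted for 26.13 can be confirmed by direct enumeration. This is a sketch only; the function names are ours, and the test of non-degeneracy is simply that the two shorter rods together exceed the longest.

```python
from fractions import Fraction
from itertools import combinations

def count_triangles(N):
    """Count non-degenerate triangles formed from rods of lengths 1, 2, ..., 2N + 1."""
    rods = range(1, 2 * N + 2)
    # combinations() yields each triple in increasing order, so a is the longest rod.
    return sum(1 for c, b, a in combinations(rods, 3) if b + c > a)

def triangle_probability(N):
    """Probability that three rods chosen without replacement form a triangle."""
    selections = (2 * N + 1) * (2 * N) * (2 * N - 1) // 6
    return Fraction(count_triangles(N), selections)

for N in range(1, 6):
    print(N, count_triangles(N), triangle_probability(N))
```

The counts can be compared with $N(4N + 1)(N - 1)/6$ and the probabilities with $(N - 1)(4N + 1)/[2(4N^2 - 1)]$.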

26.14 The CPF for Y is $y^{2n}$ and the PDF is the derivative of this, namely $2ny^{2n-1}$.
This leads to an expected area equal to $\pi n/(n + 1)$. The same PDF gives the
distribution of the rejected shot and, for a given y, the remaining $n - 1$ shots,
all lying within y of O, have a CPF of $(z^2/y^2)^{n-1}$. Show, from the corresponding
PDF, that the expected area is then $(n - 1)\pi y^2/n$ and that when this is averaged
over y the stated result is obtained.
26.15 Show that $k = e^2$ and that the average duration of a call is 1 minute. Let $p_n$
be the probability that the call ends during the interval $0.5(n - 1) \le t < 0.5n$
and $c_n = 20n$ be the corresponding cost. Prove that $p_1 = p_2 = \tfrac{1}{4}$ and that
$p_n = \tfrac{1}{2}e^2(e - 1)e^{-n}$ for $n \ge 3$. It follows that the average cost is

\[ E[C] = \frac{30}{2} + 20\,\frac{e^2(e - 1)}{2} \sum_{n=3}^{\infty} n e^{-n}. \]

The arithmetico-geometric series has sum $(3e^{-1} - 2e^{-2})/(e - 1)^2$ and the total
charge is $5(e + 1)/(e - 1) = 10.82$ pence more than the 40 pence a uniform rate
would cost.

26.16 Establish that $N(x, y + 1) = [(y + 1)N(x, y) + xN(x - 1, y + 1)]/(x + y + 1)$.

26.17 (a) The scores must be equal, at r each, after five attempts each.

(b) M can only be even if team 2 gets too far ahead (or drops too far behind)
to be caught (or catch up), with conditional probability $p_2$ (or $q_2$). Conversely M
can only be odd as a result of a final action by team 1.

(c) $\Pr(i : x, y) = {}^yC_x\, p_i^x q_i^{y-x}$.

(d) If the match is still alive at the tenth kick, team 2 is just as likely to lose it as
to take it into sudden death.

26.18 $a^2/12$; $a^2/12 - a^2/(2n^2\pi^2)$.

26.19 Show that dY/dX = f and use $g(y) = f(x)\,|dx/dy|$.

26.20 (a) Note that $g_n - g_{n-1} = -f_n$ and that $g_0 = 1 - f_0$.

(b) Show that $\Phi'_X(1) = \Psi_X(1)$ and relate $\Phi''_X(1)$ to $\Psi'_X(1)$.

(c) $\Psi_X(t) = \alpha/(1 - \alpha t)$.

26.21 (a) Use result (26.84) to show that the PGF for S is $Q/(1 - Pq - Ppt)$. Then use
equations (26.74) and (26.76).

(b) The PGF for the score is $6/(21 - 10t - 5t^2)$ and the average score is 10/3.
The variance is 145/9 and the standard deviation is 4.01.

26.22 $K_n(t) = n \ln p + nt + n \sum_{r=1}^{\infty} r^{-1}(1 - p)^r e^{tr}$. This gives the first four cumulants as
$n/p$, $n(1 - p)/p^2$, $n(1 - p)(2 - p)/p^3$ and $n(1 - p)(6 - 6p + p^2)/p^4$.

26.23 Mean = $4/\pi$. Variance = $2 - (16/\pi^2)$. Probability that X exceeds its mean
= $1 - (2/\pi)\sin^{-1}(2/\pi) = 0.561$.

26.24 Write $x = 5 \times 10^6\,p$. Show that $(x + \tfrac{1}{2}x^2)e^{-x}$ has a maximum value of 0.587
whatever the value of x, and hence of p.

26.25 Consider 0, 1 and $\ge 2$ errors on a page separately.

26.26 Show that the expected return is

\[ \sum_{r=0}^{n} {}^nC_r \left( \frac{5}{10} \right)^r \left( \frac{3}{10} \right)^{n-r} [\,r + 2(n - r)\,] = \frac{11n}{8} \left( \frac{4}{5} \right)^n. \]

This is maximal when $n = (\ln 5/4)^{-1} = 4.48$; n = 4 and n = 5 both give an
expected return of $2.2528, i.e. less than the entry fee, but are the best that can
be advised.

26.27 Show that the maximum occurs at $x = (r - 1)/\lambda$ and then use Stirling's approximation
to find the maximum value.

26.28 Show that the probability that the ‘trials’ end with the nth child, ($n \ge 4$), is given
by $({}^{n-1}C_1\, p q^{n-2})p + ({}^{n-1}C_1\, q p^{n-2})q$. The expectation value for n is then given by
the sum $\sum_{n=4}^{\infty} n(n - 1)(p^2 q^{n-2} + q^2 p^{n-2})$. By twice differentiating the result for the
sum of a geometric series, prove that $\sum_{n=2}^{\infty} n(n - 1)r^{n-2} = 2/(1 - r)^3$. Use this
result, after explicitly removing the first two terms, to show that E[n] is as given.

26.29 Pr(k chicks hatching) = $\sum_{n=k}^{\infty} \mathrm{Po}(n, \lambda)\, \mathrm{Bin}(n, p)$.

26.30 Show that the variance of the distribution that has probabilities of 1/20 for i = −5
and i = 5, and probabilities of 1/10 for i = −4, −3, ..., 4 is 17/2. Conclude that
23 pence is only 1.3 × the standard deviation expected for the total bill and that
a bigger discrepancy would occur about 20% of the time.

26.31 There is not much to choose between the schemes. In (a) the critical value of
the standard variable is −2.5 and the average fine would be 15.5 euros. For
(b) the corresponding figures are −1.0 and 15.9 euros. Scheme (c) is governed
by a geometric distribution with $p = q = \tfrac{1}{2}$ and leads to an expected fine
of $\sum_{n=1}^{\infty} 4n(n - 1)(\tfrac{1}{2})^n$. The sum can be evaluated by differentiating the result
$\sum_{n=1}^{\infty} p^n = p/(1 - p)$ with respect to p, and gives the expected fine as 16 euros.

26.32 By making a Gaussian approximation to the binomial distribution, establish that
N must be such that

\[ 25 - 75q - Np = 0.841\sqrt{(75 - N)pq}. \]

With p = 0.8 and q = 0.2, this has solution N = 9.1.

26.33 (a) $[12!\,(0.5)^6 (0.3)^3 (0.2)^3]/(6!\,3!\,3!) = 0.063$.

26.34 Show that $\Pr(X = x) = c(a - x)(2x + 2a + 1)$ and use the fact that $\sum_{x=1}^{a-1} \Pr(X =
x) = 1$ to prove that $c = 6/[a(a - 1)(8a + 5)]$. When evaluating Pr(Y = y) consider
carefully the value of the upper limit in the summation over x.

\[ \text{(a)} \quad \Pr(Y = y) = \frac{3(2a - y)(2a + y + 2)}{2a(a - 1)(8a + 5)}, \]

\[ \text{(b)} \quad \Pr(Y = y) = \frac{3(2a - y - 1)(2a + y + 1)}{2a(a - 1)(8a + 5)}. \]

Express the expectation value as a summation over m from 1 to $a - 1$, combining
the terms involving y = 2m − 1 and y = 2m.

26.35 You will need to establish the normalisation constant for the distribution (36),
the common mean value (3/5), and the common standard deviation (3/10). The
marginal distributions are $f(x) = 3x(6x^2 - 8x + 3)$ and the same function of y.
The covariance has the value −3/50, yielding a correlation of −2/3.

26.36 $E[XY] = \sum_{n=0}^{N} n^3 p_n - 2\mu \sum_{n=0}^{N} n^2 p_n + \mu^3$.

Set $p_n = 1/(N + 1)$ for all n and use the results for series involving the natural
numbers given in subsection 4.2.5 to show that Cov[X, Y] = 0.

26.37 $A = 3/(2a^4)$; $\mu_X = \mu_Y = 5a/8$; $\sigma_X^2 = \sigma_Y^2 = 73a^2/960$; $E[XY] = 3a^2/8$;
$\mathrm{Cov}[X, Y] = -a^2/64$.

26.38 This is the multinomial distribution for n RVs in each of the intervals [−c, z],
[z + dz, c] and one RV in the interval [z, z + dz]. The corresponding basic
probabilities are $(c + z)/(2c)$, $(c - z)/(2c)$ and $dz/(2c)$. Use the fact that $f_n$ and
$f_{n+1}$ are normalised to deduce the value of $\int z^2 f_n(z)\,dz$. The variance is $c^2/(2n + 3)$.

26.39 (b) With the continuity correction $\Pr(x_i \ge 15) = 0.0334$. The probability that at
least three are 15 or greater is $7.5 \times 10^{-4}$.

26.40 Perform successive transformations of variables $\mathbf{y}' = \mathsf{S}^{\mathrm{T}}(\mathbf{x} - \boldsymbol{\mu})$ and $z_i = y'_i/\sqrt{\lambda_i}$,
where the columns of S are eigenvectors of V and the $\lambda_i$ are eigenvalues of V.
Then $\chi_n^2 = \sum_{i=1}^{n} z_i^2$ and the $z_i$ are independent Gaussian variables with mean zero and
unit variance, which are required to satisfy the linear constraint $\sum_{i=1}^{n} c'_i z_i = 0$ for
some constants $c'_i$. Now require that

\[ f(z_1, z_2, \ldots, z_n)\, dz_1\, dz_2 \cdots dz_n = h(\chi_n^2)\, d\chi_n^2, \]

where $dz_1\, dz_2 \cdots dz_n$ is the infinitesimal volume enclosed by the intersection of
the n-dimensional spherical shell of radius $\chi_n$ and thickness $d\chi_n$ with the $(n - 1)$-
dimensional hyperplane $\sum_{i=1}^{n} c'_i z_i = 0$.
27


Statistics 


In this chapter, we turn to the study of statistics, which is concerned with 
the analysis of experimental data. In a book of this nature we cannot hope 
to do justice to such a large subject; indeed, many would argue that statistics 
belongs to the realm of experimental science rather than in a mathematics 
textbook. Nevertheless, physical scientists and engineers are regularly called upon 
to perform a statistical analysis of their data and to present their results in a 
statistical context. Therefore, we will concentrate on this aspect of a much more 
extensive subject.†


27.1 Experiments, samples and populations 

We may regard the product of any experiment as a set of N measurements of some 
quantity x or set of quantities x,y,...,z. This set of measurements constitutes the 
data. Each measurement (or data item) consists accordingly of a single number $x_i$
or a set of numbers $(x_i, y_i, \ldots, z_i)$, where $i = 1, \ldots, N$. For the moment, we will
assume that each data item is a single number, although our discussion can be 
extended to the more general case. 

As a result of inaccuracies in the measurement process, or because of intrinsic
variability in the quantity x being measured, one would expect the N measured
values $x_1, x_2, \ldots, x_N$ to be different each time the experiment is performed. We may
therefore consider the $x_i$ as a set of N random variables. In the most general case,
these random variables will be described by some N-dimensional joint probability
density function $P(x_1, x_2, \ldots, x_N)$.‡ In other words, an experiment consisting of N
measurements is considered as a single random sample from the joint distribution
(or population) P(x), where x denotes a point in the N-dimensional data space
having coordinates $(x_1, x_2, \ldots, x_N)$.

† There are, in fact, two separate schools of thought concerning statistics: the frequentist approach
and the Bayesian approach. Indeed, which of these approaches is the more fundamental is still a
matter of heated debate. Here we shall concentrate primarily on the more traditional frequentist
approach (despite the preference of some of the authors for the Bayesian viewpoint!). For a fuller
discussion of the frequentist approach one could refer to, for example, Stuart & Ord, Kendall's
Advanced Theory of Statistics Vol. I (Edward Arnold) or Kenney & Keeping, Mathematics of
Statistics (Van Nostrand). For a discussion of the Bayesian approach one might consult, for
example, Sivia, Data Analysis: A Bayesian Tutorial (OUP).

The situation is simplified considerably if the sample values $x_i$ are independent.
In this case, the N-dimensional joint distribution P(x) factorises into the product
of N one-dimensional distributions,

\[ P(\mathbf{x}) = P(x_1)P(x_2) \cdots P(x_N). \tag{27.1} \]

In the general case, each of the one-dimensional distributions $P(x_i)$ may be
different. A typical example of this occurs when N independent measurements
are made of some quantity x but the accuracy of the measuring procedure varies
between measurements.

It is often the case, however, that each sample value $x_i$ is drawn independently
from the same population. In this case, P(x) is of the form (27.1), but, in addition,
$P(x_i)$ has the same form for each value of i. The measurements $x_1, x_2, \ldots, x_N$
are then said to form a random sample of size N from the one-dimensional
population P(x). This is the most common situation met in practice and, unless
stated otherwise, we will assume from now on that this is the case. 

27.2 Sample statistics 

Suppose we have a set of N measurements $x_1, x_2, \ldots, x_N$. Any function of these
measurements (that contains no unknown parameters) is called a sample statistic, 
or often simply a statistic. Sample statistics provide a means of characterising the 
data. Although the resulting characterisation is inevitably incomplete, it is useful 
to be able to describe a set of data in terms of a few pertinent numbers. We now 
discuss the most commonly used sample statistics. 


27.2.1 Averages 

The simplest number used to characterise a sample is the mean, which for N
values $x_i$, $i = 1, 2, \ldots, N$, is defined by

\[ \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i. \tag{27.2} \]


‡ In this chapter, we will adopt the common convention that P(x) denotes the particular probability
density function that applies to its argument, x. This obviates the need to use a different letter
for the PDF of each new variable. For example, if X and Y are random variables with different
PDFs, then properly one should denote these distributions by f(x) and g(y), say. In our shorthand
notation, these PDFs are denoted by P(x) and P(y), where it is understood that the functional
form of the PDF may be different in each case.
188.7   204.7   193.2   169.0   168.1   189.8   166.3   200.0

Table 27.1 Experimental data giving eight measurements of the round trip
time in milliseconds for a computer ‘packet’ to travel from Cambridge UK to
Cambridge MA.


In words, the sample mean is the sum of the sample values divided by the number 
of values in the sample. 


► Table 27.1 gives eight values for the round trip time in milliseconds for a computer ‘packet’
to travel from Cambridge UK to Cambridge MA. Find the sample mean. 


Using (27.2) the sample mean in milliseconds is given by

$$\bar{x} = \tfrac{1}{8}(188.7 + 204.7 + 193.2 + 169.0 + 168.1 + 189.8 + 166.3 + 200.0) = \frac{1479.8}{8} = 184.975.$$

Since the sample values in table 27.1 are quoted to an accuracy of one decimal place, it is
usual to quote the mean to the same accuracy, i.e. as $\bar{x} = 185.0$. ◄


Strictly speaking the mean given by (27.2) is the arithmetic mean and this is by 
far the most common definition used for a mean. Other definitions of the mean 
are possible, though less common, and include 


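(i) the geometric mean,

$$x_g = \left( \prod_{i=1}^{N} x_i \right)^{1/N}; \qquad (27.3)$$

(ii) the harmonic mean,

$$x_h = \frac{N}{\sum_{i=1}^{N} 1/x_i}; \qquad (27.4)$$

(iii) the root mean square,

$$x_{\rm rms} = \left( \frac{\sum_{i=1}^{N} x_i^2}{N} \right)^{1/2}. \qquad (27.5)$$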

It should be noted that $\bar{x}$, $x_h$ and $x_{\rm rms}$ would remain well defined even if some
sample values were negative, but the value of $x_g$ could then become complex.
The geometric mean should not be used in such cases.
► Calculate $x_g$, $x_h$ and $x_{\rm rms}$ for the sample given in table 27.1.

The geometric mean is given by (27.3) to be

$$x_g = (188.7 \times 204.7 \times \cdots \times 200.0)^{1/8} = 184.4.$$

The harmonic mean is given by (27.4) to be

$$x_h = \frac{8}{(1/188.7) + (1/204.7) + \cdots + (1/200.0)} = 183.9.$$

Finally, the root mean square is given by (27.5) to be

$$x_{\rm rms} = \left[ \tfrac{1}{8}(188.7^2 + 204.7^2 + \cdots + 200.0^2) \right]^{1/2} = 185.5.$$ ◄
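The four averages defined in (27.2) to (27.5) can be checked numerically. The following is a minimal Python sketch, not part of the original text; the variable names are illustrative only, and the geometric mean is evaluated through logarithms, which assumes that every sample value is positive.

```python
import math

# Round-trip times from table 27.1, in milliseconds.
times = [188.7, 204.7, 193.2, 169.0, 168.1, 189.8, 166.3, 200.0]
n = len(times)

# Arithmetic mean, equation (27.2).
arithmetic = sum(times) / n

# Geometric mean, equation (27.3), computed as exp(mean of logs);
# this form is only valid when all sample values are positive.
geometric = math.exp(sum(math.log(t) for t in times) / n)

# Harmonic mean, equation (27.4).
harmonic = n / sum(1.0 / t for t in times)

# Root mean square, equation (27.5).
rms = math.sqrt(sum(t * t for t in times) / n)

print(f"mean={arithmetic:.1f}  x_g={geometric:.1f}  x_h={harmonic:.1f}  x_rms={rms:.1f}")
```

For positive data the four values are ordered as $x_h \le x_g \le \bar{x} \le x_{\rm rms}$, which is consistent with the figures found in the worked example.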

Two other measures of the ‘average’ of a sample are its mode and median. The
mode is simply the most commonly occurring value in the sample. A sample may
possess several modes, however, and it can thus be misleading in such cases to
use the mode as a measure of the average of the sample. The median of a sample
is the halfway point when the sample values $x_i$ ($i = 1, 2, \ldots, N$) are arranged in
ascending (or descending) order. Clearly, this depends on whether the size of
the sample, $N$, is odd or even. If $N$ is odd then the median is simply equal to
$x_{(N+1)/2}$, whereas if $N$ is even the median of the sample is usually taken to be
$\tfrac{1}{2}(x_{N/2} + x_{(N/2)+1})$.

► Find the mode and median of the sample given in table 27.1. 

From the table we see that each sample value occurs exactly once, and so any value may 
be called the mode of the sample. 

To find the sample median, we first arrange the sample values in ascending order and 
obtain 

166.3, 168.1, 169.0, 188.7, 189.8, 193.2, 200.0, 204.7. 

Since the number of sample values $N = 8$, which is even, the median of the sample is
$\tfrac{1}{2}(x_4 + x_5) = \tfrac{1}{2}(188.7 + 189.8) = 189.25$. ◄
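As a cross-check on the example above, the mode and median can be found with the Python standard library. This is a sketch under the assumption that `statistics.median` (which averages the two central values when $N$ is even) matches the convention used in the text; the helper names are illustrative.

```python
from collections import Counter
from statistics import median

# Round-trip times from table 27.1, in milliseconds.
times = [188.7, 204.7, 193.2, 169.0, 168.1, 189.8, 166.3, 200.0]

# Count how often each value occurs; every value that attains the
# highest count is a mode of the sample.
counts = Counter(times)
highest = max(counts.values())
modes = sorted(value for value, count in counts.items() if count == highest)

# For even N the median is the average of the two central ordered values.
sample_median = median(times)

print("modes:", modes)
print("median:", sample_median)
```

Because each value occurs exactly once, all eight values are returned as modes, which illustrates why the mode is a poor summary for a sample of this kind.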


27.2.2 Variance and standard deviation 


The variance and standard deviation both give a measure of the spread of values
in a sample about the sample mean $\bar{x}$. The sample variance is defined by

$$s^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2, \qquad (27.6)$$

and the sample standard deviation is the positive square root of the sample
variance, i.e.

$$s = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2}. \qquad (27.7)$$
► Find the sample variance and sample standard deviation of the data given in table 27.1.

We have already found that the sample mean is 185.0 to one decimal place. However,
when the mean is to be used in the subsequent calculation of the sample variance it is
better to use the most accurate value available. In this case the exact value is 184.975, and
so using (27.6),

$$s^2 = \tfrac{1}{8}\left[ (188.7 - 184.975)^2 + \cdots + (200.0 - 184.975)^2 \right] = \frac{1608.36}{8} = 201.0,$$

where once again we have quoted the result to one decimal place. The sample standard
deviation is then given by $s = \sqrt{201.0} = 14.2$. As it happens, in this case the difference
between the exact sample mean and the rounded value is very small compared to the variation
of the individual readings about the mean and using the rounded value makes negligible
difference; however, this would not be so if the difference were comparable to the sample
standard deviation. ◄


Using the definition (27.7), it is clear that in order to calculate the standard
deviation of a sample we must first calculate the sample mean. This requirement
can be avoided, however, by using an alternative form for $s^2$. From (27.6), we see
that

$$s^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2
     = \frac{1}{N} \sum_{i=1}^{N} x_i^2 - \frac{1}{N} \sum_{i=1}^{N} 2 x_i \bar{x} + \frac{1}{N} \sum_{i=1}^{N} \bar{x}^2
     = \overline{x^2} - 2\bar{x}^2 + \bar{x}^2 = \overline{x^2} - \bar{x}^2.$$

We may therefore write the sample variance $s^2$ as

$$s^2 = \frac{1}{N} \sum_{i=1}^{N} x_i^2 - \left( \frac{1}{N} \sum_{i=1}^{N} x_i \right)^2, \qquad (27.8)$$

from which the sample standard deviation is found by taking the positive square
root. Thus, by evaluating the quantities $\sum_{i=1}^{N} x_i$ and $\sum_{i=1}^{N} x_i^2$ for our sample, we
can calculate the sample mean and sample standard deviation at the same time.


► Calculate $\sum_{i=1}^{N} x_i$ and $\sum_{i=1}^{N} x_i^2$ for the data given in table 27.1 and hence find the mean
and standard deviation of the sample.

From table 27.1, we obtain

$$\sum_{i=1}^{N} x_i = 188.7 + 204.7 + \cdots + 200.0 = 1479.8,$$

$$\sum_{i=1}^{N} x_i^2 = (188.7)^2 + (204.7)^2 + \cdots + (200.0)^2 = 275\,334.36.$$

Since $N = 8$, we find as before (quoting the final results to one decimal place)

$$\bar{x} = \frac{1479.8}{8} = 185.0, \qquad s = \sqrt{\frac{275\,334.36}{8} - \left( \frac{1479.8}{8} \right)^2} = 14.2.$$ ◄
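The practical appeal of (27.8) is that the mean and standard deviation can both be obtained in a single pass over the data by accumulating two running sums. A minimal Python sketch follows; the variable names are illustrative. One caveat not discussed in the text: in floating-point arithmetic the subtraction in (27.8) can lose precision when the mean is very large compared with the spread, so (27.6) is the safer form in that regime.

```python
import math

# Round-trip times from table 27.1, in milliseconds.
times = [188.7, 204.7, 193.2, 169.0, 168.1, 189.8, 166.3, 200.0]

# Accumulate N, sum of x_i and sum of x_i^2 in one pass over the data.
n = 0
sum_x = 0.0
sum_x2 = 0.0
for t in times:
    n += 1
    sum_x += t
    sum_x2 += t * t

mean = sum_x / n
variance = sum_x2 / n - mean ** 2   # equation (27.8)
std_dev = math.sqrt(variance)

print(f"sum x = {sum_x:.1f}, sum x^2 = {sum_x2:.2f}")
print(f"mean = {mean:.1f}, s^2 = {variance:.1f}, s = {std_dev:.1f}")
```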


27.2.3 Moments and central moments 

By analogy with our discussion of probability distributions in section 26.5, the
sample mean and variance may also be described respectively as the first moment
and second central moment of the sample. In general, for a sample $x_i$, $i = 1, 2, \ldots, N$,
we define the $r$th moment $m_r$ and $r$th central moment $n_r$ as

$$m_r = \frac{1}{N} \sum_{i=1}^{N} x_i^r, \qquad (27.9)$$

$$n_r = \frac{1}{N} \sum_{i=1}^{N} (x_i - m_1)^r. \qquad (27.10)$$

Thus the sample mean $\bar{x}$ and variance $s^2$ may also be written as $m_1$ and $n_2$
respectively. As is common practice, we have introduced a notation in which a
sample statistic is denoted by the Roman letter corresponding to whichever Greek
letter is used to describe the corresponding population statistic. Thus, we use $m_r$
and $n_r$ to denote the moment and central moment of a sample, since in section
26.5, we denoted the $r$th moment and central moment of a population by $\mu_r$ and
$\nu_r$ respectively.

This notation is particularly useful, since the $r$th central moment of a sample,
$n_r$, may be expressed in terms of the $r$th- and lower-order sample moments $m_r$ in
a way exactly analogous to that derived in subsection 26.5.5 for the corresponding
population statistics. For example, as discussed in the previous section, the sample
variance is given by $s^2 = \overline{x^2} - \bar{x}^2$ but this may also be written as $n_2 = m_2 - m_1^2$,
which is to be compared with the corresponding relation $\nu_2 = \mu_2 - \mu_1^2$ derived
in subsection 26.5.3 for population statistics. This correspondence also holds for
higher-order central moments of the sample. For example,

$$n_3 = \frac{1}{N} \sum_{i=1}^{N} (x_i - m_1)^3
     = \frac{1}{N} \sum_{i=1}^{N} (x_i^3 - 3 m_1 x_i^2 + 3 m_1^2 x_i - m_1^3)$$

$$\phantom{n_3} = m_3 - 3 m_1 m_2 + 3 m_1^2 m_1 - m_1^3
     = m_3 - 3 m_1 m_2 + 2 m_1^3, \qquad (27.11)$$

which may be compared with equation (26.53) in the previous chapter.
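The relations $n_2 = m_2 - m_1^2$ and (27.11) are easy to confirm numerically. The sketch below, with illustrative helper names `moment` and `central_moment`, evaluates both sides for the data of table 27.1; agreement is expected only to within floating-point rounding.

```python
# Round-trip times from table 27.1, in milliseconds.
times = [188.7, 204.7, 193.2, 169.0, 168.1, 189.8, 166.3, 200.0]


def moment(xs, r):
    """r-th sample moment m_r, equation (27.9)."""
    return sum(x ** r for x in xs) / len(xs)


def central_moment(xs, r):
    """r-th sample central moment n_r, equation (27.10)."""
    m1 = moment(xs, 1)
    return sum((x - m1) ** r for x in xs) / len(xs)


m1, m2, m3 = (moment(times, r) for r in (1, 2, 3))
n2 = central_moment(times, 2)
n3 = central_moment(times, 3)

# Right-hand sides of n_2 = m_2 - m_1^2 and of equation (27.11).
n2_from_moments = m2 - m1 ** 2
n3_from_moments = m3 - 3 * m1 * m2 + 2 * m1 ** 3

print(n2, n2_from_moments)
print(n3, n3_from_moments)
```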
Mirroring our discussion of the normalised central moments $\gamma_r$ of a population
in subsection 26.5.5, we may also describe a sample in terms of the dimensionless
quantities

$$g_k = \frac{n_k}{n_2^{k/2}} = \frac{n_k}{s^k};$$

$g_3$ and $g_4$ are called the sample skewness and kurtosis. Likewise, it is common to
define the excess kurtosis of a sample by $g_4 - 3$.


27.2.4 Covariance and correlation 

So far we have assumed that each data item of the sample consists of a single
number. Now let us suppose that each item of data consists of a pair of numbers,
so that the sample is given by $(x_i, y_i)$ ($i = 1, 2, \ldots, N$).

We may calculate the sample means, $\bar{x}$ and $\bar{y}$, and sample variances, $s_x^2$ and
$s_y^2$, of the $x_i$ and $y_i$ values individually but these statistics do not provide any
measure of the relationship between the $x_i$ and $y_i$. By analogy with our discussion
in subsection 26.12.3 we measure any interdependence between the $x_i$ and $y_i$ in
terms of the sample covariance, which is given by

$$V_{xy} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})
        = \overline{(x - \bar{x})(y - \bar{y})}
        = \overline{xy} - \bar{x}\,\bar{y}. \qquad (27.12)$$

Writing out the last expression in full, we obtain the form most useful for
calculations, which reads

$$V_{xy} = \frac{1}{N} \left( \sum_{i=1}^{N} x_i y_i \right) - \frac{1}{N^2} \left( \sum_{i=1}^{N} x_i \right) \left( \sum_{i=1}^{N} y_i \right).$$

We may also define the closely related sample correlation by

$$r_{xy} = \frac{V_{xy}}{s_x s_y},$$

which can take values between $-1$ and $+1$. If the $x_i$ and $y_i$ are independent then
$V_{xy} = 0 = r_{xy}$, and from (27.12) we see that $\overline{xy} = \bar{x}\,\bar{y}$. It should also be noted
that the value of $r_{xy}$ is not altered by shifts in the origin or by changes in the
scale of the $x_i$ or $y_i$. In other words, if $x' = ax + b$ and $y' = cy + d$, where $a$,
$b$, $c$, $d$ are constants (with $a$ and $c$ positive), then $r_{x'y'} = r_{xy}$. Figure 27.1 shows scatter plots for several
two-dimensional random samples $x_i$, $y_i$ of size $N = 1000$, each with a different
value of $r_{xy}$.
Figure 27.1 Scatter plots for two-dimensional data samples of size N = 1000,
with various values of the correlation r. No scales are plotted, since the value 
of r is unaffected by shifts of origin or changes of scale in x and y. 


► Ten UK citizens are selected at random and their heights and weights are found to be as
follows (to the nearest cm or kg respectively):

Person        A    B    C    D    E    F    G    H    I    J
Height (cm)  194  168  177  180  171  190  151  169  175  182
Weight (kg)   75   53   72   80   75   75   57   67   46   68

Calculate the sample correlation between the heights and weights. 

In order to find the sample correlation, we begin by calculating the following sums (where
$x_i$ are the heights and $y_i$ are the weights)

$$\sum_i x_i = 1757, \qquad \sum_i y_i = 668,$$

$$\sum_i x_i^2 = 310\,041, \qquad \sum_i y_i^2 = 45\,746, \qquad \sum_i x_i y_i = 118\,029.$$

The sample consists of $N = 10$ pairs of numbers, so the means of the $x_i$ and of the $y_i$ are
given by $\bar{x} = 175.7$ and $\bar{y} = 66.8$. Also, $\overline{xy} = 11\,802.9$. Similarly, the standard deviations
of the $x_i$ and $y_i$ are calculated, using (27.8), as

$$s_x = \sqrt{\frac{310\,041}{10} - \left( \frac{1757}{10} \right)^2} = 11.6, \qquad
  s_y = \sqrt{\frac{45\,746}{10} - \left( \frac{668}{10} \right)^2} = 10.6.$$

Thus the sample correlation is given by

$$r_{xy} = \frac{\overline{xy} - \bar{x}\,\bar{y}}{s_x s_y} = \frac{11\,802.9 - (175.7)(66.8)}{(11.6)(10.6)} = 0.54.$$

Thus there is a moderate positive correlation between the heights and weights of the
people measured. ◄
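The same calculation can be sketched in Python, following (27.8) and (27.12) directly. The variable names are illustrative. The standard deviations are not rounded before division here, so the correlation agrees with the worked value only after rounding to two decimal places.

```python
import math

# Heights (cm) and weights (kg) of the ten people A to J.
heights = [194, 168, 177, 180, 171, 190, 151, 169, 175, 182]
weights = [75, 53, 72, 80, 75, 75, 57, 67, 46, 68]
n = len(heights)

mean_x = sum(heights) / n
mean_y = sum(weights) / n
mean_xy = sum(x * y for x, y in zip(heights, weights)) / n

# Sample covariance, equation (27.12).
cov_xy = mean_xy - mean_x * mean_y

# Sample standard deviations from equation (27.8).
s_x = math.sqrt(sum(x * x for x in heights) / n - mean_x ** 2)
s_y = math.sqrt(sum(y * y for y in weights) / n - mean_y ** 2)

# Sample correlation.
r_xy = cov_xy / (s_x * s_y)

print(f"V_xy = {cov_xy:.2f}, s_x = {s_x:.1f}, s_y = {s_y:.1f}, r_xy = {r_xy:.2f}")
```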

It is straightforward to generalise the above discussion to data samples of
arbitrary dimension, the only complication being one of notation. We choose
to denote the $i$th data item from an $n$-dimensional sample as $(x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(n)})$,
where the bracketed superscript runs from 1 to $n$ and labels the elements within
a given data item whereas the subscript $i$ runs from 1 to $N$ and labels the data
items within the sample. In this $n$-dimensional case, we can define the sample
covariance matrix whose elements are

$$V_{kl} = \overline{x^{(k)} x^{(l)}} - \overline{x^{(k)}}\;\overline{x^{(l)}}$$

and the sample correlation matrix with elements

$$r_{kl} = \frac{V_{kl}}{s_k s_l}.$$

Both these matrices are clearly symmetric but are not necessarily positive definite.


27.3 Estimators and sampling distributions 

In general, the population $P(x)$ from which a sample $x_1, x_2, \ldots, x_N$ is drawn
is unknown. The central aim of statistics is to use the sample values $x_i$ to infer
certain properties of the unknown population $P(x)$, such as its mean, variance and
higher moments. To keep our discussion in general terms, let us denote the various
properties of the population by $a_1, a_2, \ldots$, or collectively by $\mathbf{a}$. Moreover, we make
the dependence of the population on the values of these quantities explicit by
writing the population as $P(x|\mathbf{a})$. For the moment, we are assuming that the
sample values $x_i$ are independent and drawn from the same (one-dimensional)
population $P(x|\mathbf{a})$, in which case

$$P(\mathbf{x}|\mathbf{a}) = P(x_1|\mathbf{a}) P(x_2|\mathbf{a}) \cdots P(x_N|\mathbf{a}).$$

Suppose we wish to estimate the value of one of the quantities $a_1, a_2, \ldots$, which
we will denote simply by $a$. Since the sample values $x_i$ are our only source of
information, any estimate of $a$ must be some function of the $x_i$, i.e. some sample
statistic. Such a statistic is called an estimator of $a$ and is usually denoted by $\hat{a}(\mathbf{x})$,
where $\mathbf{x}$ denotes the sample elements $x_1, x_2, \ldots, x_N$.

Since an estimator $\hat{a}$ is a function of the sample values of the random variables
$x_1, x_2, \ldots, x_N$, it too must be a random variable. In other words, if a number of
random samples, each of the same size $N$, are taken from the (one-dimensional)
population $P(x|\mathbf{a})$ then the value of the estimator $\hat{a}$ will vary from one sample to
the next and in general will not be equal to the true value $a$. This variation of
the estimator is described by its sampling distribution $P(\hat{a}|\mathbf{a})$. From section 26.14,
this is given by

$$P(\hat{a}|\mathbf{a})\,d\hat{a} = P(\mathbf{x}|\mathbf{a})\,d^N x,$$

where $d^N x$ is the infinitesimal ‘volume’ in $\mathbf{x}$-space lying between the ‘surfaces’
$\hat{a}(\mathbf{x}) = \hat{a}$ and $\hat{a}(\mathbf{x}) = \hat{a} + d\hat{a}$. The form of the sampling distribution generally
depends upon the estimator under consideration and upon the form of the
population from which the sample was drawn, including, as indicated, the true
values of the quantities $\mathbf{a}$. It is also usually dependent on the sample size $N$.


► The sample values $x_1, x_2, \ldots, x_N$ are drawn independently from a Gaussian distribution
with mean $\mu$ and variance $\sigma^2$. Suppose that we choose the sample mean $\bar{x}$ as our estimator
$\hat{\mu}$ of the population mean. Find the sampling distribution of this estimator.

The sample mean $\bar{x}$ is given by

$$\bar{x} = \frac{1}{N}(x_1 + x_2 + \cdots + x_N),$$

where the $x_i$ are independent random variables distributed as $x_i \sim N(\mu, \sigma^2)$. From our
discussion of multiple Gaussian distributions on page 1030, we see immediately that $\bar{x}$ will
also be Gaussian distributed as $N(\mu, \sigma^2/N)$. In other words, the sampling distribution of
$\bar{x}$ is given by

$$P(\bar{x}|\mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2/N}} \exp\left[ -\frac{(\bar{x} - \mu)^2}{2\sigma^2/N} \right]. \qquad (27.13)$$

Note that the variance of this distribution is $\sigma^2/N$. ◄
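The result (27.13) can be illustrated by simulation: draw many independent samples of size $N$, record the mean of each, and compare the spread of those means with $\sigma^2/N$. The following Python sketch is illustrative only; the population parameters, sample size, number of trials and random seed are all arbitrary assumptions, and the agreement is statistical rather than exact.

```python
import random

rng = random.Random(2024)   # fixed seed so that the run is repeatable
mu, sigma = 1.0, 2.0        # assumed population mean and standard deviation
n, trials = 10, 20000       # sample size N and number of repeated samples

# Draw `trials` samples of size n and store the sample mean of each.
sample_means = []
for _ in range(trials):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    sample_means.append(sum(sample) / n)

# Empirical mean and variance of the sampling distribution of the mean.
mean_of_means = sum(sample_means) / trials
var_of_means = sum((m - mean_of_means) ** 2 for m in sample_means) / trials

print(f"mean of sample means     = {mean_of_means:.3f} (expected about {mu})")
print(f"variance of sample means = {var_of_means:.3f} (expected about {sigma**2 / n})")
```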


27.3.1 Consistency, bias and efficiency of estimators 

For any particular quantity $a$, we may in fact define any number of different
estimators, each of which will have its own sampling distribution. The quality
of a given estimator $\hat{a}$ may be assessed by investigating certain properties of its
sampling distribution $P(\hat{a}|\mathbf{a})$. In particular, an estimator $\hat{a}$ is usually judged on
the three criteria of consistency, bias and efficiency, each of which we now discuss.

Consistency

An estimator $\hat{a}$ is consistent if its value tends to the true value $a$ in the large-sample
limit, i.e.

$$\lim_{N \to \infty} \hat{a} = a.$$

Consistency is usually a minimum requirement for a useful estimator. An equivalent
statement of consistency is that in the limit of large $N$ the sampling
distribution $P(\hat{a}|\mathbf{a})$ of the estimator must satisfy

$$\lim_{N \to \infty} P(\hat{a}|\mathbf{a}) \to \delta(\hat{a} - a).$$

Bias 

The expectation value of an estimator $\hat{a}$ is given by

$$E[\hat{a}] = \int \hat{a}\,P(\hat{a}|\mathbf{a})\,d\hat{a} = \int \hat{a}(\mathbf{x})\,P(\mathbf{x}|\mathbf{a})\,d^N x, \qquad (27.14)$$

where the second integral extends over all possible values that can be taken by
the sample elements $x_1, x_2, \ldots, x_N$. This expression gives the expected mean value
of $\hat{a}$ from an infinite number of samples, each of size $N$. The bias of an estimator
$\hat{a}$ is then defined as

$$b(a) = E[\hat{a}] - a. \qquad (27.15)$$

We note that the bias $b$ does not depend on the measured sample values
$x_1, x_2, \ldots, x_N$. In general, though, it will depend on the sample size $N$, the
functional form of the estimator $\hat{a}$ and, as indicated, on the true properties $\mathbf{a}$ of
the population, including the true value of $a$ itself. If $b = 0$ then $\hat{a}$ is called an
unbiased estimator of $a$.


► An estimator $\hat{a}$ is biased in such a way that $E[\hat{a}] = a + b(a)$, where the bias $b(a)$ is given
by $(b_1 - 1)a + b_2$ and $b_1$ and $b_2$ are known constants. Construct an unbiased estimator of $a$.

Let us first write $E[\hat{a}]$ in the clearer form

$$E[\hat{a}] = a + (b_1 - 1)a + b_2 = b_1 a + b_2.$$

The task of constructing an unbiased estimator is now trivial, and an appropriate choice
is $\hat{a}' = (\hat{a} - b_2)/b_1$, which (as required) has the expectation value

$$E[\hat{a}'] = \frac{E[\hat{a}] - b_2}{b_1} = a.$$ ◄


Efficiency 

The variance of an estimator is given by

$$V[\hat{a}] = \int (\hat{a} - E[\hat{a}])^2 P(\hat{a}|\mathbf{a})\,d\hat{a} = \int (\hat{a}(\mathbf{x}) - E[\hat{a}])^2 P(\mathbf{x}|\mathbf{a})\,d^N x \qquad (27.16)$$

and describes the spread of values $\hat{a}$ about $E[\hat{a}]$ that would result from a large
number of samples, each of size $N$. An estimator with a smaller variance is said
to be more efficient than one with a larger variance. As we show in the next
section, for any given quantity $a$ of the population there exists a theoretical lower
limit on the variance of any estimator $\hat{a}$. This result is known as Fisher's inequality
(or the Cramér-Rao inequality) and reads

$$V[\hat{a}] \ge \left( 1 + \frac{\partial b}{\partial a} \right)^2 \bigg/ E\left[ -\frac{\partial^2 \ln P}{\partial a^2} \right], \qquad (27.17)$$

where $P$ stands for the population $P(\mathbf{x}|\mathbf{a})$ and $b$ is the bias of the estimator.
Denoting the quantity on the RHS of (27.17) by $V_{\min}$, the efficiency $e$ of an
estimator is defined as

$$e = V_{\min}/V[\hat{a}].$$

An estimator for which $e = 1$ is called a minimum-variance or efficient estimator.
Otherwise, if $e < 1$, $\hat{a}$ is called an inefficient estimator.

It should be noted that, in general, there is no unique ‘optimal’ estimator $\hat{a}$ for
a particular property $a$. To some extent, there is always a trade-off between bias
and efficiency. One must often weigh the relative merits of an unbiased, inefficient
estimator against another that is more efficient but slightly biased. Nevertheless, a
common choice is the best unbiased estimator (BUE), which is simply the unbiased
estimator $\hat{a}$ having the smallest variance $V[\hat{a}]$.

Finally, we note that some qualities of estimators are related. For example,
suppose that $\hat{a}$ is an unbiased estimator, so that $E[\hat{a}] = a$, and that $V[\hat{a}] \to 0$ as
$N \to \infty$. Using the Bienaymé-Chebyshev inequality discussed in subsection 26.5.3, it
follows immediately that $\hat{a}$ is also a consistent estimator. Nevertheless, it does not
follow that a consistent estimator is unbiased.


► The sample values $x_1, x_2, \ldots, x_N$ are drawn independently from a Gaussian distribution
with mean $\mu$ and variance $\sigma^2$. Show that the sample mean $\bar{x}$ is a consistent, unbiased,
minimum-variance estimator of $\mu$.

We found earlier that the sampling distribution of $\bar{x}$ is given by

$$P(\bar{x}|\mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2/N}} \exp\left[ -\frac{(\bar{x} - \mu)^2}{2\sigma^2/N} \right],$$

from which we see immediately that $E[\bar{x}] = \mu$ and $V[\bar{x}] = \sigma^2/N$. Thus $\bar{x}$ is an unbiased
estimator of $\mu$. Moreover, since it is also true that $V[\bar{x}] \to 0$ as $N \to \infty$, $\bar{x}$ is a consistent
estimator of $\mu$.

In order to determine whether $\bar{x}$ is a minimum-variance estimator of $\mu$, we must use
Fisher's inequality (27.17). Since the sample values $x_i$ are independent and drawn from a
Gaussian of mean $\mu$ and standard deviation $\sigma$, we have

$$\ln P(\mathbf{x}|\mu, \sigma) = -\frac{1}{2} \sum_{i=1}^{N} \left[ \ln(2\pi\sigma^2) + \frac{(x_i - \mu)^2}{\sigma^2} \right],$$

and, on differentiating twice with respect to $\mu$, we find

$$\frac{\partial^2 \ln P}{\partial \mu^2} = -\frac{N}{\sigma^2}.$$

This is independent of the $x_i$ and so its expectation value is also equal to $-N/\sigma^2$. With $b$
set equal to zero in (27.17), Fisher's inequality thus states that, for any unbiased estimator
$\hat{\mu}$ of the population mean,

$$V[\hat{\mu}] \ge \frac{\sigma^2}{N}.$$

Since $V[\bar{x}] = \sigma^2/N$, the sample mean $\bar{x}$ is a minimum-variance estimator of $\mu$. ◄
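To make the notion of efficiency concrete, one can estimate $e = V_{\min}/V[\hat{\mu}]$ by simulation for two unbiased estimators of a Gaussian mean. The sketch below compares the sample mean with the sample median; the median is used purely as a comparison estimator and is not analysed in this subsection. The seed, sample size and number of trials are arbitrary assumptions, and the estimated efficiencies fluctuate from run to run.

```python
import random
from statistics import median

rng = random.Random(7)      # fixed seed so that the run is repeatable
mu, sigma = 0.0, 1.0        # assumed Gaussian population parameters
n, trials = 11, 20000       # sample size N and number of repeated samples


def variance(values):
    """Variance of a list of estimator values about their own mean."""
    centre = sum(values) / len(values)
    return sum((v - centre) ** 2 for v in values) / len(values)


means, medians = [], []
for _ in range(trials):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    means.append(sum(sample) / n)
    medians.append(median(sample))

# Lower bound from Fisher's inequality for an unbiased estimator of mu.
v_min = sigma ** 2 / n

eff_mean = v_min / variance(means)
eff_median = v_min / variance(medians)

print(f"efficiency of sample mean   = {eff_mean:.2f}")
print(f"efficiency of sample median = {eff_median:.2f}")
```

The estimated efficiency of the mean should lie close to 1 (sampling noise can push it marginally above 1), while that of the median comes out noticeably smaller, so for Gaussian data the median is the less efficient of the two.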


27.3.2 Fisher’s inequality 


As mentioned above, Fisher's inequality provides a lower limit on the variance of
any estimator $\hat{a}$ of the quantity $a$; it reads

$$V[\hat{a}] \ge \left( 1 + \frac{\partial b}{\partial a} \right)^2 \bigg/ E\left[ -\frac{\partial^2 \ln P}{\partial a^2} \right], \qquad (27.18)$$

where $P$ stands for the population $P(\mathbf{x}|\mathbf{a})$ and $b$ is the bias of the estimator.
We now present a proof of this inequality. Since the derivation is somewhat
complicated, and many of the details are unimportant, this section can be omitted
on a first reading. Nevertheless, some aspects of the proof will be useful when
the efficiency of maximum-likelihood estimators is discussed in section 27.5.

► Prove Fisher's inequality (27.18).

The normalisation of $P(\mathbf{x}|\mathbf{a})$ is given by

$$\int P(\mathbf{x}|\mathbf{a})\,d^N x = 1, \qquad (27.19)$$

where $d^N x = dx_1\,dx_2 \cdots dx_N$ and the integral extends over all the allowed values of the
sample items $x_i$. Differentiating (27.19) with respect to the parameter $a$, we obtain

$$\int \frac{\partial P}{\partial a}\,d^N x = \int \frac{\partial \ln P}{\partial a}\,P\,d^N x = 0. \qquad (27.20)$$

We note that the second integral is simply the expectation value of $\partial \ln P/\partial a$, where the
average is taken over all possible samples $x_i$, $i = 1, 2, \ldots, N$. Further, by equating two
expressions for $\partial E[\hat{a}]/\partial a$, obtained by differentiating (27.15) and (27.14) with respect to $a$
we obtain, dropping the functional dependencies, a second relationship,

$$1 + \frac{\partial b}{\partial a} = \int \hat{a}\,\frac{\partial P}{\partial a}\,d^N x = \int \hat{a}\,\frac{\partial \ln P}{\partial a}\,P\,d^N x. \qquad (27.21)$$

Now, multiplying (27.20) by $\alpha(a)$, where $\alpha(a)$ is any function of $a$, and subtracting the
result from (27.21), we obtain

$$\int [\hat{a} - \alpha(a)]\,\frac{\partial \ln P}{\partial a}\,P\,d^N x = 1 + \frac{\partial b}{\partial a}.$$

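At this point we must invoke the Schwarz inequality proved in subsection 8.1.3. The proof
is trivially extended to multiple integrals and shows that for two real functions, $g(\mathbf{x})$ and
$h(\mathbf{x})$,

$$\left( \int g^2(\mathbf{x})\,d^N x \right) \left( \int h^2(\mathbf{x})\,d^N x \right) \ge \left( \int g(\mathbf{x}) h(\mathbf{x})\,d^N x \right)^2. \qquad (27.22)$$

If we now let $g = [\hat{a} - \alpha(a)]\sqrt{P}$ and $h = (\partial \ln P/\partial a)\sqrt{P}$, we find

$$\left\{ \int [\hat{a} - \alpha(a)]^2 P\,d^N x \right\} \left[ \int \left( \frac{\partial \ln P}{\partial a} \right)^2 P\,d^N x \right] \ge \left( 1 + \frac{\partial b}{\partial a} \right)^2.$$

On the LHS, the factor in braces represents the expected spread of $\hat{a}$-values around the
point $\alpha(a)$. The minimum value that this integral may take occurs when $\alpha(a) = E[\hat{a}]$.
Making this substitution, we recognise the integral as the variance $V[\hat{a}]$, and so obtain the
result

$$V[\hat{a}] \ge \left( 1 + \frac{\partial b}{\partial a} \right)^2 \bigg/ \left[ \int \left( \frac{\partial \ln P}{\partial a} \right)^2 P\,d^N x \right]. \qquad (27.23)$$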


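We note that the factor in brackets is the expectation value of $(\partial \ln P/\partial a)^2$.

Fisher's inequality is, in fact, often quoted in the form (27.23). We may recover the form
(27.18) by noting that on differentiating (27.20) with respect to $a$ we obtain

$$\int \left( \frac{\partial^2 \ln P}{\partial a^2}\,P + \frac{\partial \ln P}{\partial a} \frac{\partial P}{\partial a} \right) d^N x = 0.$$

Writing $\partial P/\partial a$ as $(\partial \ln P/\partial a)P$ and rearranging we find that

$$\int \left( \frac{\partial \ln P}{\partial a} \right)^2 P\,d^N x = -\int \frac{\partial^2 \ln P}{\partial a^2}\,P\,d^N x.$$

Substituting this result in (27.23) gives

$$V[\hat{a}] \ge -\left( 1 + \frac{\partial b}{\partial a} \right)^2 \bigg/ \left[ \int \frac{\partial^2 \ln P}{\partial a^2}\,P\,d^N x \right].$$

Since the factor in brackets is the expectation value of $\partial^2 \ln P/\partial a^2$, we have recovered
result (27.18). ◄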


27.3.3 Standard errors on estimators 

For a given sample $x_1, x_2, \ldots, x_N$, we may calculate the value of an estimator $\hat{a}(\mathbf{x})$
for the quantity $a$. It is also necessary, however, to give some measure of the
statistical uncertainty in this estimate. One way of characterising this uncertainty
is with the standard deviation of the sampling distribution $P(\hat{a}|\mathbf{a})$, which is given
simply by

$$\sigma_{\hat{a}} = (V[\hat{a}])^{1/2}. \qquad (27.24)$$

If the estimator $\hat{a}(\mathbf{x})$ were calculated for a large number of samples, each of size
$N$, then the standard deviation of the resulting $\hat{a}$ values would be given by (27.24).
Consequently, $\sigma_{\hat{a}}$ is called the standard error on our estimate.

In general, however, the standard error $\sigma_{\hat{a}}$ depends on the true values of some
or all of the quantities $\mathbf{a}$ and they may be unknown. When this occurs, one must
substitute estimated values of any unknown quantities into the expression for $\sigma_{\hat{a}}$
in order to obtain an estimated standard error $\hat{\sigma}_{\hat{a}}$. One then quotes the result as

$$a = \hat{a} \pm \hat{\sigma}_{\hat{a}}.$$
► Ten independent sample values $x_i$, $i = 1, 2, \ldots, 10$, are drawn at random from a Gaussian
distribution with standard deviation $\sigma = 1$. The sample values are as follows (to two decimal
places):

2.22 2.56 1.07 0.24 0.18 0.95 0.73 -0.79 2.09 1.81

Estimate the population mean $\mu$, quoting the standard error on your result.

We have shown in the final worked example of subsection 27.3.1 that, in this case, $\bar{x}$ is
a consistent, unbiased, minimum-variance estimator of $\mu$ and has variance $V[\bar{x}] = \sigma^2/N$.
Thus, our estimate of the population mean with its associated standard error is

$$\hat{\mu} = \bar{x} \pm \frac{\sigma}{\sqrt{N}} = 1.11 \pm 0.32.$$

If the true value of $\sigma$ had not been known, we would have needed to use an estimated
value $\hat{\sigma}$ in the expression for the standard error. Useful basic estimators of $\sigma$ are discussed
in subsection 27.4.2. ◄
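The estimate and its standard error can be reproduced in a few lines of Python. This is a minimal sketch with illustrative variable names; it assumes, as in the example, that the population standard deviation $\sigma = 1$ is known in advance.

```python
import math

# The ten sample values quoted in the worked example.
sample = [2.22, 2.56, 1.07, 0.24, 0.18, 0.95, 0.73, -0.79, 2.09, 1.81]
sigma = 1.0                 # population standard deviation, assumed known
n = len(sample)

mu_hat = sum(sample) / n            # sample mean used as the estimator of mu
std_err = sigma / math.sqrt(n)      # standard error sigma / sqrt(N)

print(f"mu = {mu_hat:.2f} +/- {std_err:.2f}")
```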

It should be noted that the above approach is most meaningful for unbiased
estimators. In this case, $E[\hat{a}] = a$ and so $\sigma_{\hat{a}}$ describes the spread of $\hat{a}$-values about
the true value $a$. For a biased estimator, however, the spread about the true value
$a$ is given by the root mean square error $\epsilon_{\hat{a}}$, which is defined by

$$\epsilon_{\hat{a}}^2 = E[(\hat{a} - a)^2]
  = E[(\hat{a} - E[\hat{a}])^2] + (E[\hat{a}] - a)^2
  = V[\hat{a}] + b(a)^2.$$

We see that $\epsilon_{\hat{a}}^2$ is the sum of the variance of $\hat{a}$ and the square of the bias and so
can be interpreted as the sum of squares of statistical and systematic errors. For
a biased estimator, it is often more appropriate to quote the result as

$$a = \hat{a} \pm \epsilon_{\hat{a}}.$$

As above, it may be necessary to use estimated values $\hat{\mathbf{a}}$ in the expression for the
root mean square error and thus to quote only an estimate $\hat{\epsilon}_{\hat{a}}$ of the error.
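The decomposition of the mean square error into variance plus squared bias relies on a cross term vanishing, a step that the derivation above leaves implicit. Written out, with $E[\hat{a}]$ added and subtracted inside the square, it reads:

```latex
\begin{aligned}
E[(\hat{a} - a)^2]
  &= E\left[\left(\hat{a} - E[\hat{a}] + E[\hat{a}] - a\right)^2\right] \\
  &= E\left[(\hat{a} - E[\hat{a}])^2\right]
     + 2\,(E[\hat{a}] - a)\,E\left[\hat{a} - E[\hat{a}]\right]
     + (E[\hat{a}] - a)^2 \\
  &= V[\hat{a}] + b(a)^2 ,
\end{aligned}
```

where the middle term is zero because $E[\hat{a} - E[\hat{a}]] = E[\hat{a}] - E[\hat{a}] = 0$, and $E[\hat{a}] - a$ is a constant that can be taken outside the expectation.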


27.3.4 Confidence limits on estimators 

An alternative (and often equivalent) way of quoting a statistical error is with a
confidence interval. Let us assume that, other than the quantity of interest $a$, the
quantities $\mathbf{a}$ have known fixed values. Thus we denote the sampling distribution
of $\hat{a}$ by $P(\hat{a}|a)$. For any particular value of $a$, one can determine the two values
$\hat{a}_\alpha(a)$ and $\hat{a}_\beta(a)$ such that

$$\Pr(\hat{a} < \hat{a}_\alpha(a)) = \int_{-\infty}^{\hat{a}_\alpha(a)} P(\hat{a}|a)\,d\hat{a} = \alpha, \qquad (27.25)$$

$$\Pr(\hat{a} > \hat{a}_\beta(a)) = \int_{\hat{a}_\beta(a)}^{\infty} P(\hat{a}|a)\,d\hat{a} = \beta. \qquad (27.26)$$
Figure 27.2 The sampling distribution $P(\hat{a}|a)$ of some estimator $\hat{a}$ for a given
value of $a$. The shaded regions indicate the two probabilities $\Pr(\hat{a} < \hat{a}_\alpha(a)) = \alpha$
and $\Pr(\hat{a} > \hat{a}_\beta(a)) = \beta$.


This is illustrated in figure 27.2. Thus, for any particular value of $a$, the probability
that the estimator $\hat{a}$ lies within the limits $\hat{a}_\alpha(a)$ and $\hat{a}_\beta(a)$ is given by

$$\Pr(\hat{a}_\alpha(a) < \hat{a} < \hat{a}_\beta(a)) = \int_{\hat{a}_\alpha(a)}^{\hat{a}_\beta(a)} P(\hat{a}|a)\,d\hat{a} = 1 - \alpha - \beta.$$

Now, let us suppose that from our sample $x_1, x_2, \ldots, x_N$, we actually obtain the
value $\hat{a}_{\rm obs}$ for our estimator. If $\hat{a}$ is a good estimator of $a$ then we would expect
$\hat{a}_\alpha(a)$ and $\hat{a}_\beta(a)$ to be monotonically increasing functions of $a$ (i.e. $\hat{a}_\alpha$ and $\hat{a}_\beta$ both
change in the same sense as $a$ when the latter is varied). Assuming this to be the
case, we can uniquely define the two numbers $a_-$ and $a_+$ by the relationships

$$\hat{a}_\alpha(a_+) = \hat{a}_{\rm obs} \quad \text{and} \quad \hat{a}_\beta(a_-) = \hat{a}_{\rm obs}.$$

From (27.25) and (27.26) it follows that

$$\Pr(a_+ < a) = \alpha \quad \text{and} \quad \Pr(a_- > a) = \beta,$$

which when taken together imply

$$\Pr(a_- < a < a_+) = 1 - \alpha - \beta. \qquad (27.27)$$

Thus, from our estimate $\hat{a}_{\rm obs}$, we have determined two values $a_-$ and $a_+$ such that
the interval $[a_-, a_+]$ contains the true value of $a$ with probability $1 - \alpha - \beta$. It should be
emphasised that $a_-$ and $a_+$ are random variables. If a large number of samples,
each of size $N$, were analysed then the interval $[a_-, a_+]$ would contain the true
value $a$ on a fraction $1 - \alpha - \beta$ of occasions.

The interval $[a_-, a_+]$ is called a confidence interval on $a$ at the confidence
level $1 - \alpha - \beta$. The values $a_-$ and $a_+$ themselves are called respectively the
lower confidence limit and the upper confidence limit at this confidence level. In
practice, the confidence level is often quoted as a percentage. A convenient way
of presenting our results is

$$\int_{-\infty}^{\hat{a}_{\rm obs}} P(\hat{a}|a_+)\,d\hat{a} = \alpha, \qquad (27.28)$$

$$\int_{\hat{a}_{\rm obs}}^{\infty} P(\hat{a}|a_-)\,d\hat{a} = \beta. \qquad (27.29)$$

The confidence limits may then be found by solving these equations for $a_-$ and
$a_+$ either analytically or numerically.

Figure 27.3 An illustration of how the observed value of the estimator, $\hat{a}_{\rm obs}$,
and the given values $\alpha$ and $\beta$ determine the two confidence limits $a_-$ and $a_+$,
which are such that $\hat{a}_\alpha(a_+) = \hat{a}_{\rm obs} = \hat{a}_\beta(a_-)$.

Occasionally one might not combine the results (27.28) and (27.29) but use
either one or the other to provide a one-sided confidence interval on $a$. Whenever
the results are combined to provide a two-sided confidence interval, the interval
is not specified uniquely by the confidence level $1 - \alpha - \beta$. In other words, there
are generally an infinite number of intervals $[a_-, a_+]$ for which (27.27) holds.
To specify a unique interval, one often chooses $\alpha = \beta$, resulting in the central
confidence interval on $a$. All cases can be covered by calculating the quantities
$c = \hat{a} - a_-$ and $d = a_+ - \hat{a}$ and quoting the result of an estimate as

$$a = \hat{a}^{\,+d}_{\,-c}.$$

We have so far assumed that the quantities $\mathbf{a}$ other than the quantity of interest
$a$ are known in advance. If this is not the case then, in principle, the construction
of confidence limits is considerably more complicated. This is discussed briefly in
subsection 27.3.6.


27.3.5 Confidence limits for a Gaussian sampling distribution 


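An important special case occurs when the sampling distribution is Gaussian; if
the mean is $a$ and the standard deviation is $\sigma_{\hat{a}}$ then

$$P(\hat{a}|a, \sigma_{\hat{a}}) = \frac{1}{\sqrt{2\pi\sigma_{\hat{a}}^2}} \exp\left[ -\frac{(\hat{a} - a)^2}{2\sigma_{\hat{a}}^2} \right]. \qquad (27.30)$$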
For almost any (consistent) estimator $\hat{a}$, the sampling distribution will tend to
this form in the large-sample limit $N \to \infty$, as a consequence of the central limit
theorem. For a sampling distribution of the form (27.30), the above procedure
for determining confidence intervals becomes straightforward. Suppose, from our
sample, we obtain the value $\hat{a}_{\rm obs}$ for our estimator. In this case, equations (27.28)
and (27.29) become

$$\Phi\left( \frac{\hat{a}_{\rm obs} - a_+}{\sigma_{\hat{a}}} \right) = \alpha, \qquad
  1 - \Phi\left( \frac{\hat{a}_{\rm obs} - a_-}{\sigma_{\hat{a}}} \right) = \beta,$$

where $\Phi(z)$ is the cumulative probability function for the standard Gaussian
distribution, discussed in subsection 26.9.1. Solving these equations for $a_-$ and $a_+$ gives

$$a_- = \hat{a}_{\rm obs} - \sigma_{\hat{a}}\,\Phi^{-1}(1 - \beta), \qquad (27.31)$$

$$a_+ = \hat{a}_{\rm obs} + \sigma_{\hat{a}}\,\Phi^{-1}(1 - \alpha); \qquad (27.32)$$

we have used the fact that $\Phi^{-1}(\alpha) = -\Phi^{-1}(1 - \alpha)$ to make the equations symmetric.

The value of the inverse function $\Phi^{-1}(z)$ can be read off directly from table 26.3,
given in subsection 26.9.1. For the normally-used central confidence interval one
has $\alpha = \beta$. In this case, we see that quoting a result using the standard error, as

$$a = \hat{a} \pm \sigma_{\hat{a}}, \qquad (27.33)$$

is equivalent to taking $\Phi^{-1}(1 - \alpha) = 1$. From table 26.3, we find $\alpha = 1 - 0.8413 = 0.1587$,
and so this corresponds to a confidence level of $1 - 2(0.1587) \approx 0.683$.
Thus, the standard error limits give the 68.3% central confidence interval.


► Ten independent sample values $x_i$ ($i = 1, 2, \ldots, 10$) are drawn at random from a Gaussian
distribution with standard deviation $\sigma = 1$. The sample values are as follows (to two decimal
places):

2.22 2.56 1.07 0.24 0.18 0.95 0.73 -0.79 2.09 1.81

Find the 90% central confidence interval on the population mean $\mu$.

Our estimator $\hat{\mu}$ is the sample mean $\bar{x}$. As shown towards the end of section 27.3, the
sampling distribution of $\bar{x}$ is Gaussian with mean $E[\bar{x}]$ and variance $V[\bar{x}] = \sigma^2/N$. Since
$\sigma = 1$ in this case, the standard error is given by $\sigma_{\bar{x}} = \sigma/\sqrt{N} = 0.32$. Moreover, in
subsection 27.3.3, we found the mean of the above sample to be $\bar{x} = 1.11$.

For the 90% central confidence interval, we require $\alpha = \beta = 0.05$. From table 26.3, we
find

$$\Phi^{-1}(1 - \alpha) = \Phi^{-1}(0.95) = 1.65,$$

and using (27.31) and (27.32) we obtain

$$a_- = \bar{x} - 1.65\,\sigma_{\bar{x}} = 1.11 - (1.65)(0.32) = 0.58,$$
$$a_+ = \bar{x} + 1.65\,\sigma_{\bar{x}} = 1.11 + (1.65)(0.32) = 1.64.$$

Thus, the 90% central confidence interval on $\mu$ is $[0.58, 1.64]$. For comparison, the true
value used to create the sample was $\mu = 1$. ◄
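Equations (27.31) and (27.32) can also be evaluated without tables. The Python sketch below uses `statistics.NormalDist` from the standard library (Python 3.8 or later) for $\Phi$ and $\Phi^{-1}$; the variable names are illustrative. Because it carries unrounded values of $\bar{x}$, $\sigma_{\bar{x}}$ and $\Phi^{-1}(0.95)$ through the calculation, its limits differ from the hand-rounded ones in the example by about 0.01.

```python
import math
from statistics import NormalDist

# The ten sample values quoted in the worked example.
sample = [2.22, 2.56, 1.07, 0.24, 0.18, 0.95, 0.73, -0.79, 2.09, 1.81]
sigma = 1.0                 # population standard deviation, assumed known
alpha = beta = 0.05         # central 90% interval

n = len(sample)
a_obs = sum(sample) / n             # observed value of the estimator
std_err = sigma / math.sqrt(n)      # standard error of the sample mean

std_normal = NormalDist()           # standard Gaussian: mean 0, sigma 1

# Confidence limits from equations (27.31) and (27.32).
a_minus = a_obs - std_err * std_normal.inv_cdf(1 - beta)
a_plus = a_obs + std_err * std_normal.inv_cdf(1 - alpha)

# Confidence level corresponding to quoting +/- one standard error.
one_sigma_level = 1 - 2 * (1 - std_normal.cdf(1.0))

print(f"90% central interval: [{a_minus:.2f}, {a_plus:.2f}]")
print(f"one-standard-error level: {one_sigma_level:.3f}")
```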
In the case where the standard error $\sigma_{\hat{a}}$ in (27.33) is not known in advance,
one must use a value $\hat{\sigma}_{\hat{a}}$ estimated from the sample. In principle, this complicates
somewhat the construction of confidence intervals, since properly one should
consider the two-dimensional joint sampling distribution $P(\hat{a}, \hat{\sigma}_{\hat{a}}|a)$. Nevertheless,
in practice, provided $\hat{\sigma}_{\hat{a}}$ is a fairly good estimate of $\sigma_{\hat{a}}$ the above procedure may
be applied with reasonable accuracy. In the special case where the sample values
$x_i$ are drawn from a Gaussian distribution with unknown $\mu$ and $\sigma$, it is in fact
possible to obtain exact confidence intervals on the mean $\mu$, for a sample of any
size $N$, using Student's $t$-distribution. This is discussed in subsection 27.7.5.


27.3.6 Estimation of several quantities simultaneously 

Suppose one uses a sample $x_1, x_2, \ldots, x_N$ to calculate the values of several
estimators $\hat{a}_1, \hat{a}_2, \ldots, \hat{a}_M$ (collectively denoted by $\hat{\mathbf{a}}$) of the quantities $a_1, a_2, \ldots, a_M$
(collectively denoted by $\mathbf{a}$) that describe the population from which the sample was
drawn. The joint sampling distribution of these estimators is an $M$-dimensional
PDF $P(\hat{\mathbf{a}}|\mathbf{a})$ given by

$$P(\hat{\mathbf{a}}|\mathbf{a})\,d^M\hat{a} = P(\mathbf{x}|\mathbf{a})\,d^N x.$$


► Sample values $x_1, x_2, \ldots, x_N$ are drawn independently from a Gaussian distribution with
mean $\mu$ and standard deviation $\sigma$. Suppose we choose the sample mean $\bar{x}$ and sample
standard deviation $s$ respectively as estimators $\hat{\mu}$ and $\hat{\sigma}$. Find the joint sampling distribution of
these estimators.

Since each data value $x_i$ in the sample is assumed to be independent of the others, the
joint probability distribution of sample values is given by

$$P(\mathbf{x}|\mu, \sigma) = (2\pi\sigma^2)^{-N/2} \exp\left[-\frac{\sum_i (x_i - \mu)^2}{2\sigma^2}\right].$$

We may rewrite the sum in the exponent as follows:

$$\sum_i (x_i - \mu)^2 = \sum_i (x_i - \bar{x} + \bar{x} - \mu)^2$$
$$= \sum_i (x_i - \bar{x})^2 + 2(\bar{x} - \mu)\sum_i (x_i - \bar{x}) + \sum_i (\bar{x} - \mu)^2$$
$$= N s^2 + N(\bar{x} - \mu)^2,$$

where in the last line we have used the fact that $\sum_i (x_i - \bar{x}) = 0$. Hence, for given values
of $\mu$ and $\sigma$, the sampling distribution is in fact a function only of the sample mean $\bar{x}$ and
the standard deviation $s$. Thus the sampling distribution of $\bar{x}$ and $s$ must satisfy

$$P(\bar{x}, s|\mu, \sigma)\,d\bar{x}\,ds = (2\pi\sigma^2)^{-N/2} \exp\left\{-\frac{N[s^2 + (\bar{x} - \mu)^2]}{2\sigma^2}\right\} dV, \tag{27.34}$$

where $dV = dx_1\,dx_2 \cdots dx_N$ is an element of volume in the sample space which yields
simultaneously values of $\bar{x}$ and $s$ that lie within the region bounded by $[\bar{x}, \bar{x} + d\bar{x}]$ and
$[s, s + ds]$. Thus our only remaining task is to express $dV$ in terms of $\bar{x}$ and $s$ and their
differentials.

Let $S$ be the point in sample space representing the sample $(x_1, x_2, \ldots, x_N)$. For given
values of $\bar{x}$ and $s$, we require the sample values to satisfy both the condition

$$\sum_i x_i = N\bar{x},$$

which defines an $(N - 1)$-dimensional hyperplane in the sample space, and the condition

$$\sum_i (x_i - \bar{x})^2 = N s^2,$$

which defines an $(N - 1)$-dimensional hypersphere. Thus $S$ is constrained to lie in the
intersection of these two hypersurfaces, which is itself an $(N - 2)$-dimensional hypersphere.
Now, the volume of an $(N - 2)$-dimensional hypersphere is proportional to $s^{N-1}$. It follows
from this that the volume $dV$ between two concentric $(N - 2)$-dimensional hyperspheres
of radius $\sqrt{N}s$ and $\sqrt{N}(s + ds)$ and two $(N - 1)$-dimensional hyperplanes corresponding
to $\bar{x}$ and $\bar{x} + d\bar{x}$ is

$$dV = A s^{N-2}\,ds\,d\bar{x},$$

where $A$ is some constant. Thus, substituting this expression for $dV$ into (27.34), we find

$$P(\bar{x}, s|\mu, \sigma) = C_1 \exp\left[-\frac{N(\bar{x} - \mu)^2}{2\sigma^2}\right] C_2\, s^{N-2} \exp\left(-\frac{N s^2}{2\sigma^2}\right) = P(\bar{x}|\mu, \sigma)\,P(s|\sigma), \tag{27.35}$$

where $C_1$ and $C_2$ are constants. We have written $P(\bar{x}, s|\mu, \sigma)$ in this form to illustrate that
it separates naturally into two parts, one depending only on $\bar{x}$ and the other only on $s$.
Thus, $\bar{x}$ and $s$ are independent variables. Separate normalisations of the two factors in
(27.35) require

$$C_1 = \left(\frac{N}{2\pi\sigma^2}\right)^{1/2} \quad\text{and}\quad C_2 = 2\left(\frac{N}{2\sigma^2}\right)^{(N-1)/2} \frac{1}{\Gamma\left(\frac{1}{2}(N - 1)\right)},$$

where the calculation of $C_2$ requires the use of the gamma function discussed in the
Appendix. ◄


The marginal sampling distribution of any one of the estimators $\hat{a}_i$ is given
simply by

$$P(\hat{a}_i|\mathbf{a}) = \int \cdots \int P(\hat{\mathbf{a}}|\mathbf{a})\,d\hat{a}_1 \cdots d\hat{a}_{i-1}\,d\hat{a}_{i+1} \cdots d\hat{a}_M,$$

and the expectation value $E[\hat{a}_i]$ and variance $V[\hat{a}_i]$ of $\hat{a}_i$ are again given by (27.14)
and (27.16) respectively. By analogy with the one-dimensional case, the standard
error $\sigma_{\hat{a}_i}$ on the estimator $\hat{a}_i$ is given by the positive square root of $V[\hat{a}_i]$. With
several estimators, however, it is usual to quote their full covariance matrix. This
$M \times M$ matrix has elements

$$V_{ij} = \mathrm{Cov}[\hat{a}_i, \hat{a}_j] = \int (\hat{a}_i - E[\hat{a}_i])(\hat{a}_j - E[\hat{a}_j])\,P(\hat{\mathbf{a}}|\mathbf{a})\,d^M\hat{a}$$
$$= \int (\hat{a}_i - E[\hat{a}_i])(\hat{a}_j - E[\hat{a}_j])\,P(\mathbf{x}|\mathbf{a})\,d^N x.$$

Fisher's inequality can be generalised to the multi-dimensional case. Adapting
the proof given in subsection 27.3.2, one may show that, in the case where the
estimators are efficient and have zero bias, the elements of the inverse of the
covariance matrix are given by

$$(V^{-1})_{ij} = E\left[-\frac{\partial^2 \ln P}{\partial a_i\,\partial a_j}\right], \tag{27.36}$$

where $P$ denotes the population $P(\mathbf{x}|\mathbf{a})$ from which the sample is drawn. The
quantity on the RHS of (27.36) is the element $F_{ij}$ of the so-called Fisher matrix
$\mathsf{F}$ of the estimators.


► Calculate the covariance matrix of the estimators $\bar{x}$ and $s$ in the previous example.

As shown in (27.35), the joint sampling distribution $P(\bar{x}, s|\mu, \sigma)$ factorises, and so the
estimators $\bar{x}$ and $s$ are independent. Thus, we conclude immediately that

$$\mathrm{Cov}[\bar{x}, s] = 0.$$

Since we have already shown in the worked example at the end of subsection 27.3.1 that
$V[\bar{x}] = \sigma^2/N$, it only remains to calculate $V[s]$. From (27.35), we find

$$E[s^r] = C_2 \int_0^\infty s^{N-2+r} \exp\left(-\frac{N s^2}{2\sigma^2}\right) ds = \left(\frac{2\sigma^2}{N}\right)^{r/2} \frac{\Gamma\left(\frac{1}{2}(N - 1 + r)\right)}{\Gamma\left(\frac{1}{2}(N - 1)\right)},$$

where we have evaluated the integral using the definition of the gamma function given in
the Appendix. Thus, the expectation value of the sample standard deviation is

$$E[s] = \left(\frac{2}{N}\right)^{1/2} \frac{\Gamma\left(\frac{1}{2}N\right)}{\Gamma\left(\frac{1}{2}(N - 1)\right)}\,\sigma, \tag{27.37}$$

and its variance is given by

$$V[s] = E[s^2] - (E[s])^2 = \frac{\sigma^2}{N}\left\{N - 1 - 2\left[\frac{\Gamma\left(\frac{1}{2}N\right)}{\Gamma\left(\frac{1}{2}(N - 1)\right)}\right]^2\right\}.$$

We note, in passing, that (27.37) shows that $s$ is a biased estimator of $\sigma$. ◄
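To get a feel for the size of the bias in (27.37), the gamma-function expressions for $E[s]$ and $V[s]$ can be evaluated directly. The sketch below is illustrative only: the function names are ours, and `math.lgamma` is used (an implementation choice, not part of the text) so that the ratio of gamma functions does not overflow for large $N$.

```python
from math import exp, lgamma, sqrt


def gamma_ratio(n):
    """Gamma(N/2) / Gamma((N-1)/2), computed via logarithms; requires N >= 2."""
    return exp(lgamma(0.5 * n) - lgamma(0.5 * (n - 1)))


def expected_s(n, sigma):
    """E[s] for a Gaussian population, equation (27.37)."""
    return sqrt(2.0 / n) * gamma_ratio(n) * sigma


def variance_s(n, sigma):
    """V[s] = E[s^2] - (E[s])^2 for a Gaussian population."""
    return sigma**2 / n * (n - 1 - 2.0 * gamma_ratio(n) ** 2)


for n in (2, 5, 10, 100):
    print(n, expected_s(n, 1.0), variance_s(n, 1.0))
```

For $\sigma = 1$ the printed values of $E[s]$ lie below unity and approach it as $N$ grows, which is the bias noted above.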


The idea of a confidence interval can also be extended to the case where several
quantities are estimated simultaneously, but then the practical construction of an
interval is considerably more complicated. The general approach is to construct
an $M$-dimensional confidence region $R$ in $\mathbf{a}$-space. By analogy with the
one-dimensional case, for a given confidence level of (say) $1 - \alpha$, one first constructs
a region $\hat{R}$ in $\hat{\mathbf{a}}$-space, such that

$$\int\!\!\int_{\hat{R}} P(\hat{\mathbf{a}}|\mathbf{a})\,d^M\hat{a} = 1 - \alpha.$$

A common choice for such a region is that bounded by the 'surface' $P(\hat{\mathbf{a}}|\mathbf{a}) =$
constant. By considering all possible values $\mathbf{a}$ and the values of $\hat{\mathbf{a}}$ lying within
the region $\hat{R}$, one can construct a $2M$-dimensional region in the combined space
$(\hat{\mathbf{a}}, \mathbf{a})$. Suppose now that, from our sample $\mathbf{x}$, the values of the estimators are
$\hat{a}_{i,\mathrm{obs}}$, $i = 1, 2, \ldots, M$. The intersection of the $M$ 'hyperplanes' $\hat{a}_i = \hat{a}_{i,\mathrm{obs}}$ with
the $2M$-dimensional region will determine an $M$-dimensional region which, when
projected onto $\mathbf{a}$-space, will determine a confidence region $R$ at the confidence
level $1 - \alpha$. It is usually the case that this confidence region has to be evaluated
numerically.

Figure 27.4 (a) The ellipse $Q(\hat{\mathbf{a}}, \mathbf{a}) = c$ in $\hat{\mathbf{a}}$-space. (b) The ellipse $Q(\mathbf{a}, \hat{\mathbf{a}}_{\mathrm{obs}}) = c$
in $\mathbf{a}$-space that corresponds to a confidence region $R$ at the level $1 - \alpha$, when
$c$ satisfies (27.39).

The above procedure is clearly rather complicated in general, and a simpler
approximate method that uses the likelihood function is discussed in
subsection 27.5.5. As a consequence of the central limit theorem, however, in the
large-sample limit, $N \to \infty$, the joint sampling distribution $P(\hat{\mathbf{a}}|\mathbf{a})$ will tend, in
general, towards the multivariate Gaussian

$$P(\hat{\mathbf{a}}|\mathbf{a}) = \frac{1}{(2\pi)^{M/2}|\mathsf{V}|^{1/2}} \exp\left[-\tfrac{1}{2} Q(\hat{\mathbf{a}}, \mathbf{a})\right], \tag{27.38}$$

where $\mathsf{V}$ is the covariance matrix of the estimators and the quadratic form $Q$ is
given by

$$Q(\hat{\mathbf{a}}, \mathbf{a}) = (\hat{\mathbf{a}} - \mathbf{a})^{\mathrm{T}}\,\mathsf{V}^{-1}(\hat{\mathbf{a}} - \mathbf{a}).$$

Moreover, in the limit of large $N$, the inverse covariance matrix tends to the
Fisher matrix $\mathsf{F}$ given in (27.36), i.e. $\mathsf{V}^{-1} \to \mathsf{F}$.

For the Gaussian sampling distribution (27.38), the process of obtaining
confidence intervals is greatly simplified. The surfaces of constant $P(\hat{\mathbf{a}}|\mathbf{a})$ correspond
to surfaces of constant $Q(\hat{\mathbf{a}}, \mathbf{a})$, which have the shape of $M$-dimensional ellipsoids
in $\hat{\mathbf{a}}$-space, centred on the true values $\mathbf{a}$. In particular, let us suppose that the
ellipsoid $Q(\hat{\mathbf{a}}, \mathbf{a}) = c$ (where $c$ is some constant) contains a fraction $1 - \alpha$, say, of
the total probability. Now suppose that, from our sample $\mathbf{x}$, we obtain the values
$\hat{\mathbf{a}}_{\mathrm{obs}}$ for our estimators. Because of the obvious symmetry of the quadratic form
$Q$ with respect to $\mathbf{a}$ and $\hat{\mathbf{a}}$, it is clear that the ellipsoid $Q(\mathbf{a}, \hat{\mathbf{a}}_{\mathrm{obs}}) = c$ in $\mathbf{a}$-space
that is centred on $\hat{\mathbf{a}}_{\mathrm{obs}}$ should contain the true values $\mathbf{a}$ with probability $1 - \alpha$.
Thus $Q(\mathbf{a}, \hat{\mathbf{a}}_{\mathrm{obs}}) = c$ defines our required confidence region $R$ at this confidence
level. This is illustrated in figure 27.4 for the two-dimensional case.

It remains only to determine the constant $c$ corresponding to the confidence
level $1 - \alpha$. As discussed in subsection 26.15.2, the quantity $Q(\hat{\mathbf{a}}, \mathbf{a})$ is distributed
as a $\chi^2$ variable of order $M$. Thus, the confidence region corresponding to the
confidence level $1 - \alpha$ is given by $Q(\mathbf{a}, \hat{\mathbf{a}}_{\mathrm{obs}}) = c$, where the constant $c$ satisfies

$$\int_0^c P(\chi^2_M)\,d(\chi^2_M) = 1 - \alpha, \tag{27.39}$$

and $P(\chi^2_M)$ is the chi-squared PDF of order $M$, discussed in subsection 26.9.4. This
integral may be evaluated numerically to determine the constant $c$. Alternatively,
some reference books tabulate the values of $c$ corresponding to given confidence
levels and various values of $M$.
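As an illustration of the numerical route, the constant $c$ in (27.39) can be found by bisection on the chi-squared cumulative distribution. The sketch below is one possible approach, not a prescribed method: it assumes the standard series expansion of the regularised lower incomplete gamma function for the cumulative distribution, and the function names are ours.

```python
import math


def chi2_cdf(c, m):
    """Cumulative chi-squared distribution of order m, via the series
    gamma(a, x) = exp(-x) x^a * sum_n x^n / (a (a+1) ... (a+n))."""
    if c <= 0.0:
        return 0.0
    a, x = 0.5 * m, 0.5 * c
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def confidence_constant(m, alpha, tol=1e-10):
    """Solve (27.39) for c by bisection, for confidence level 1 - alpha."""
    target = 1.0 - alpha
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, m) < target:   # expand the bracket until it contains c
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, m) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for m in (1, 2, 3):
    print(m, confidence_constant(m, alpha=0.05))
```

For $M = 2$ the cumulative distribution is $1 - \exp(-c/2)$, so the result can be checked against the closed form $c = -2\ln\alpha$.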


27.4 Some basic estimators 

In many cases, one does not know the functional form of the population from
which a sample is drawn. Nevertheless, in a case where the sample values
$x_1, x_2, \ldots, x_N$ are each drawn independently from a one-dimensional population
$P(x)$, it is possible to construct some basic estimators for the moments and central
moments of $P(x)$. In this section, we investigate the estimating properties of the
common sample statistics presented in section 27.2. In fact, expectation values
and variances of these sample statistics can be calculated without prior knowledge
of the functional form of the population; they depend only on the sample size $N$
and certain moments and central moments of $P(x)$.


27.4.1 Population mean $\mu$

Let us suppose that the parent population $P(x)$ has mean $\mu$ and variance $\sigma^2$. An
obvious estimator $\hat{\mu}$ of the population mean is the sample mean $\bar{x}$. Provided $\mu$
and $\sigma^2$ are both finite, we may apply the central limit theorem directly to obtain
exact expressions, valid for samples of any size $N$, for the expectation value and
variance of $\bar{x}$. From parts (i) and (ii) of the central limit theorem, discussed in
section 26.10, we immediately obtain

$$E[\bar{x}] = \mu, \qquad V[\bar{x}] = \frac{\sigma^2}{N}. \tag{27.40}$$

Thus we see that $\bar{x}$ is an unbiased estimator of $\mu$. Moreover, we note that the
standard error in $\bar{x}$ is $\sigma/\sqrt{N}$, and so the sampling distribution of $\bar{x}$ becomes more
tightly centred around $\mu$ as the sample size $N$ increases. Indeed, since $V[\bar{x}] \to 0$
as $N \to \infty$, $\bar{x}$ is also a consistent estimator of $\mu$.

In the limit of large $N$, we may in fact obtain an approximate form for the
full sampling distribution of $\bar{x}$. Part (iii) of the central limit theorem (see section
26.10) tells us immediately that, for large $N$, the sampling distribution of $\bar{x}$ is
given approximately by the Gaussian form

$$P(\bar{x}|\mu, \sigma) \approx \frac{1}{\sqrt{2\pi\sigma^2/N}} \exp\left[-\frac{(\bar{x} - \mu)^2}{2\sigma^2/N}\right].$$

Note that this does not depend on the form of the original parent population.
If, however, the parent population is in fact Gaussian then this result is exact
for samples of any size $N$ (as is immediately apparent from our discussion of
multiple Gaussian distributions in subsection 26.9.1).


27.4.2 Population variance $\sigma^2$

An estimator for the population variance $\sigma^2$ is not so straightforward to define
as one for the mean. Complications arise because, in many cases, the true mean
of the population $\mu$ is not known. Nevertheless, let us begin by considering the
case where in fact $\mu$ is known. In this event, a useful estimator is

$$\hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 = \left(\frac{1}{N}\sum_{i=1}^{N} x_i^2\right) - \mu^2. \tag{27.41}$$

► Show that $\hat{\sigma}^2$ is an unbiased and consistent estimator of the population variance $\sigma^2$.

The expectation value of $\hat{\sigma}^2$ is given by

$$E[\hat{\sigma}^2] = \frac{1}{N} E\left[\sum_{i=1}^{N} x_i^2\right] - \mu^2 = E[x_i^2] - \mu^2 = \mu_2 - \mu^2 = \sigma^2,$$

from which we see that the estimator is unbiased. The variance of the estimator is

$$V[\hat{\sigma}^2] = \frac{1}{N^2} V\left[\sum_{i=1}^{N} x_i^2\right] + V[\mu^2] = \frac{1}{N} V[x_i^2] = \frac{1}{N}(\mu_4 - \mu_2^2),$$

in which we have used the fact that $V[\mu^2] = 0$ and $V[x_i^2] = E[x_i^4] - (E[x_i^2])^2 = \mu_4 - \mu_2^2$,
where $\mu_r$ is the $r$th population moment. Since $\hat{\sigma}^2$ is unbiased and $V[\hat{\sigma}^2] \to 0$ as $N \to \infty$,
it is also a consistent estimator of $\sigma^2$, and the result is established. ◄


If the true mean of the population is unknown, however, a natural alternative
is to replace $\mu$ by $\bar{x}$ in (27.41), so that our estimator is simply the sample variance
$s^2$ given by

$$s^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \left(\frac{1}{N}\sum_{i=1}^{N} x_i\right)^2.$$

In order to determine the properties of this estimator, we must calculate $E[s^2]$
and $V[s^2]$. This task is straightforward but lengthy. However, for the investigation
of the properties of a central moment of the sample, there exists a useful trick
that simplifies the calculation. We can assume, with no loss of generality, that
the mean $\mu_1$ of the population from which the sample is drawn is equal to zero.
With this assumption, the population central moments, $\nu_r$, are identical to the
corresponding moments $\mu_r$, and we may perform our calculation in terms of the
latter. At the end, however, we replace $\mu_r$ by $\nu_r$ in the final result and so obtain
a general expression that is valid even in cases where $\mu_1 \neq 0$.


► Calculate $E[s^2]$ and $V[s^2]$ for a sample of size $N$.

The expectation value of the sample variance $s^2$ for a sample of size $N$ is given by

$$E[s^2] = \frac{1}{N} E\left[\sum_i x_i^2\right] - \frac{1}{N^2} E\left[\left(\sum_i x_i\right)^2\right]
= \frac{1}{N} N E[x_i^2] - \frac{1}{N^2} E\left[\sum_i x_i^2 + \sum_{\substack{i,j \\ j \neq i}} x_i x_j\right]. \tag{27.42}$$

The number of terms in the double summation in (27.42) is $N(N - 1)$, so we find

$$E[s^2] = E[x_i^2] - \frac{1}{N^2}\left(N E[x_i^2] + N(N - 1) E[x_i x_j]\right).$$

Now, since the sample elements $x_i$ and $x_j$ are independent, $E[x_i x_j] = E[x_i]E[x_j] = 0$,
assuming the mean of the parent population to be zero. Denoting the $r$th moment of
the population by $\mu_r$, we thus obtain

$$E[s^2] = \mu_2 - \frac{\mu_2}{N} = \frac{N - 1}{N}\mu_2 = \frac{N - 1}{N}\sigma^2, \tag{27.43}$$

where in the last step we have used the fact that the population mean is zero, and so
$\mu_2 = \nu_2 = \sigma^2$. However, the final result is also valid in the case where $\mu_1 \neq 0$.

Using the above method, we can also find the variance of $s^2$, although the algebra is
rather heavy going. The variance of $s^2$ is given by

$$V[s^2] = E[s^4] - (E[s^2])^2, \tag{27.44}$$

where $E[s^2]$ is given by (27.43). We therefore need only consider how to calculate $E[s^4]$,
where $s^4$ is given by

$$s^4 = \left[\frac{\sum_i x_i^2}{N} - \left(\frac{\sum_i x_i}{N}\right)^2\right]^2
= \frac{\left(\sum_i x_i^2\right)^2}{N^2} - 2\,\frac{\left(\sum_i x_i^2\right)\left(\sum_i x_i\right)^2}{N^3} + \frac{\left(\sum_i x_i\right)^4}{N^4}. \tag{27.45}$$

We will consider in turn each of the three terms on the RHS. In the first term, the sum
$\left(\sum_i x_i^2\right)^2$ can be written as

$$\left(\sum_i x_i^2\right)^2 = \sum_i x_i^4 + \sum_{\substack{i,j \\ j \neq i}} x_i^2 x_j^2,$$

where the first sum contains $N$ terms and the second contains $N(N - 1)$ terms. Since the
sample elements $x_i$ and $x_j$ are assumed independent, we have $E[x_i^2 x_j^2] = E[x_i^2]E[x_j^2] = \mu_2^2$,
and so

$$E\left[\left(\sum_i x_i^2\right)^2\right] = N\mu_4 + N(N - 1)\mu_2^2.$$

Turning to the second term on the RHS of (27.45),

$$\left(\sum_i x_i^2\right)\left(\sum_i x_i\right)^2 = \sum_i x_i^4 + 2\sum_{\substack{i,j \\ j \neq i}} x_i^3 x_j + \sum_{\substack{i,j \\ j \neq i}} x_i^2 x_j^2 + \sum_{\substack{i,j,k \\ k \neq j \neq i}} x_i^2 x_j x_k.$$

Since the mean of the population has been assumed to equal zero, the expectation values
of the second and fourth sums on the RHS vanish. The first and third sums contain $N$
and $N(N - 1)$ terms respectively, and so

$$E\left[\left(\sum_i x_i^2\right)\left(\sum_i x_i\right)^2\right] = N\mu_4 + N(N - 1)\mu_2^2.$$

Finally, we consider the third term on the RHS of (27.45), and write

$$\left(\sum_i x_i\right)^4 = \sum_i x_i^4 + 4\sum_{\substack{i,j \\ j \neq i}} x_i^3 x_j + 3\sum_{\substack{i,j \\ j \neq i}} x_i^2 x_j^2 + 6\sum_{\substack{i,j,k \\ k \neq j \neq i}} x_i^2 x_j x_k + \sum_{\substack{i,j,k,l \\ l \neq k \neq j \neq i}} x_i x_j x_k x_l.$$

The expectation values of the second, fourth and fifth sums are zero, and the first and third
sums contribute $N$ and $3N(N - 1)$ terms respectively (for the third sum, there are $N(N - 1)/2$
ways of choosing the pair $i$ and $j$, and the multinomial coefficient of $x_i^2 x_j^2$ is $4!/(2!\,2!) = 6$). Thus

$$E\left[\left(\sum_i x_i\right)^4\right] = N\mu_4 + 3N(N - 1)\mu_2^2.$$

Collecting together terms, we therefore obtain

$$E[s^4] = \frac{(N - 1)^2}{N^3}\mu_4 + \frac{(N - 1)(N^2 - 2N + 3)}{N^3}\mu_2^2, \tag{27.46}$$

which, together with the result (27.43), may be substituted into (27.44) to obtain finally

$$V[s^2] = \frac{(N - 1)^2}{N^3}\mu_4 - \frac{(N - 1)(N - 3)}{N^3}\mu_2^2
= \frac{N - 1}{N^3}\left[(N - 1)\nu_4 - (N - 3)\nu_2^2\right], \tag{27.47}$$

where in the last step we have used again the fact that, since the population mean is zero,
$\mu_r = \nu_r$. However, result (27.47) holds even when the population mean is not zero. ◄
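Results (27.43) and (27.47) are exact for any sample size, and this can be confirmed on a small case by brute force. The sketch below is only an illustration: it assumes a hypothetical three-valued discrete population with non-zero mean (our choice, not taken from the text), sums over every possible sample of size $N$ using exact rational arithmetic, and compares the outcome with the two formulae.

```python
from fractions import Fraction
from itertools import product

# Hypothetical discrete population, chosen only for illustration.
VALUES = (0, 1, 3)
PROBS = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))


def central_moment(r):
    """r-th central moment nu_r of the population."""
    mean = sum(p * v for v, p in zip(VALUES, PROBS))
    return sum(p * (v - mean) ** r for v, p in zip(VALUES, PROBS))


def sample_variance_moments(n):
    """Exact E[s^2] and V[s^2], summing over all samples of size n."""
    e1 = e2 = Fraction(0)
    for idx in product(range(len(VALUES)), repeat=n):
        prob = Fraction(1)
        for k in idx:
            prob *= PROBS[k]              # independent draws
        xs = [VALUES[k] for k in idx]
        xbar = Fraction(sum(xs), n)
        s2 = sum((x - xbar) ** 2 for x in xs) / n
        e1 += prob * s2
        e2 += prob * s2 * s2
    return e1, e2 - e1 * e1


N = 4
nu2, nu4 = central_moment(2), central_moment(4)
mean_s2, var_s2 = sample_variance_moments(N)
predicted_mean = Fraction(N - 1, N) * nu2                                      # (27.43)
predicted_var = Fraction(N - 1, N**3) * ((N - 1) * nu4 - (N - 3) * nu2**2)     # (27.47)
print(mean_s2 == predicted_mean, var_s2 == predicted_var)
```

Since the chosen population has a non-zero mean, agreement here also illustrates the remark that the final results do not rely on the assumption $\mu_1 = 0$ used in the derivation.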


From (27.43), we see that $s^2$ is a biased estimator of $\sigma^2$, although the bias
becomes negligible for large $N$. However, it immediately follows that an unbiased
estimator of $\sigma^2$ is given simply by

$$\hat{\sigma}^2 = \frac{N}{N - 1}\,s^2, \tag{27.48}$$

where the multiplicative factor $N/(N - 1)$ is often called Bessel's correction. Thus
in terms of the sample values $x_i$, $i = 1, 2, \ldots, N$, an unbiased estimator of the
population variance $\sigma^2$ is given by

$$\hat{\sigma}^2 = \frac{1}{N - 1}\sum_{i=1}^{N} (x_i - \bar{x})^2. \tag{27.49}$$

Using (27.47), we find that the variance of the estimator $\hat{\sigma}^2$ is

$$V[\hat{\sigma}^2] = \left(\frac{N}{N - 1}\right)^2 V[s^2] = \frac{1}{N}\left(\nu_4 - \frac{N - 3}{N - 1}\,\nu_2^2\right),$$

where $\nu_r$ is the $r$th central moment of the parent population. We note that,
since $E[\hat{\sigma}^2] = \sigma^2$ and $V[\hat{\sigma}^2] \to 0$ as $N \to \infty$, the statistic $\hat{\sigma}^2$ is also a consistent
estimator of the population variance.
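In practice, both forms of the variance estimate are readily available in software. As a brief illustration, assuming Python's standard `statistics` module, `pvariance` divides by $N$ and so corresponds to $s^2$, while `variance` divides by $N - 1$ and corresponds to (27.49). The data are the ten sample values used in the confidence-interval example of subsection 27.3.5.

```python
from statistics import pvariance, variance

sample = [2.22, 2.56, 1.07, 0.24, 0.18, 0.95, 0.73, -0.79, 2.09, 1.81]
n = len(sample)

s2 = pvariance(sample)          # sample variance s^2 (divides by N)
sigma2_hat = variance(sample)   # unbiased estimator (27.49) (divides by N - 1)

# Bessel's correction (27.48) relates the two.
print(s2, sigma2_hat, n / (n - 1) * s2)
```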


27.4.3 Population standard deviation $\sigma$

The standard deviation $\sigma$ of a population is defined as the positive square root of
the population variance $\sigma^2$ (as, indeed, our notation suggests). Thus, it is common
practice to take the positive square root of the variance estimator as our estimator
for $\sigma$. Thus, we take

$$\hat{\sigma} = \left(\hat{\sigma}^2\right)^{1/2}, \tag{27.50}$$

where $\hat{\sigma}^2$ is given by either (27.41) or (27.48), depending on whether the population
mean $\mu$ is known or unknown. Because of the square root in the definition of
$\hat{\sigma}$, it is not possible in either case to obtain an exact expression for $E[\hat{\sigma}]$ and
$V[\hat{\sigma}]$. Indeed, although in each case the estimator is the positive square root of
an unbiased estimator of $\sigma^2$, it is not itself an unbiased estimator of $\sigma$. However,
the bias does become negligible for large $N$.


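► Obtain approximate expressions for $E[\hat{\sigma}]$ and $V[\hat{\sigma}]$ for a sample of size $N$ in the case
where the population mean $\mu$ is unknown.

As the population mean is unknown, from (27.50) and (27.48) our estimator is given by

$$\hat{\sigma} = \left(\frac{N}{N - 1}\right)^{1/2} s,$$

where $s$ is the sample standard deviation. The expectation value of this estimator is given
by

$$E[\hat{\sigma}] = \left(\frac{N}{N - 1}\right)^{1/2} E\left[(s^2)^{1/2}\right] \approx \left(\frac{N}{N - 1}\right)^{1/2} \left(E[s^2]\right)^{1/2} = \sigma.$$

An approximate expression for the variance of $\hat{\sigma}$ may be found using (27.47) and is given
by

$$V[\hat{\sigma}] = \frac{N}{N - 1} V\left[(s^2)^{1/2}\right] \approx \frac{N}{N - 1}\left[\frac{d}{d(s^2)}(s^2)^{1/2}\right]^2_{s^2 = E[s^2]} V[s^2]
= \frac{N}{N - 1}\left[\frac{1}{4 s^2}\right]_{s^2 = E[s^2]} V[s^2].$$

Using the expressions (27.43) and (27.47) for $E[s^2]$ and $V[s^2]$ respectively, we obtain

$$V[\hat{\sigma}] \approx \frac{1}{4N\nu_2}\left(\nu_4 - \frac{N - 3}{N - 1}\,\nu_2^2\right).$$ ◄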

27.4.4 Population moments $\mu_r$

We may straightforwardly generalise our discussion of estimation of the
population mean $\mu$ ($= \mu_1$) in section 27.4.1 to the estimation of the $r$th population
moment $\mu_r$. An obvious choice of estimator is the $r$th sample moment $m_r$. The
expectation value of $m_r$ is given by

$$E[m_r] = \frac{1}{N}\sum_{i=1}^{N} E[x_i^r] = \frac{N\mu_r}{N} = \mu_r,$$

and so it is an unbiased estimator of $\mu_r$.

The variance of $m_r$ may be found in a similar manner, although the calculation
is a little more complicated. We find that

$$V[m_r] = E\left[(m_r - \mu_r)^2\right] = \frac{1}{N^2} E\left[\left(\sum_i x_i^r - N\mu_r\right)^2\right]$$
$$= \frac{1}{N^2} E\left[\sum_i x_i^{2r} + \sum_i \sum_{j \neq i} x_i^r x_j^r - 2N\mu_r \sum_i x_i^r + N^2\mu_r^2\right]$$
$$= \frac{1}{N}\mu_{2r} - \mu_r^2 + \frac{1}{N^2}\sum_i \sum_{j \neq i} E[x_i^r x_j^r]. \tag{27.51}$$

However, since the sample values $x_i$ are assumed to be independent, we have

$$E[x_i^r x_j^r] = E[x_i^r]\,E[x_j^r] = \mu_r^2. \tag{27.52}$$

The number of terms in the sum on the RHS of (27.51) is $N(N - 1)$, and so we
find

$$V[m_r] = \frac{1}{N}\mu_{2r} - \mu_r^2 + \frac{N - 1}{N}\mu_r^2 = \frac{\mu_{2r} - \mu_r^2}{N}. \tag{27.53}$$

Since $E[m_r] = \mu_r$ and $V[m_r] \to 0$ as $N \to \infty$, the $r$th sample moment $m_r$ is also a
consistent estimator of $\mu_r$.

► Find the covariance of the sample moments $m_r$ and $m_s$ for a sample of size $N$.

We obtain the covariance of the sample moments $m_r$ and $m_s$ in a similar manner to that
used above to obtain the variance of $m_r$. From the definition of covariance, we have

$$\mathrm{Cov}[m_r, m_s] = E[(m_r - \mu_r)(m_s - \mu_s)]
= \frac{1}{N^2} E\left[\left(\sum_i x_i^r - N\mu_r\right)\left(\sum_j x_j^s - N\mu_s\right)\right]$$
$$= \frac{1}{N^2} E\left[\sum_i x_i^{r+s} + \sum_i \sum_{j \neq i} x_i^r x_j^s - N\mu_r \sum_j x_j^s - N\mu_s \sum_i x_i^r + N^2\mu_r\mu_s\right].$$

Assuming the $x_i$ to be independent, we may again use result (27.52) to obtain

$$\mathrm{Cov}[m_r, m_s] = \frac{1}{N^2}\left[N\mu_{r+s} + N(N - 1)\mu_r\mu_s - N^2\mu_r\mu_s - N^2\mu_s\mu_r + N^2\mu_r\mu_s\right]$$
$$= \frac{1}{N}\mu_{r+s} + \frac{N - 1}{N}\mu_r\mu_s - \mu_r\mu_s = \frac{\mu_{r+s} - \mu_r\mu_s}{N}.$$

We note that by setting $r = s$, we recover the expression (27.53) for $V[m_r]$. ◄
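The covariance result just obtained is exact, so it too can be checked by direct enumeration. The following sketch is illustrative only; it again assumes a hypothetical three-valued discrete population (our choice) and exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

# Hypothetical discrete population, chosen only for illustration.
VALUES = (0, 1, 3)
PROBS = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))


def moment(r):
    """r-th population moment mu_r."""
    return sum(p * v**r for v, p in zip(VALUES, PROBS))


def cov_sample_moments(r, s, n):
    """Exact Cov[m_r, m_s], summing over all samples of size n."""
    e_r = e_s = e_rs = Fraction(0)
    for idx in product(range(len(VALUES)), repeat=n):
        prob = Fraction(1)
        for k in idx:
            prob *= PROBS[k]
        m_r = Fraction(sum(VALUES[k] ** r for k in idx), n)
        m_s = Fraction(sum(VALUES[k] ** s for k in idx), n)
        e_r += prob * m_r
        e_s += prob * m_s
        e_rs += prob * m_r * m_s
    return e_rs - e_r * e_s


r, s, n = 1, 2, 3
exact = cov_sample_moments(r, s, n)
predicted = (moment(r + s) - moment(r) * moment(s)) / n
print(exact == predicted)
```

Setting `r` equal to `s` in the same function reproduces the variance (27.53).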


27.4.5 Population central moments $\nu_r$

We may generalise the discussion of estimators for the second central moment $\nu_2$
(or equivalently $\sigma^2$) given in subsection 27.4.2 to the estimation of the $r$th central
moment $\nu_r$. In particular, we saw in that subsection that our choice of estimator
for $\nu_2$ depended on whether the population mean $\mu_1$ is known; the same is true
for the estimation of $\nu_r$.

Let us first consider the case in which $\mu_1$ is known. From (26.54), we may write
$\nu_r$ as

$$\nu_r = \mu_r - {}^rC_1\,\mu_{r-1}\mu_1 + \cdots + (-1)^k\,{}^rC_k\,\mu_{r-k}\mu_1^k + \cdots + (-1)^{r-1}\left({}^rC_{r-1} - 1\right)\mu_1^r.$$

If $\mu_1$ is known, a suitable estimator is obviously

$$\hat{\nu}_r = m_r - {}^rC_1\,m_{r-1}\mu_1 + \cdots + (-1)^k\,{}^rC_k\,m_{r-k}\mu_1^k + \cdots + (-1)^{r-1}\left({}^rC_{r-1} - 1\right)\mu_1^r,$$

where $m_r$ is the $r$th sample moment. Since $\mu_1$ and the binomial coefficients are
(known) constants, it is immediately clear that $E[\hat{\nu}_r] = \nu_r$, and so $\hat{\nu}_r$ is an unbiased
estimator of $\nu_r$. It is also possible to obtain an expression for $V[\hat{\nu}_r]$, though the
calculation is somewhat lengthy.

In the case where the population mean $\mu_1$ is not known, the situation is more
complicated. We saw in subsection 27.4.2 that the second central moment of the
sample, $n_2$ (or $s^2$), is not an unbiased estimator of $\nu_2$ (or $\sigma^2$). Similarly, the $r$th central moment of
a sample, $n_r$, is not an unbiased estimator of the $r$th population central moment
$\nu_r$. However, in all cases the bias becomes negligible in the limit of large $N$.

As we also found in the same subsection, it is rather complicated to calculate
the expectation and variance of $n_2$; this complication increases considerably for
general $r$. Nevertheless, we have derived already in this chapter exact expressions
for the expectation value of the first few sample central moments, which are valid
for samples of any size $N$. From (27.40), (27.43) and (27.46), we find

$$E[n_1] = 0,$$
$$E[n_2] = \frac{N - 1}{N}\,\nu_2, \tag{27.54}$$
$$E[n_2^2] = \frac{N - 1}{N^3}\left[(N - 1)\nu_4 + (N^2 - 2N + 3)\nu_2^2\right].$$

By similar arguments it can be shown that

$$E[n_3] = \frac{(N - 1)(N - 2)}{N^2}\,\nu_3, \tag{27.55}$$
$$E[n_4] = \frac{N - 1}{N^3}\left[(N^2 - 3N + 3)\nu_4 + 3(2N - 3)\nu_2^2\right]. \tag{27.56}$$

From (27.54) and (27.55), we see that unbiased estimators of $\nu_2$ and $\nu_3$ are

$$\hat{\nu}_2 = \frac{N}{N - 1}\,n_2, \tag{27.57}$$
$$\hat{\nu}_3 = \frac{N^2}{(N - 1)(N - 2)}\,n_3, \tag{27.58}$$

where (27.57) simply re-establishes our earlier result that $\hat{\sigma}^2 = N s^2/(N - 1)$ is an
unbiased estimator of $\sigma^2$.

Unfortunately, the pattern that appears to be emerging in (27.57) and (27.58)
is not continued for higher $r$, as is seen immediately from (27.56). Nevertheless,
in the limit of large $N$, the bias becomes negligible, and often one simply takes
$\hat{\nu}_r = n_r$. For large $N$, it may be shown that

$$E[n_r] \approx \nu_r,$$
$$V[n_r] \approx \frac{1}{N}\left(\nu_{2r} - \nu_r^2 + r^2\nu_2\nu_{r-1}^2 - 2r\nu_{r-1}\nu_{r+1}\right),$$
$$\mathrm{Cov}[n_r, n_s] \approx \frac{1}{N}\left(\nu_{r+s} - \nu_r\nu_s + rs\,\nu_2\nu_{r-1}\nu_{s-1} - r\nu_{r-1}\nu_{s+1} - s\nu_{s-1}\nu_{r+1}\right).$$
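The unbiased estimators (27.57) and (27.58) are simple rescalings of the sample central moments and can be written down in a few lines. The sketch below is a minimal illustration; the function names and the four data values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def sample_central_moment(data, r):
    """r-th central moment n_r of the sample (divides by N)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** r for x in data) / n


def unbiased_nu2(data):
    """Unbiased estimator of nu_2, equation (27.57)."""
    n = len(data)
    return n / (n - 1) * sample_central_moment(data, 2)


def unbiased_nu3(data):
    """Unbiased estimator of nu_3, equation (27.58); requires N >= 3."""
    n = len(data)
    return n**2 / ((n - 1) * (n - 2)) * sample_central_moment(data, 3)


data = [1.0, 2.0, 3.0, 6.0]   # hypothetical sample: n_2 = 3.5, n_3 = 4.5
print(unbiased_nu2(data), unbiased_nu3(data))
```

For this sample the correction factors are $4/3$ and $16/6$ respectively, which shows how large the bias of the raw central moments can be when $N$ is small.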


27.4.6 Population covariance Cov[x, y] and correlation Corr[x, y]

So far we have assumed that each of our $N$ independent samples consists of
a single number $x_i$. Let us now extend our discussion to a situation in which
each sample consists of two numbers $x_i$, $y_i$, which we may consider as being
drawn randomly from a two-dimensional population $P(x, y)$. In particular, we
now consider estimators for the population covariance $\mathrm{Cov}[x, y]$ and for the
correlation $\mathrm{Corr}[x, y]$.

When $\mu_x$ and $\mu_y$ are known, an appropriate estimator of the population
covariance is

$$\widehat{\mathrm{Cov}}[x, y] = \overline{xy} - \mu_x\mu_y = \left(\frac{1}{N}\sum_{i=1}^{N} x_i y_i\right) - \mu_x\mu_y. \tag{27.59}$$

This estimator is unbiased since

$$E\left[\widehat{\mathrm{Cov}}[x, y]\right] = \frac{1}{N} E\left[\sum_{i=1}^{N} x_i y_i\right] - \mu_x\mu_y = E[x_i y_i] - \mu_x\mu_y = \mathrm{Cov}[x, y].$$

Alternatively, if $\mu_x$ and $\mu_y$ are unknown, it is natural to replace $\mu_x$ and $\mu_y$ in
(27.59) by the sample means $\bar{x}$ and $\bar{y}$ respectively, in which case we recover the
sample covariance $V_{xy} = \overline{xy} - \bar{x}\bar{y}$ discussed in subsection 27.2.4. This estimator
is biased, but an unbiased estimator of the population covariance is obtained by
forming

$$\widehat{\mathrm{Cov}}[x, y] = \frac{N}{N - 1}\,V_{xy}. \tag{27.60}$$

► Calculate the expectation value of the sample covariance $V_{xy}$ for a sample of size $N$.

The sample covariance is given by

$$V_{xy} = \frac{1}{N}\sum_i x_i y_i - \left(\frac{1}{N}\sum_i x_i\right)\left(\frac{1}{N}\sum_j y_j\right).$$

Thus its expectation value is given by

$$E[V_{xy}] = \frac{1}{N} E\left[\sum_i x_i y_i\right] - \frac{1}{N^2} E\left[\left(\sum_i x_i\right)\left(\sum_j y_j\right)\right]
= E[x_i y_i] - \frac{1}{N^2} E\left[\sum_i x_i y_i + \sum_{\substack{i,j \\ j \neq i}} x_i y_j\right].$$

Since the number of terms in the double sum on the RHS is $N(N - 1)$, we have

$$E[V_{xy}] = E[x_i y_i] - \frac{1}{N^2}\left(N E[x_i y_i] + N(N - 1) E[x_i y_j]\right)$$
$$= E[x_i y_i] - \frac{1}{N^2}\left(N E[x_i y_i] + N(N - 1) E[x_i]E[y_j]\right)$$
$$= E[x_i y_i] - \frac{1}{N}\left(E[x_i y_i] + (N - 1)\mu_x\mu_y\right) = \frac{N - 1}{N}\,\mathrm{Cov}[x, y],$$

where we have used the fact that, since the samples are independent, $E[x_i y_j] = E[x_i]E[y_j]$. ◄

It is possible to obtain expressions for the variances of the estimators (27.59)
and (27.60) but these quantities depend upon higher moments of the population
$P(x, y)$ and are extremely lengthy to calculate.

Whether the means $\mu_x$ and $\mu_y$ are known or unknown, an estimator of the
population correlation $\mathrm{Corr}[x, y]$ is given by

$$\widehat{\mathrm{Corr}}[x, y] = \frac{\widehat{\mathrm{Cov}}[x, y]}{\hat{\sigma}_x\hat{\sigma}_y}, \tag{27.61}$$

where $\widehat{\mathrm{Cov}}[x, y]$, $\hat{\sigma}_x$ and $\hat{\sigma}_y$ are the appropriate estimators of the population
covariance and standard deviations. Although this estimator is only asymptotically
unbiased, i.e. for large $N$, it is widely used because of its simplicity. Once again
the variance of the estimator depends on the higher moments of $P(x, y)$ and is
difficult to calculate.

In the case in which the means $\mu_x$ and $\mu_y$ are unknown, a suitable (but biased)
estimator is

$$\widehat{\mathrm{Corr}}[x, y] = \frac{N}{N - 1}\,\frac{V_{xy}}{s_x s_y} = \frac{N}{N - 1}\,r_{xy}, \tag{27.62}$$

where $s_x$ and $s_y$ are the sample standard deviations of the $x_i$ and $y_i$ respectively
and $r_{xy}$ is the sample correlation. In the special case when the parent population
$P(x, y)$ is Gaussian, it may be shown that, if $\rho = \mathrm{Corr}[x, y]$,

$$E[r_{xy}] = \rho - \frac{\rho(1 - \rho^2)}{2N} + O(N^{-2}), \tag{27.63}$$
$$V[r_{xy}] = \frac{1}{N}(1 - \rho^2)^2 + O(N^{-2}), \tag{27.64}$$

from which the expectation value and variance of the estimator $\widehat{\mathrm{Corr}}[x, y]$ may
be found immediately.

We note finally that our discussion may be extended, without significant
alteration, to the general case in which each data item consists of $n$ numbers
$x_i, y_i, \ldots, z_i$.


27.4.7 A worked example 

We conclude our discussion of basic estimators by reconsidering the set of
experimental data given in subsection 27.2.4.

► Ten UK citizens are selected at random and their heights and weights are found to be as
follows (to the nearest cm or kg respectively):

Person        A    B    C    D    E    F    G    H    I    J
Height (cm)  194  168  177  180  171  190  151  169  175  182
Weight (kg)   75   53   72   80   75   75   57   67   46   68

Estimate the means, $\mu_x$ and $\mu_y$, and standard deviations, $\sigma_x$ and $\sigma_y$, of the two-dimensional
joint population from which the sample was drawn, quoting the standard error on the
estimate in each case. Estimate also the correlation $\mathrm{Corr}[x, y]$ of the population, and quote the
standard error on the estimate under the assumption that the population is a multivariate
Gaussian.


In subsection 27.2.4, we calculated various sample statistics for these data. In particular,
we found that for our sample of size $N = 10$,

$$\bar{x} = 175.7, \qquad \bar{y} = 66.8,$$
$$s_x = 11.6, \qquad s_y = 10.6, \qquad r_{xy} = 0.54.$$

Let us begin by estimating the means $\mu_x$ and $\mu_y$. As discussed in subsection 27.4.1, the
sample mean is an unbiased, consistent estimator of the population mean. Moreover, the
standard error on $\bar{x}$ (say) is $\sigma_x/\sqrt{N}$. In this case, however, we do not know the true value
of $\sigma_x$ and we must estimate it using $\hat{\sigma}_x = \sqrt{N/(N - 1)}\,s_x$. Thus, our estimates of $\mu_x$ and
$\mu_y$, with associated standard errors, are

$$\hat{\mu}_x = \bar{x} \pm \frac{s_x}{\sqrt{N - 1}} = 175.7 \pm 3.9,$$
$$\hat{\mu}_y = \bar{y} \pm \frac{s_y}{\sqrt{N - 1}} = 66.8 \pm 3.5.$$

We now turn to estimating $\sigma_x$ and $\sigma_y$. As just mentioned, our estimate of $\sigma_x$ (say)
is $\hat{\sigma}_x = \sqrt{N/(N - 1)}\,s_x$. Its variance (see the final line of subsection 27.4.3) is given
approximately by

$$V[\hat{\sigma}] \approx \frac{1}{4N\nu_2}\left(\nu_4 - \frac{N - 3}{N - 1}\,\nu_2^2\right).$$

Since we do not know the true values of the population central moments $\nu_2$ and $\nu_4$, we
must use their estimated values in this expression. We may take $\hat{\nu}_2 = \hat{\sigma}_x^2 = (\hat{\sigma}_x)^2$, which we
have already calculated. It still remains, however, to estimate $\nu_4$. As implied near the end
of subsection 27.4.5, it is acceptable to take $\hat{\nu}_4 = n_4$. Thus for the $x_i$ and $y_i$ values, we have

$$(n_4)_x = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^4 = 53\,411.6,$$
$$(n_4)_y = \frac{1}{N}\sum_{i=1}^{N} (y_i - \bar{y})^4 = 27\,732.5.$$

Substituting these values into (27.50), we obtain

$$\hat{\sigma}_x = \left(\frac{N}{N - 1}\right)^{1/2} s_x \pm \left(V[\hat{\sigma}_x]\right)^{1/2} = 12.2 \pm 2.5, \tag{27.65}$$
$$\hat{\sigma}_y = \left(\frac{N}{N - 1}\right)^{1/2} s_y \pm \left(V[\hat{\sigma}_y]\right)^{1/2} = 11.2 \pm 1.8. \tag{27.66}$$

Finally, we estimate the population correlation $\mathrm{Corr}[x, y]$, which we shall denote by $\rho$.
From (27.62), we have

$$\hat{\rho} = \frac{N}{N - 1}\,r_{xy} = 0.60.$$

Under the assumption that the sample was drawn from a two-dimensional Gaussian
population $P(x, y)$, the variance of our estimator is given by (27.64). Since we do not know
the true value of $\rho$, we must use our estimate $\hat{\rho}$. Thus, we find that the standard error $\Delta\rho$
in our estimate is given approximately by

$$\Delta\rho \approx \frac{10}{9}\left\{\frac{1}{10}\left[1 - (0.60)^2\right]^2\right\}^{1/2} = 0.22.$$ ◄
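The arithmetic of this worked example is easily reproduced by evaluating the estimators of this section directly from the tabulated data. The sketch below is an illustration under stated assumptions: the function names are ours, the estimated moments $\hat{\nu}_2 = \hat{\sigma}^2$ and $\hat{\nu}_4 = n_4$ are those adopted in the example, and the error on $\hat{\rho}$ is obtained by scaling the square root of (27.64) by $N/(N - 1)$.

```python
from math import sqrt

heights = [194, 168, 177, 180, 171, 190, 151, 169, 175, 182]
weights = [75, 53, 72, 80, 75, 75, 57, 67, 46, 68]


def central_moment(data, r):
    """r-th central moment n_r of the sample (divides by N)."""
    n = len(data)
    mean = sum(data) / n
    return sum((v - mean) ** r for v in data) / n


def summarise(data):
    """Mean, its standard error, sigma-hat and its approximate standard error."""
    n = len(data)
    mean = sum(data) / n
    s = sqrt(central_moment(data, 2))
    sigma_hat = sqrt(n / (n - 1)) * s
    mean_err = s / sqrt(n - 1)
    nu2, nu4 = sigma_hat**2, central_moment(data, 4)   # estimated moments
    var_sigma = (nu4 - (n - 3) / (n - 1) * nu2**2) / (4 * n * nu2)
    return mean, mean_err, sigma_hat, sqrt(var_sigma)


def correlation(xs, ys):
    """Sample correlation r_xy, estimator (27.62) and its approximate error."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vxy = sum(x * y for x, y in zip(xs, ys)) / n - mx * my
    r = vxy / sqrt(central_moment(xs, 2) * central_moment(ys, 2))
    rho_hat = n / (n - 1) * r
    rho_err = n / (n - 1) * (1 - rho_hat**2) / sqrt(n)
    return r, rho_hat, rho_err


print(summarise(heights))
print(summarise(weights))
print(correlation(heights, weights))
```

The printed values can be compared, after rounding, with the estimates and standard errors quoted in the example.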


27.5 Maximum-likelihood method

The population from which the sample $x_1, x_2, \ldots, x_N$ is drawn is, in general,
unknown. In the previous section, we assumed that the sample values were
independent and drawn from a one-dimensional population $P(x)$, and we considered
basic estimators of the moments and central moments of $P(x)$. We did not,
however, assume a particular functional form for $P(x)$. We now discuss the process
of data modelling, in which a specific form is assumed for the population.

In the most general case, it will not be known whether the sample values are
independent, and so let us consider the full joint population $P(\mathbf{x})$, where $\mathbf{x}$ is the
point in the $N$-dimensional data space with coordinates $x_1, x_2, \ldots, x_N$. We then
adopt the hypothesis $H$ that the probability distribution of the sample values has
some particular functional form $L(\mathbf{x}; \mathbf{a})$, dependent on the values of some set of
parameters $a_i$, $i = 1, 2, \ldots, M$. Thus, we have

$$P(\mathbf{x}|\mathbf{a}, H) = L(\mathbf{x}; \mathbf{a}),$$

where we make explicit the conditioning on both the assumed functional form and
on the parameter values; $L(\mathbf{x}; \mathbf{a})$ is called the likelihood function. Hypotheses of
this type form the basis of data modelling and parameter estimation. One proposes a
particular model for the underlying population and then attempts to estimate from
the sample values $x_1, x_2, \ldots, x_N$ the values of the parameters $\mathbf{a}$ defining this model.


► A company measures the duration (in minutes) of the $N$ intervals $x_i$, $i = 1, 2, \ldots, N$,
between successive telephone calls received by its switchboard. Suppose that the sample
values $x_i$ are drawn independently from the distribution $P(x|\tau) = (1/\tau)\exp(-x/\tau)$, where $\tau$
is the mean interval between calls. Calculate the likelihood function $L(\mathbf{x}; \tau)$.

Since the sample values are independent and drawn from the stated distribution, the
likelihood is given by

$$L(\mathbf{x}; \tau) = P(x_1|\tau)\,P(x_2|\tau) \cdots P(x_N|\tau)$$
$$= \frac{1}{\tau}\exp\left(-\frac{x_1}{\tau}\right)\frac{1}{\tau}\exp\left(-\frac{x_2}{\tau}\right) \cdots \frac{1}{\tau}\exp\left(-\frac{x_N}{\tau}\right)$$
$$= \frac{1}{\tau^N}\exp\left[-\frac{1}{\tau}(x_1 + x_2 + \cdots + x_N)\right], \tag{27.67}$$

which is to be considered as a function of $\tau$, given that the sample values $x_i$ are fixed. ◄

Figure 27.5 Examples of the likelihood function (27.67) for samples of
different size $N$. In each case, the true value of the parameter is $\tau = 4$ and the
sample values $x_i$ are indicated by the short vertical lines. For the purposes
of illustration, in each case the likelihood function is normalised so that its
maximum value is unity.

The likelihood function (27.67) depends on just a single parameter $\tau$. Plots of 
the likelihood function, considered as a function of $\tau$, are shown in figure 27.5 for 
samples of different size N. The true value of the parameter $\tau$ used to generate the 
sample values was 4. In each case, the sample values $x_i$ are indicated by the short 
vertical lines. For the purposes of illustration, the likelihood function in each 
case has been scaled so that its maximum value is unity (this is, in fact, common 
practice). We see that when the sample size is small, the likelihood function is very 
broad. As N increases, however, the likelihood becomes narrower (its width is inversely 
proportional to $\sqrt{N}$) and tends to a Gaussian-like shape, with its peak centred 
on 4, the true value of $\tau$. We discuss these properties of the likelihood function 
in more detail in subsection 27.5.6. 
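
The narrowing of the likelihood with increasing N can be checked numerically. The following Python sketch is an illustration only: it assumes the exponential model of (27.67), and the function names and the choice of scan range are ours rather than part of the text. Since (27.67) depends on the data only through N and the sum of the sample values, the width of the scaled likelihood, measured in units of the position of its peak, depends on N alone.

```python
import math

def scaled_likelihood(tau, n, total):
    """Likelihood (27.67) divided by its peak value, so that the maximum is 1.

    n is the sample size and total the sum of the sample values; for the
    exponential model these two numbers fix the likelihood completely.
    """
    tau_hat = total / n                          # position of the maximum
    log_l = -n * math.log(tau) - total / tau     # ln L at tau
    log_l_max = -n * math.log(tau_hat) - n       # ln L at tau_hat
    return math.exp(log_l - log_l_max)

def relative_width(n, total, steps=20000):
    """Full width at half maximum of the scaled likelihood, divided by tau_hat."""
    tau_hat = total / n
    # Scan tau from 0.05 to 10.05 times tau_hat (an arbitrary illustrative range).
    grid = [tau_hat * (0.05 + 10.0 * k / steps) for k in range(steps + 1)]
    above = [t for t in grid if scaled_likelihood(t, n, total) >= 0.5]
    return (max(above) - min(above)) / tau_hat

for n in (5, 50, 500):
    # Hypothetical samples whose mean is exactly 4, so that total = 4 n.
    print(n, round(relative_width(n, 4.0 * n), 3))
```

For large N the printed width multiplied by $\sqrt{N}$ should settle near 2.35, the full width at half maximum of a Gaussian of unit standard deviation.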

27.5.1 The maximum-likelihood estimator 

Since the likelihood function L(x; a) gives the probability density associated with 
any particular set of values of the parameters a, our best estimate $\hat{\mathbf{a}}$ of these 
parameters is given by the values of a for which L(x; a) is a maximum. This is 
called the maximum-likelihood estimator (or ML estimator). 

In general, the likelihood function can have a complicated shape when considered 
as a function of a, particularly when the dimensionality of the space of 
parameters $a_1, a_2, \ldots, a_M$ is large. It may be that the values of some parameters 
are either known or assumed in advance, in which case the effective dimensionality 
of the likelihood function is reduced accordingly. However, even when the 
likelihood depends on just a single parameter a (either intrinsically or as the 
result of assuming particular values for the remaining parameters), its form may 
be complicated when the sample size N is small. Frequently occurring shapes of 
one-dimensional likelihood functions are illustrated in figure 27.6, where we have 
assumed, for definiteness, that the allowed range of the parameter a is zero to 
infinity. In each case, the ML estimate $\hat{a}$ is also indicated. Of course, the ‘shape’ of 
higher-dimensional likelihood functions may be considerably more complicated. 

Figure 27.6 Typical shapes of one-dimensional likelihood functions L(x; a) 
encountered in practice, where, for illustration purposes, it is assumed that 
the parameter a is restricted to the range zero to infinity. The ML estimator 
in the various cases occurs at: (a) the only stationary point; (b) one of several 
stationary points; (c) an end-point of the allowed parameter range that is not 
a stationary point (although stationary points do exist); (d) an end-point of 
the allowed parameter range in which no stationary point exists. 

In many simple cases, however, the likelihood function L(x;a) has a single 
maximum that occurs at a stationary point (the likelihood function is then termed 
unimodal). In this case, the ML estimators of the parameters $a_i$, $i = 1, 2, \ldots, M$, 
may be found without evaluating the full likelihood function L(x; a). Instead, one 
simply solves the M simultaneous equations 

\[ \left.\frac{\partial L}{\partial a_i}\right|_{\mathbf{a}=\hat{\mathbf{a}}} = 0 \quad \text{for } i = 1, 2, \ldots, M. \qquad (27.68) \]
Since $\ln z$ is a monotonically increasing function of z (and therefore has the 
same stationary points), it is often more convenient, in fact, to maximise the 
log-likelihood function, $\ln L(\mathbf{x};\mathbf{a})$, with respect to the $a_i$. Thus, one may, as an 
alternative, solve the equations 

\[ \left.\frac{\partial \ln L}{\partial a_i}\right|_{\mathbf{a}=\hat{\mathbf{a}}} = 0 \quad \text{for } i = 1, 2, \ldots, M. \qquad (27.69) \]

Clearly, (27.68) and (27.69) will lead to the same ML estimates $\hat{\mathbf{a}}$ of the parameters. 
In either case, it is, of course, prudent to check that the point $\mathbf{a} = \hat{\mathbf{a}}$ is a local 
maximum. 


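► Find the ML estimate of the parameter $\tau$ in the previous example, in terms of the measured 
values $x_i$, $i = 1, 2, \ldots, N$. 


From (27.67), the log-likelihood function in this case is given by 

\[ \ln L(\mathbf{x};\tau) = \sum_{i=1}^{N} \ln\left(\frac{1}{\tau}e^{-x_i/\tau}\right) = -\sum_{i=1}^{N}\left(\ln\tau + \frac{x_i}{\tau}\right). \qquad (27.70) \]

Differentiating with respect to the parameter $\tau$ and setting the result equal to zero, we find 

\[ \frac{\partial \ln L}{\partial \tau} = -\sum_{i=1}^{N}\left(\frac{1}{\tau} - \frac{x_i}{\tau^2}\right) = -\frac{N}{\tau}\left(1 - \frac{1}{N\tau}\sum_{i=1}^{N} x_i\right) = 0. \]

Thus the ML estimate of the parameter $\tau$ is given by 

\[ \hat{\tau} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad (27.71) \]

which is simply the sample mean of the N measured intervals. ◄ 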
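
As a hedged numerical check, one may solve (27.69) directly rather than analytically. The sketch below assumes the exponential model of the example; the data values are made up for illustration, and the bisection routine and its bracket are our own choices, not part of the text.

```python
def score(tau, samples):
    """d(ln L)/d(tau) for the exponential model, from differentiating (27.70)."""
    n = len(samples)
    return -n / tau + sum(samples) / tau ** 2

def solve_ml_tau(samples, lo=1e-6, hi=1e6, iterations=200):
    """Find the root of the score by bisection on [lo, hi].

    The score is positive below the ML estimate and negative above it,
    so the bracket is repeatedly halved towards the sign change.
    """
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if score(mid, samples) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Made-up intervals (in minutes), used purely for illustration.
intervals = [1.2, 0.4, 7.9, 3.1, 2.6]
tau_numeric = solve_ml_tau(intervals)
tau_closed_form = sum(intervals) / len(intervals)   # equation (27.71)
print(tau_numeric, tau_closed_form)
```

The two printed values should agree to within rounding error, and the change of sign of the score from positive to negative confirms that the stationary point is a maximum.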


In the previous example we assumed that the sample values $x_i$ were drawn 
independently from the same parent distribution. The ML method is more flexible 
than this restriction might seem to imply and it can equally well be applied to the 
common case in which the samples $x_i$ are independent but each is drawn from a 
different distribution. 


► In an experiment, N independent measurements $x_i$ of some quantity are made. Suppose 
that the random measurement error on the i th sample value is Gaussian distributed with 
mean zero and known standard deviation $\sigma_i$. Calculate the ML estimate of the true value 
$\mu$ of the quantity being measured. 


As the measurements are independent, the likelihood factorises: 

\[ L(\mathbf{x};\mu,\{\sigma_k\}) = \prod_{i=1}^{N} P(x_i|\mu,\sigma_i), \]

where $\{\sigma_k\}$ denotes collectively the set of known standard deviations $\sigma_1, \sigma_2, \ldots, \sigma_N$. The 
individual distributions are given by 

\[ P(x_i|\mu,\sigma_i) = \frac{1}{\sqrt{2\pi\sigma_i^2}}\exp\left[-\frac{(x_i-\mu)^2}{2\sigma_i^2}\right], \]

and so the full log-likelihood function is given by 

\[ \ln L(\mathbf{x};\mu,\{\sigma_k\}) = -\frac{1}{2}\sum_{i=1}^{N}\left[\ln(2\pi\sigma_i^2) + \frac{(x_i-\mu)^2}{\sigma_i^2}\right]. \]

Differentiating this expression with respect to $\mu$ and setting the result equal to zero, we 
find 

\[ \frac{\partial \ln L}{\partial \mu} = \sum_{i=1}^{N}\frac{x_i-\mu}{\sigma_i^2} = 0, \]

from which we obtain the ML estimator 

\[ \hat{\mu} = \frac{\sum_{i=1}^{N}(x_i/\sigma_i^2)}{\sum_{i=1}^{N}(1/\sigma_i^2)}. \qquad (27.72) \]

This estimator is commonly used when averaging data with different statistical weights 
$w_i = 1/\sigma_i^2$. We note that when all the variances $\sigma_i^2$ have the same value the estimator 
reduces to the sample mean of the data $x_i$. ◄ 
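
A minimal sketch of the inverse-variance weighted average (27.72) is given below. The function name and the measurement values are hypothetical and serve only to illustrate the formula.

```python
def weighted_mean(values, sigmas):
    """ML estimate (27.72): average of the values with weights 1/sigma_i^2."""
    weights = [1.0 / s ** 2 for s in sigmas]
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Hypothetical measurements of one quantity with different known errors.
values = [10.2, 9.8, 10.5]
sigmas = [0.1, 0.2, 0.4]
print(weighted_mean(values, sigmas))

# With equal standard deviations the estimate is the ordinary sample mean.
print(weighted_mean(values, [0.3, 0.3, 0.3]), sum(values) / len(values))
```

The most precise measurement dominates the first result, whereas the second line reproduces the unweighted mean, as noted at the end of the example.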


There is, in fact, no requirement in the ML method that the sample values 
be independent. As an illustration, we shall generalise the above example to a 
case in which the measurements $x_i$ are not all independent. This would occur, for 
example, if these measurements were based at least in part on the same data. 


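► In an experiment N measurements $x_i$ of some quantity are made. Suppose that the random 
measurement errors on the samples are drawn from a joint Gaussian distribution with mean 
zero and known covariance matrix V. Calculate the ML estimate of the true value $\mu$ of the 
quantity being measured. 


From (26.148), the likelihood in this case is given by 

\[ L(\mathbf{x};\mu,\mathsf{V}) = \frac{1}{(2\pi)^{N/2}|\mathsf{V}|^{1/2}}\exp\left[-\tfrac{1}{2}(\mathbf{x}-\mu\mathbf{1})^{\rm T}\mathsf{V}^{-1}(\mathbf{x}-\mu\mathbf{1})\right], \]

where x is the column vector with components $x_1, x_2, \ldots, x_N$ and 1 is the column vector 
with all components equal to unity. Thus, the log-likelihood function is given by 

\[ \ln L(\mathbf{x};\mu,\mathsf{V}) = -\tfrac{1}{2}\left[N\ln(2\pi) + \ln|\mathsf{V}| + (\mathbf{x}-\mu\mathbf{1})^{\rm T}\mathsf{V}^{-1}(\mathbf{x}-\mu\mathbf{1})\right]. \]

Differentiating with respect to $\mu$ and setting the result equal to zero gives 

\[ \frac{\partial \ln L}{\partial \mu} = \mathbf{1}^{\rm T}\mathsf{V}^{-1}(\mathbf{x}-\mu\mathbf{1}) = 0. \]

Thus, the ML estimator is given by 

\[ \hat{\mu} = \frac{\mathbf{1}^{\rm T}\mathsf{V}^{-1}\mathbf{x}}{\mathbf{1}^{\rm T}\mathsf{V}^{-1}\mathbf{1}} = \frac{\sum_{i,j}(\mathsf{V}^{-1})_{ij}\,x_j}{\sum_{i,j}(\mathsf{V}^{-1})_{ij}}. \]

In the case of uncorrelated errors in measurement, $(\mathsf{V}^{-1})_{ij} = \delta_{ij}/\sigma_i^2$ and our estimator 
reduces to that given in (27.72). ◄ 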
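
The estimator for correlated errors can be evaluated without forming the inverse matrix explicitly. In the sketch below (our own illustration, with hypothetical function names and a small hand-written elimination routine standing in for a linear-algebra library) we use the symmetry of V to write $\mathbf{1}^{\rm T}\mathsf{V}^{-1}\mathbf{x} = (\mathsf{V}^{-1}\mathbf{1})^{\rm T}\mathbf{x}$, so that a single linear solve suffices.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):           # back-substitution
        tail = sum(aug[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (aug[r][n] - tail) / aug[r][r]
    return y

def correlated_mean(values, cov):
    """ML estimate 1^T V^-1 x / (1^T V^-1 1) for a known covariance matrix V."""
    w = solve_linear(cov, [1.0] * len(values))   # w = V^-1 1
    return sum(wi * xi for wi, xi in zip(w, values)) / sum(w)

# Hypothetical example: two equally precise but correlated measurements.
print(correlated_mean([1.0, 3.0], [[1.0, 0.5], [0.5, 1.0]]))
# A diagonal covariance matrix reproduces the weighted mean (27.72).
print(correlated_mean([1.0, 3.0], [[1.0, 0.0], [0.0, 0.25]]))
```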


In all the examples considered so far, the likelihood function has been effectively 
one-dimensional, either intrinsically or under the assumption that the values of 
all but one of the parameters are known in advance. As the following example 
involving two parameters shows, the application of the ML method to the 
estimation of several parameters simultaneously is straightforward. 


► In an experiment N measurements $x_i$ of some quantity are made. Suppose the random error 
on each sample value is drawn independently from a Gaussian distribution of mean zero but 
unknown standard deviation $\sigma$ (which is the same for each measurement). Calculate the ML 
estimates of the true value $\mu$ of the quantity being measured and the standard deviation $\sigma$ 
of the random errors. 


In this case the log-likelihood function is given by 

\[ \ln L(\mathbf{x};\mu,\sigma) = -\frac{1}{2}\sum_{i=1}^{N}\left[\ln(2\pi\sigma^2) + \frac{(x_i-\mu)^2}{\sigma^2}\right]. \]

Taking partial derivatives of $\ln L$ with respect to $\mu$ and $\sigma$ and setting the results equal to 
zero at the joint estimate $\hat{\mu}$, $\hat{\sigma}$, we obtain 

\[ \sum_{i=1}^{N}\frac{x_i-\hat{\mu}}{\hat{\sigma}^2} = 0, \qquad (27.73) \]

\[ \sum_{i=1}^{N}\frac{(x_i-\hat{\mu})^2}{\hat{\sigma}^3} - \sum_{i=1}^{N}\frac{1}{\hat{\sigma}} = 0. \qquad (27.74) \]

In principle, one should solve these two equations simultaneously for $\hat{\mu}$ and $\hat{\sigma}$, but in this 
case we notice that the first is solved immediately by 

\[ \hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i = \bar{x}, \]

where $\bar{x}$ is the sample mean. Substituting this result into the second equation, we find 

\[ \hat{\sigma} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2} = s, \]

where s is the sample standard deviation. As shown in subsection 27.4.3, s is a biased 
estimator of $\sigma$. The reason why the ML method may produce a biased estimator is 
discussed in the next subsection. ◄ 
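
The joint solution can be verified numerically by evaluating the two partial derivatives at the estimates. The following sketch is illustrative only; the data are hypothetical and the function names are ours.

```python
import math

def gaussian_ml(samples):
    """ML estimates of the mean and standard deviation for Gaussian errors."""
    n = len(samples)
    mu_hat = sum(samples) / n
    # The ML estimate divides by N, not N - 1, and equals the sample value s.
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in samples) / n)
    return mu_hat, sigma_hat

def gradient(samples, mu, sigma):
    """Left-hand sides of (27.73) and (27.74) evaluated at (mu, sigma)."""
    d_mu = sum((x - mu) / sigma ** 2 for x in samples)
    d_sigma = sum((x - mu) ** 2 / sigma ** 3 - 1.0 / sigma for x in samples)
    return d_mu, d_sigma

data = [4.9, 5.3, 5.1, 4.6, 5.6]          # hypothetical measurements
mu_hat, sigma_hat = gaussian_ml(data)
print(mu_hat, sigma_hat)
print(gradient(data, mu_hat, sigma_hat))  # both components vanish to rounding error
```

Both derivatives vanish at the estimates, but not at neighbouring parameter values, which is the content of (27.73) and (27.74).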


27.5.2 Transformation invariance and bias of ML estimators 

An extremely useful property of ML estimators is that they are invariant to 
parameter transformations. Suppose that, instead of estimating some parameter 
a of the assumed population, we wish to estimate some function $\alpha(a)$ of the 
parameter. The ML estimator $\hat{\alpha}(a)$ is given by the value assumed by the function 
$\alpha(a)$ at the maximum point of the likelihood, which is simply equal to $\alpha(\hat{a})$. Thus, 
we have the very convenient property 

\[ \hat{\alpha}(a) = \alpha(\hat{a}). \]

We do not have to worry about the distinction between the two cases, estimating 
a and estimating a function of a. This is not true, in general, for other estimation 
procedures. 
► A company measures the duration (in minutes) of the N intervals $x_i$, $i = 1, 2, \ldots, N$, 
between successive telephone calls received by its switchboard. Suppose that the sample 
values $x_i$ are drawn independently from the distribution $P(x|\tau) = (1/\tau)\exp(-x/\tau)$. Find the 
ML estimate of the parameter $\lambda = 1/\tau$. 


This is the same problem as that considered at the start of section 27.5.1. In terms of the 
new parameter $\lambda$, the log-likelihood function is given by 

\[ \ln L(\mathbf{x};\lambda) = \sum_{i=1}^{N}\ln(\lambda e^{-\lambda x_i}) = \sum_{i=1}^{N}(\ln\lambda - \lambda x_i). \]

Differentiating with respect to $\lambda$ and setting the result equal to zero, we have 

\[ \frac{\partial \ln L}{\partial \lambda} = \sum_{i=1}^{N}\left(\frac{1}{\lambda} - x_i\right) = 0. \]

Thus, the ML estimator of the parameter $\lambda$ is given by 

\[ \hat{\lambda} = \left(\frac{1}{N}\sum_{i=1}^{N} x_i\right)^{-1}. \qquad (27.75) \]

Referring back to (27.71), we see that, as expected, the ML estimators of $\lambda$ and $\tau$ are 
related by $\hat{\lambda} = 1/\hat{\tau}$. ◄ 


Although this invariance property is useful it also means that, in general, ML 
estimators may be biased. In particular, one must be aware of the fact that even 
if $\hat{a}$ is an unbiased ML estimator of a it does not follow that the estimator $\alpha(\hat{a})$ is 
also unbiased. In the limit of large N, however, the bias of ML estimators always 
tends to zero. As an illustration, it is straightforward to show (see exercise 27.8) 
that the ML estimators $\hat{\tau}$ and $\hat{\lambda}$ in the above example have expectation values 

\[ E[\hat{\tau}] = \tau \quad \text{and} \quad E[\hat{\lambda}] = \frac{N}{N-1}\,\lambda. \qquad (27.76) \]

In fact, since $\hat{\tau} = \bar{x}$ and the sample values are independent, the first result follows 
immediately from (27.40). Thus, $\hat{\tau}$ is unbiased, but $\hat{\lambda} = 1/\hat{\tau}$ is biased, albeit that 
the bias tends to zero for large N. 
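
The expectation values (27.76) may be checked by simulation. The sketch below is a rough Monte Carlo illustration under assumed values $\tau = 4$ and N = 5; the function name, the number of trials and the random seed are arbitrary choices of ours.

```python
import random

def simulate_estimators(tau, n, trials, seed=0):
    """Average the ML estimates tau_hat and lambda_hat over many simulated samples."""
    rng = random.Random(seed)
    sum_tau_hat = 0.0
    sum_lambda_hat = 0.0
    for _ in range(trials):
        # expovariate takes the rate 1/tau, so each draw has mean tau.
        total = sum(rng.expovariate(1.0 / tau) for _ in range(n))
        sum_tau_hat += total / n        # tau_hat is the sample mean
        sum_lambda_hat += n / total     # lambda_hat = 1 / tau_hat
    return sum_tau_hat / trials, sum_lambda_hat / trials

mean_tau_hat, mean_lambda_hat = simulate_estimators(tau=4.0, n=5, trials=100000)
print("simulated:", mean_tau_hat, mean_lambda_hat)
print("predicted:", 4.0, (5.0 / 4.0) * 0.25)   # tau and N lambda / (N - 1)
```

With N as small as 5 the estimator $\hat{\lambda}$ overestimates $\lambda$ by 25% on average, while $\hat{\tau}$ shows no systematic offset.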


27.5.3 Efficiency of ML estimators 

We showed in subsection 27.3.2 that Fisher’s inequality puts a lower limit on the 
variance $V[\hat{a}]$ of any estimator of the parameter a. Under our hypothesis H on 
p. 1097, the functional form of the population is given by the likelihood function, 
i.e. P(x|a, H) = L(x; a). Thus, if this hypothesis is correct, we may replace P by 
L in Fisher’s inequality (27.18), which then reads 

\[ V[\hat{a}] \geq \left(1 + \frac{\partial b}{\partial a}\right)^2 \bigg/ E\left[-\frac{\partial^2 \ln L}{\partial a^2}\right], \]

where b is the bias in the estimator $\hat{a}$. We usually denote the RHS by $V_{\min}$. 
An important property of ML estimators is that if there exists an efficient 
estimator $\hat{a}_{\rm eff}$, i.e. one for which $V[\hat{a}_{\rm eff}] = V_{\min}$, then it must be the ML estimator 
or some function thereof. This is easily shown by replacing P by L in the proof 
of Fisher’s inequality given in subsection 27.3.2. In particular, we note that the 
equality in (27.22) holds only if $h(x) = cg(x)$, where c is a constant. Thus, if an 
efficient estimator $\hat{a}_{\rm eff}$ exists, this is equivalent to demanding that 

\[ \frac{\partial \ln L}{\partial a} = c\,[\hat{a}_{\rm eff} - \alpha(a)]. \]

Now, the ML estimator $\hat{a}_{\rm ML}$ is given by 

\[ \left.\frac{\partial \ln L}{\partial a}\right|_{a=\hat{a}_{\rm ML}} = 0 \quad \Rightarrow \quad c\,[\hat{a}_{\rm eff} - \alpha(\hat{a}_{\rm ML})] = 0, \]

which, in turn, implies that $\hat{a}_{\rm eff}$ must be some function of $\hat{a}_{\rm ML}$. 


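► Show that the ML estimator $\hat{\tau}$ given in (27.71) is an efficient estimator of the parameter $\tau$. 


As shown in (27.70), the log-likelihood function in this case is 

\[ \ln L(\mathbf{x};\tau) = -\sum_{i=1}^{N}\left(\ln\tau + \frac{x_i}{\tau}\right). \]

Differentiating twice with respect to $\tau$, we find 

\[ \frac{\partial^2 \ln L}{\partial \tau^2} = \sum_{i=1}^{N}\left(\frac{1}{\tau^2} - \frac{2x_i}{\tau^3}\right) = \frac{N}{\tau^2}\left(1 - \frac{2}{\tau N}\sum_{i=1}^{N} x_i\right), \]

and so the expectation value of this expression is 

\[ E\left[\frac{\partial^2 \ln L}{\partial \tau^2}\right] = \frac{N}{\tau^2}\left(1 - \frac{2}{\tau}E[x]\right) = -\frac{N}{\tau^2}, \qquad (27.77) \]

where we have used the fact that $E[x] = \tau$. Setting b = 0 in (27.18), we thus find that for 
any unbiased estimator of $\tau$, 

\[ V[\hat{\tau}] \geq \frac{\tau^2}{N}. \]

From (27.76), we see that the ML estimator $\hat{\tau} = \sum_i x_i/N$ is unbiased. Moreover, using 
the fact that $V[x] = \tau^2$, it follows immediately from (27.40) that $V[\hat{\tau}] = \tau^2/N$. Thus $\hat{\tau}$ is a 
minimum-variance estimator of $\tau$. ◄ 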


27.5.4 Standard errors and confidence limits on ML estimators 

The ML method provides a procedure for obtaining a particular set of estimators 
$\hat{\mathbf{a}}_{\rm ML}$ for the parameters a of the assumed population P(x|a). As for any other set 
of estimators, the associated standard errors, covariances and confidence intervals 
can be found as described in subsections 27.3.3 and 27.3.4. 
Figure 27.7 The sampling distribution $P(\hat{\tau}|\tau)$ for the estimator $\hat{\tau}$ for the case 
$\tau = 4$ and N = 10. 


► A company measures the duration (in minutes) of the 10 intervals $x_i$, $i = 1, 2, \ldots, 10$, 
between successive telephone calls made to its switchboard to be as follows: 

0.43 0.24 3.03 1.93 1.16 8.65 5.33 6.06 5.62 5.22. 

Supposing that the sample values are drawn independently from the probability distribution 
$P(x|\tau) = (1/\tau)\exp(-x/\tau)$, find the ML estimate of the mean $\tau$ and quote an estimate of 
the standard error on your result. 


As shown in (27.71) the (unbiased) ML estimator $\hat{\tau}$ in this case is simply the sample mean 
$\bar{x} = 3.77$. Also, as shown in subsection 27.5.3, $\hat{\tau}$ is a minimum-variance estimator with 
$V[\hat{\tau}] = \tau^2/N$. Thus, the standard error in $\hat{\tau}$ is simply 

\[ \sigma_{\hat{\tau}} = \frac{\tau}{\sqrt{N}}. \qquad (27.78) \]

Since we do not know the true value of $\tau$, however, we must instead quote an estimate $\hat{\sigma}_{\hat{\tau}}$ 
of the standard error, obtained by substituting our estimate $\hat{\tau}$ for $\tau$ in (27.78). Thus, we 
quote our final result as 

\[ \tau = \hat{\tau} \pm \frac{\hat{\tau}}{\sqrt{N}} = 3.77 \pm 1.19. \qquad (27.79) \]

For comparison, the true value used to create the sample was $\tau = 4$. ◄ 

For the particular problem considered in the above example, it is in fact possible 
to derive the full sampling distribution of the ML estimator $\hat{\tau}$ using characteristic 
functions, and it is given by 

\[ P(\hat{\tau}|\tau) = \frac{N^N}{(N-1)!}\,\frac{\hat{\tau}^{N-1}}{\tau^N}\exp\left(-\frac{N\hat{\tau}}{\tau}\right), \qquad (27.80) \]

where N is the size of the sample. This function is plotted in figure 27.7 for the 
case $\tau = 4$ and N = 10, which pertains to the above example. Knowledge of the 
analytic form of the sampling distribution allows one to place confidence limits 
on the estimate $\hat{\tau}$ obtained, as discussed in subsection 27.3.4. 
► Using the sample values in the above example, obtain the 68% central confidence interval 
on the value of $\tau$. 


For the sample values given, our observed value of the ML estimator is $\hat{\tau}_{\rm obs} = 3.77$. Thus, 
from (27.28) and (27.29), the 68% central confidence interval $[\tau_-, \tau_+]$ on the value of $\tau$ is 
found by solving the equations 

\[ \int_{0}^{\hat{\tau}_{\rm obs}} P(\hat{\tau}|\tau_+)\,d\hat{\tau} = 0.16, \qquad \int_{\hat{\tau}_{\rm obs}}^{\infty} P(\hat{\tau}|\tau_-)\,d\hat{\tau} = 0.16, \]

where $P(\hat{\tau}|\tau)$ is given by (27.80) with N = 10. The above integrals can be evaluated 
analytically but the calculations are rather cumbersome. It is much simpler to evaluate 
them by numerical integration, from which we find $[\tau_-, \tau_+] = [2.86, 5.46]$. Alternatively, 
we could quote the estimate and its 68% confidence interval as 

\[ \tau = 3.77^{+1.69}_{-0.91}. \]

Thus we see that the 68% central confidence interval is not symmetric about the estimated 
value, and differs from the standard error calculated above. This is a result of the 
(non-Gaussian) shape of the sampling distribution $P(\hat{\tau}|\tau)$, apparent in figure 27.7. ◄ 
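
The calculation of the estimate, its standard error and the classical confidence limits can be sketched as follows. This is a hedged illustration rather than the method used to produce the figures quoted above: instead of integrating (27.80) numerically, it uses the standard closed form for the cumulative distribution of a sum of N independent exponential variables (the Erlang distribution), and finds each limit by bisection. The function names are ours.

```python
import math

def erlang_cdf(s, n, tau):
    """Pr(x_1 + ... + x_n <= s) for independent exponential x_i with mean tau."""
    m = s / tau
    term, partial = 1.0, 1.0            # k = 0 term of sum_{k<n} m^k / k!
    for k in range(1, n):
        term *= m / k
        partial += term
    return 1.0 - math.exp(-m) * partial

def solve_tau(target, s, n, lo=1e-3, hi=1e3):
    """Bisection for the tau at which Pr(tau_hat <= tau_hat_obs | tau) = target."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # The probability of lying below the observed value falls as tau grows.
        if erlang_cdf(s, n, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

samples = [0.43, 0.24, 3.03, 1.93, 1.16, 8.65, 5.33, 6.06, 5.62, 5.22]
n = len(samples)
total = sum(samples)                     # equals N times the observed tau_hat
tau_hat = total / n
std_error = tau_hat / math.sqrt(n)       # estimate of (27.78)
tau_plus = solve_tau(0.16, total, n)     # Pr(tau_hat <= obs | tau_plus) = 0.16
tau_minus = solve_tau(0.84, total, n)    # Pr(tau_hat >= obs | tau_minus) = 0.16
print(round(tau_hat, 2), round(std_error, 2))
print(round(tau_minus, 2), round(tau_plus, 2))
```

The limits obtained in this way should lie close to the interval quoted in the example; small differences in the second decimal place can arise from the numerical method and rounding.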


In many problems, however, it is not possible to derive the full sampling 
distribution of an ML estimator $\hat{a}$ in order to obtain its confidence intervals. 
Indeed, one may not even be able to obtain an analytic formula for its standard 
error $\sigma_{\hat{a}}$. This is particularly true when one is estimating several parameters a 
simultaneously, since the joint sampling distribution will be, in general, very 
complicated. Nevertheless, as we discuss below, the likelihood function L(x; a) 
itself can be used very simply to obtain standard errors and confidence intervals. 
The justification for this has its roots in the Bayesian approach to statistics, as 
opposed to the more traditional frequentist approach we have adopted here. We 
now give a brief discussion of the Bayesian viewpoint on parameter estimation. 


27.5.5 The Bayesian interpretation of the likelihood function 

As stated at the beginning of section 27.5, the likelihood function L(x; a) is 
defined by 

\[ P(\mathbf{x}|\mathbf{a},H) = L(\mathbf{x};\mathbf{a}), \]

where H denotes our hypothesis of an assumed functional form. Now, using 
Bayes’ theorem (see subsection 26.2.3), we may write 

\[ P(\mathbf{a}|\mathbf{x},H) = \frac{P(\mathbf{x}|\mathbf{a},H)\,P(\mathbf{a}|H)}{P(\mathbf{x}|H)}, \qquad (27.81) \]

which provides us with an expression for the probability distribution P(a|x, H) 
of the parameters a, given the (fixed) data x and our hypothesis H, in terms of 
other quantities that we may assign. The various terms in (27.81) have special 
formal names, as follows. 

• The quantity P(a|H) on the RHS is the prior probability, which represents our 
state of knowledge of the parameter values (given the hypothesis H) before we 
have analysed the data. 

• This probability is modified by the experimental data x through the likelihood 
P(x|a, H). 

• When appropriately normalised by the evidence P(x|H), this yields the posterior 
probability P(a|x, H), which is the quantity of interest. 

• The posterior encodes all our inferences about the values of the parameters a. 
Strictly speaking, from a Bayesian viewpoint, this entire function, P(a|x, H), is 
the ‘answer’ to a parameter estimation problem. 

Given a particular hypothesis, the (normalising) evidence factor P(x|H) is 
unimportant, since it does not depend explicitly upon the parameter values a. 
Thus, it is often omitted and one considers only the proportionality relation 

\[ P(\mathbf{a}|\mathbf{x},H) \propto P(\mathbf{x}|\mathbf{a},H)\,P(\mathbf{a}|H). \qquad (27.82) \]

If necessary, the posterior distribution can be normalised empirically, by requiring 
that it integrates to unity, i.e. $\int P(\mathbf{a}|\mathbf{x},H)\,d^M\mathbf{a} = 1$, where the integral extends over 
all values of the parameters $a_1, a_2, \ldots, a_M$. 

The prior P(a|H) in (27.82) should reflect our entire knowledge concerning the 
values of the parameters a, before the analysis of the current data x. For example, 
there may be some physical reason to require some or all of the parameters to 
lie in a given range. If we are largely ignorant of the values of the parameters, 
we often indicate this by choosing a uniform (or very broad) prior, 

\[ P(\mathbf{a}|H) = \text{constant}, \]

in which case the posterior distribution is simply proportional to the likelihood. 
In this case, we thus have 

\[ P(\mathbf{a}|\mathbf{x},H) \propto L(\mathbf{x};\mathbf{a}). \qquad (27.83) \]

In other words, if we assume a uniform prior then we can identify the posterior 
distribution (up to a normalising factor) with L(x; a), considered as a function of 
the parameters a. 

Thus, a Bayesian statistician considers the ML estimates $\hat{\mathbf{a}}_{\rm ML}$ of the parameters 
to be the values that maximise the posterior P(a|x, H) under the assumption of 
a uniform prior. More importantly, however, a Bayesian would not calculate the 
standard error or confidence interval on this estimate using the (classical) method 
employed in subsection 27.3.4. Instead, a far more straightforward approach is 
adopted. Let us assume, for the moment, that one is estimating just a single 
parameter a. Using (27.83), we may determine the values $a_-$ and $a_+$ such that 

\[ \Pr(a < a_-|\mathbf{x},H) = \int_{-\infty}^{a_-} L(\mathbf{x};a)\,da = \alpha, \]
\[ \Pr(a > a_+|\mathbf{x},H) = \int_{a_+}^{\infty} L(\mathbf{x};a)\,da = \beta, \]

where it is assumed that the likelihood has been normalised in such a way that 
$\int L(\mathbf{x};a)\,da = 1$. Combining these equations gives 

\[ \Pr(a_- \leq a < a_+|\mathbf{x},H) = \int_{a_-}^{a_+} L(\mathbf{x};a)\,da = 1 - \alpha - \beta, \qquad (27.84) \]

and $[a_-, a_+]$ is the Bayesian confidence interval on the value of a at the confidence 
level $1 - \alpha - \beta$. As in the case of classical confidence intervals, one often quotes 
the central confidence interval, for which $\alpha = \beta$. Another common choice (where 
possible) is to use the two values $a_-$ and $a_+$ satisfying (27.84), for which 
$L(\mathbf{x};a_-) = L(\mathbf{x};a_+)$. 

It should be understood that a frequentist would consider the Bayesian confidence 
interval as an approximation to the (classical) confidence interval discussed 
in subsection 27.3.4. Conversely, a Bayesian would consider the confidence interval 
defined in (27.84) to be the more meaningful. In fact, the difference between 
the Bayesian and classical confidence intervals is rather subtle. The classical 
confidence interval is defined in such a way that if one took a large number of 
samples each of size N and constructed the confidence interval in each case then 
the proportion of cases in which the true value of a would be contained within the 
interval is $1 - \alpha - \beta$. For the Bayesian confidence interval, one does not rely on the 
frequentist concept of a large number of repeated samples. Instead, its meaning is 
that, given the single sample x (and our hypothesis H for the functional form of 
the population), the probability that a lies within the interval $[a_-, a_+]$ is $1 - \alpha - \beta$. 

By adopting the Bayesian viewpoint, the likelihood function L(x; a) may also 
be used to obtain an approximation to the standard error in the ML estimator; 
the approximation is given by 

\[ \hat{\sigma}_{\hat{a}} = \left(-\left.\frac{\partial^2 \ln L}{\partial a^2}\right|_{a=\hat{a}}\right)^{-1/2}. \qquad (27.85) \]

Clearly, if L(x; a) were a Gaussian centred on $a = \hat{a}$ then $\hat{\sigma}_{\hat{a}}$ would be its standard 
deviation. Indeed, in this case, the resulting ‘one-sigma’ limits would constitute a 
68.3% Bayesian central confidence interval. Even when L(x; a) is not Gaussian, 
however, (27.85) is often used as a measure of the standard error. 
Figure 27.8 The likelihood function $L(\mathbf{x};\tau)$ (normalised to unit area) for the 
sample values given in the worked example in subsection 27.5.4, and indicated 
here by short vertical lines. 


► For the sample data given in section 27.5.4, use the likelihood function to estimate the 
standard error in the ML estimator $\hat{\tau}$ and obtain the Bayesian 68% central confidence 
interval on $\tau$. 


We showed in (27.67) that the likelihood function in this case is given by 

\[ L(\mathbf{x};\tau) = \frac{1}{\tau^N}\exp\left[-\frac{1}{\tau}(x_1 + x_2 + \cdots + x_N)\right], \]

where $x_i$, $i = 1, 2, \ldots, N$, denotes the sample values and N = 10. This likelihood function is 
plotted in figure 27.8, after normalising (numerically) to unit area. The short vertical lines 
in the figure indicate the sample values. We see that the likelihood function peaks at the 
ML estimate $\hat{\tau} = 3.77$ that we found in subsection 27.5.4. Also, from (27.77), we have 

\[ \frac{\partial^2 \ln L}{\partial \tau^2} = \frac{N}{\tau^2}\left(1 - \frac{2}{\tau N}\sum_{i=1}^{N} x_i\right). \]

Remembering that $\hat{\tau} = \sum_i x_i/N$, our estimate of the standard error in $\hat{\tau}$ is 

\[ \hat{\sigma}_{\hat{\tau}} = \left(-\left.\frac{\partial^2 \ln L}{\partial \tau^2}\right|_{\tau=\hat{\tau}}\right)^{-1/2} = \frac{\hat{\tau}}{\sqrt{N}} = 1.19, \]

which is precisely the estimate of the standard error we obtained in subsection 27.5.4. 
It should be noted, however, that in general we would not expect the two estimates of 
standard error made by the different methods to be identical. 

In order to calculate the Bayesian 68% central confidence interval, we must determine 
the values $a_-$ and $a_+$ that satisfy (27.84) with $\alpha = \beta = 0.16$. In this case, the calculation 
can be performed analytically but is somewhat tedious. It is trivial, however, to determine 
$a_-$ and $a_+$ numerically and we find the confidence interval to be [3.16, 6.20]. Thus we can 
quote our result with 68% central confidence limits as 

\[ \tau = 3.77^{+2.43}_{-0.61}. \]

By comparing this result with that given towards the end of subsection 27.5.4, we see that, 
as we might expect, the Bayesian and classical confidence intervals differ somewhat. ◄ 
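
The numerical steps of this example can be sketched in a few lines. The code below is our own illustration: it assumes a uniform prior on $\tau$ over a finite grid (0.01 to 200, an arbitrary truncation of the range zero to infinity), integrates the likelihood by the trapezium rule, and estimates the curvature in (27.85) by a finite difference. The helper names are hypothetical.

```python
import math

def central_interval(density, lo, hi, tail, steps=100000):
    """Central interval of an unnormalised 1-D density tabulated on [lo, hi].

    The density is integrated by the trapezium rule; the limits returned are
    the grid points at which the normalised cumulative integral first reaches
    tail and 1 - tail.
    """
    h = (hi - lo) / steps
    xs = [lo + k * h for k in range(steps + 1)]
    ys = [density(x) for x in xs]
    cumulative = [0.0]
    for k in range(1, steps + 1):
        cumulative.append(cumulative[-1] + 0.5 * h * (ys[k - 1] + ys[k]))
    norm = cumulative[-1]
    lower = next(x for x, c in zip(xs, cumulative) if c >= tail * norm)
    upper = next(x for x, c in zip(xs, cumulative) if c >= (1.0 - tail) * norm)
    return lower, upper

samples = [0.43, 0.24, 3.03, 1.93, 1.16, 8.65, 5.33, 6.06, 5.62, 5.22]
n = len(samples)
total = sum(samples)
tau_hat = total / n

def log_likelihood(tau):
    """ln L from (27.67)."""
    return -n * math.log(tau) - total / tau

def likelihood(tau):
    """Likelihood rescaled by its peak value, to avoid very small numbers."""
    return math.exp(log_likelihood(tau) - log_likelihood(tau_hat))

# Finite-difference estimate of the second derivative of ln L at tau_hat.
step = 1e-3
curvature = (log_likelihood(tau_hat + step) - 2.0 * log_likelihood(tau_hat)
             + log_likelihood(tau_hat - step)) / step ** 2
sigma_hat = (-curvature) ** -0.5                  # equation (27.85)

lower, upper = central_interval(likelihood, 0.01, 200.0, 0.16)
print(round(sigma_hat, 2), round(lower, 2), round(upper, 2))
```

The overall scale of the likelihood is irrelevant here, since the routine normalises the cumulative integral itself.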
The above discussion is generalised straightforwardly to the estimation of 
several parameters $a_1, a_2, \ldots, a_M$ simultaneously. The elements of the covariance 
matrix of the ML estimators can be approximated by 

\[ \hat{V}_{ij} = \mathrm{Cov}[\hat{a}_i, \hat{a}_j] = \left(-\left.\frac{\partial^2 \ln L}{\partial a_i\,\partial a_j}\right|_{\mathbf{a}=\hat{\mathbf{a}}}\right)^{-1}. \qquad (27.86) \]

From (27.36), we see that (at least for unbiased estimators) the expectation value 
of (27.86) is equal to the element $F_{ij}$ of the Fisher matrix. 

The construction of a multi-dimensional Bayesian confidence region is also 
straightforward. For a given confidence level $1 - \alpha$ (say), it is most common 
to construct the confidence region as the M-dimensional region R in a-space, 
bounded by the ‘surface’ L(x; a) = constant, for which 

\[ \int_R L(\mathbf{x};\mathbf{a})\,d^M\mathbf{a} = 1 - \alpha, \]

where it is assumed that L(x; a) is normalised to unit volume. Moreover, we 
see from (27.83) that (assuming a uniform prior probability) we may obtain the 
marginal posterior distribution for any parameter $a_i$ simply by integrating the 
likelihood function L(x; a) over the other parameters: 

\[ P(a_i|\mathbf{x},H) = \int \cdots \int L(\mathbf{x};\mathbf{a})\,da_1 \cdots da_{i-1}\,da_{i+1} \cdots da_M. \]

Here the integral extends over all possible values of the parameters, and again 
it is assumed that the likelihood function is normalised in such a way that 
$\int L(\mathbf{x};\mathbf{a})\,d^M\mathbf{a} = 1$. This marginal distribution can then be used as above to 
determine Bayesian confidence intervals on each $a_i$ separately. 


► Ten independent sample values $x_i$, $i = 1, 2, \ldots, 10$, are drawn at random from a Gaussian 
distribution with unknown mean $\mu$ and standard deviation $\sigma$. The sample values are as 
follows (to two decimal places): 

2.22 2.56 1.07 0.24 0.18 0.95 0.73 -0.79 2.09 1.81 

Find the Bayesian 95% central confidence intervals on $\mu$ and $\sigma$ separately. 


The likelihood function in this case is 

\[ L(\mathbf{x};\mu,\sigma) = (2\pi\sigma^2)^{-N/2}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{N}(x_i-\mu)^2\right]. \qquad (27.87) \]

Assuming uniform priors on $\mu$ and $\sigma$ (over their natural ranges of $-\infty \to \infty$ and $0 \to \infty$ 
respectively), we may identify this likelihood function with the posterior probability, as in 
(27.83). Thus, the marginal posterior distribution on $\mu$ is given by 

\[ P(\mu|\mathbf{x},H) \propto \int_0^{\infty}\frac{1}{\sigma^N}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{N}(x_i-\mu)^2\right]d\sigma. \]
By substituting $\sigma = 1/u$ (so that $d\sigma = -du/u^2$) and integrating by parts either $(N-2)/2$ 
or $(N-3)/2$ times, we find 

\[ P(\mu|\mathbf{x},H) \propto \left[N(\bar{x}-\mu)^2 + Ns^2\right]^{-(N-1)/2}, \]

where we have used the fact that $\sum_i (x_i-\mu)^2 = N(\bar{x}-\mu)^2 + Ns^2$, $\bar{x}$ being the sample 
mean and $s^2$ the sample variance. We may now find the 95% central confidence interval 
by finding the values $\mu_-$ and $\mu_+$ for which 

\[ \int_{-\infty}^{\mu_-} P(\mu|\mathbf{x},H)\,d\mu = 0.025 \quad \text{and} \quad \int_{\mu_+}^{\infty} P(\mu|\mathbf{x},H)\,d\mu = 0.025. \]

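The normalisation of the posterior distribution and the values $\mu_-$ and $\mu_+$ are easily 
obtained by numerical integration. Substituting in the appropriate values N = 10, $\bar{x} = 1.11$ 
and s = 1.01, we find the required confidence interval to be [0.29, 1.93], which is symmetric 
about $\bar{x}$, as it must be for a posterior that depends on $\mu$ only through $(\bar{x}-\mu)^2$. 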

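To obtain a confidence interval on $\sigma$, we must first obtain the corresponding marginal 
posterior distribution. From (27.87), again using the fact that $\sum_i (x_i-\mu)^2 = N(\bar{x}-\mu)^2 + Ns^2$, 
this is given by 

\[ P(\sigma|\mathbf{x},H) \propto \frac{1}{\sigma^N}\exp\left(-\frac{Ns^2}{2\sigma^2}\right)\int_{-\infty}^{\infty}\exp\left[-\frac{N(\bar{x}-\mu)^2}{2\sigma^2}\right]d\mu. \]

Noting that the integral of a one-dimensional Gaussian is proportional to $\sigma$, we conclude 
that 

\[ P(\sigma|\mathbf{x},H) \propto \frac{1}{\sigma^{N-1}}\exp\left(-\frac{Ns^2}{2\sigma^2}\right). \]

The 95% central confidence interval on $\sigma$ can then be found in an analogous manner to 
that on $\mu$, by solving numerically the equations 

\[ \int_{0}^{\sigma_-} P(\sigma|\mathbf{x},H)\,d\sigma = 0.025 \quad \text{and} \quad \int_{\sigma_+}^{\infty} P(\sigma|\mathbf{x},H)\,d\sigma = 0.025. \]

We find the required interval to be [0.76, 2.16]. ◄ 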
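
The two marginal posteriors of this example can be integrated numerically along the following lines. This sketch is ours and rests on stated assumptions: uniform priors on $\mu$ and $\sigma$, finite grids standing in for the infinite ranges, and simple trapezium-rule integration. Since the marginal posterior on $\mu$ depends on $\mu$ only through $(\bar{x}-\mu)^2$, the interval it yields for $\mu$ is symmetric about the sample mean.

```python
import math

def central_interval(density, lo, hi, tail, steps=100000):
    """Central interval of an unnormalised 1-D density tabulated on [lo, hi]."""
    h = (hi - lo) / steps
    xs = [lo + k * h for k in range(steps + 1)]
    ys = [density(x) for x in xs]
    cumulative = [0.0]
    for k in range(1, steps + 1):           # trapezium-rule running integral
        cumulative.append(cumulative[-1] + 0.5 * h * (ys[k - 1] + ys[k]))
    norm = cumulative[-1]
    lower = next(x for x, c in zip(xs, cumulative) if c >= tail * norm)
    upper = next(x for x, c in zip(xs, cumulative) if c >= (1.0 - tail) * norm)
    return lower, upper

samples = [2.22, 2.56, 1.07, 0.24, 0.18, 0.95, 0.73, -0.79, 2.09, 1.81]
n = len(samples)
mean = sum(samples) / n
s2 = sum((x - mean) ** 2 for x in samples) / n    # sample variance (divides by N)
s = math.sqrt(s2)

def posterior_mu(mu):
    """Marginal posterior on mu, rescaled to equal 1 at mu = mean."""
    return (1.0 + (mu - mean) ** 2 / s2) ** (-(n - 1) / 2.0)

def posterior_sigma(sigma):
    """Marginal posterior on sigma, up to a constant factor."""
    return sigma ** (-(n - 1)) * math.exp(-n * s2 / (2.0 * sigma ** 2))

# Grid limits are arbitrary truncations of the infinite parameter ranges.
mu_interval = central_interval(posterior_mu, mean - 30.0 * s, mean + 30.0 * s, 0.025)
sigma_interval = central_interval(posterior_sigma, 0.01, 60.0, 0.025)
print(round(mean, 2), round(s, 2))
print([round(v, 2) for v in mu_interval], [round(v, 2) for v in sigma_interval])
```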


27.5.6 Behaviour of ML estimators for large N 

As mentioned in subsection 27.3.6, in the large-sample limit $N \to \infty$, the sampling 
distribution of a set of (consistent) estimators $\hat{\mathbf{a}}$, whether ML or not, will tend, 
in general, to a multivariate Gaussian centred on the true values a. This is a 
direct consequence of the central limit theorem. Similarly, in the limit $N \to \infty$ the 
likelihood function L(x; a) also tends towards a multivariate Gaussian but one 
centred on the ML estimate(s) $\hat{\mathbf{a}}$. Thus ML estimators are always asymptotically 
consistent. This limiting process was illustrated for the one-dimensional case by 
figure 27.5. 

Thus, as N becomes large, the likelihood function tends to the form 

\[ L(\mathbf{x};\mathbf{a}) = L_{\max}\exp\left[-\tfrac{1}{2}Q(\mathbf{a},\hat{\mathbf{a}})\right], \]

where Q denotes the quadratic form 

\[ Q(\mathbf{a},\hat{\mathbf{a}}) = (\mathbf{a}-\hat{\mathbf{a}})^{\rm T}\mathsf{V}^{-1}(\mathbf{a}-\hat{\mathbf{a}}) \]

and the matrix $\mathsf{V}^{-1}$ is given by 

\[ (\mathsf{V}^{-1})_{ij} = -\left.\frac{\partial^2 \ln L}{\partial a_i\,\partial a_j}\right|_{\mathbf{a}=\hat{\mathbf{a}}}. \]

Moreover, in the limit of large N, this matrix tends to the Fisher matrix given in 
(27.36), i.e. $\mathsf{V}^{-1} \to \mathsf{F}$. Hence ML estimators are asymptotically minimum-variance. 

Comparison of the above results with those in subsection 27.3.6 shows that 
the large-sample limit of the likelihood function L(x; a) has the same form as the 
large-sample limit of the joint estimator sampling distribution $P(\hat{\mathbf{a}}|\mathbf{a})$. The only 
difference is that $P(\hat{\mathbf{a}}|\mathbf{a})$ is centred in $\hat{\mathbf{a}}$-space on the true values $\hat{\mathbf{a}} = \mathbf{a}$ whereas 
L(x; a) is centred in a-space on the ML estimates $\mathbf{a} = \hat{\mathbf{a}}$. From figure 27.4 and its 
accompanying discussion, we therefore conclude that, in the large-sample limit, 
the Bayesian and classical confidence limits on the parameters coincide. 


27.5.7 Extended maximum-likelihood method 

It is sometimes the case that the number of data items N in our sample is itself a 
random variable. Such experiments are typically those in which data are collected 
for a certain period of time during which events occur at random in some way, 
as opposed to those in which a prearranged number of data items are collected. 
In particular, let us consider the case where the sample values $x_1, x_2, \ldots, x_N$ are 
drawn independently from some distribution P(x|a) and the sample size N is a 
random variable described by a Poisson distribution with mean $\lambda$, i.e. $N \sim {\rm Po}(\lambda)$. 
The likelihood function in this case is given by 

\[ L(\mathbf{x};\lambda,\mathbf{a}) = \frac{\lambda^N}{N!}e^{-\lambda}\prod_{i=1}^{N} P(x_i|\mathbf{a}), \qquad (27.88) \]

and is often called the extended likelihood function. The function $L(\mathbf{x};\lambda,\mathbf{a})$ can 
be used as before to estimate parameter values or obtain confidence intervals. 
Two distinct cases arise in the use of the extended likelihood function, depending 
on whether the Poisson parameter $\lambda$ is a function of the parameters a or is an 
independent parameter. 

Let us first consider the case in which $\lambda$ is a function of the parameters a. From 
(27.88), we can write the extended log-likelihood function as 

\[ \ln L = N\ln\lambda(\mathbf{a}) - \lambda(\mathbf{a}) + \sum_{i=1}^{N}\ln P(x_i|\mathbf{a}) = -\lambda(\mathbf{a}) + \sum_{i=1}^{N}\ln[\lambda(\mathbf{a})P(x_i|\mathbf{a})], \]

where we have ignored terms not depending on a. The ML estimates $\hat{\mathbf{a}}$ of the 
parameters can then be found in the usual way, and the ML estimate of the 
Poisson parameter is simply $\hat{\lambda} = \lambda(\hat{\mathbf{a}})$. The errors on our estimators $\hat{\mathbf{a}}$ will be, in 
general, smaller than those obtained in the usual likelihood approach, since our 
estimate includes information from the value of N as well as the sample values $x_i$. 
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The other possibility is that X is an independent parameter and not a function 
of the parameters a. In this case, the extended log-likelihood function is 

N 

InL = MnA — A + ^lnP(xi|a), (27.89) 

;=i 

where we have omitted terms not depending on X or a. Differentiating with 
respect to X and setting the result equal to zero, we find that the ML estimate of 
X is simply 

X = N. 

By differentiating (27.89) with respect to the parameters a,- and setting the results 
equal to zero, we obtain the usual ML estimates a,- of their values. In this case, 
however, the errors in our estimates will be larger, in general, than those in the 
standard likelihood approach, since they must include the effect of statistical 
uncertainty in the parameter X. 
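
As a minimal sketch of the second case, the following Python fragment evaluates the extended log-likelihood (27.89) and checks numerically that it is largest at $\hat{\lambda} = N$ together with the usual ML estimate of the remaining parameter. The exponential density $P(x|a) = a e^{-ax}$, the sample values and the function name are assumptions made purely for illustration; they do not come from the text.

```python
import math

def extended_log_likelihood(lam, a, xs):
    """ln L = N ln(lam) - lam + sum_i ln P(x_i|a), for P(x|a) = a exp(-a x).

    The constant term -ln(N!) is omitted, as in (27.89).
    """
    n = len(xs)
    return n * math.log(lam) - lam + sum(math.log(a) - a * x for x in xs)

# Hypothetical sample; in a real experiment its size N is itself Poisson distributed.
xs = [0.31, 1.27, 0.08, 0.95, 2.40, 0.53, 0.77]
N = len(xs)

lam_hat = float(N)       # stationary point of N ln(lam) - lam
a_hat = N / sum(xs)      # usual ML estimate for the exponential density
best = extended_log_likelihood(lam_hat, a_hat, xs)

# Moving either parameter away from its estimate lowers ln L.
for delta in (-0.1, 0.1):
    assert extended_log_likelihood(lam_hat + delta, a_hat, xs) < best
    assert extended_log_likelihood(lam_hat, a_hat + delta, xs) < best
```

Because $\lambda$ and $a$ appear in separate terms of (27.89), the two maximisations decouple, which is why the estimate of $a$ is the same as in the standard likelihood approach.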


27.6 The method of least squares 

The method of least squares is, in fact, just a special case of the method of 
maximum-likelihood. Nevertheless, it is so widely used as a method of parameter 
estimation that it has acquired a special name of its own. At the outset, let us 
suppose that a data sample consists of a set of pairs $(x_i, y_i)$, $i = 1, 2, \ldots, N$. For
example, these data might correspond to the temperature $y_i$ measured at various
points $x_i$ along some metal rod.

For the moment, we will suppose that the $x_i$ are known exactly, whereas there
exists a measurement error (or noise) $n_i$ on each of the values $y_i$. Moreover, let
us assume that the true value of $y$ at any position $x$ is given by some function
$y = f(x; \mathbf{a})$ that depends on the $M$ unknown parameters $\mathbf{a}$. Then

$$ y_i = f(x_i; \mathbf{a}) + n_i. $$


Our aim is to estimate the values of the parameters a from the data sample. 

Bearing in mind the central limit theorem, let us suppose that the $n_i$ are
drawn from a Gaussian distribution with zero mean and no systematic bias. In
the most general case the measurement errors $n_i$ might not be independent but
described by an $N$-dimensional multivariate Gaussian with non-trivial covariance
matrix $\mathsf{N}$, whose elements $N_{ij} = \mathrm{Cov}[n_i, n_j]$ we assume to be known. Under these
assumptions it follows from (26.148) that the likelihood function is

$$ L(\mathbf{x}, \mathbf{y}; \mathbf{a}) = \frac{1}{(2\pi)^{N/2} |\mathsf{N}|^{1/2}} \exp\left[ -\tfrac{1}{2} \chi^2(\mathbf{a}) \right], $$

where the quantity denoted by $\chi^2$ is given by the quadratic form

$$ \chi^2(\mathbf{a}) = \sum_{i,j=1}^{N} [y_i - f(x_i; \mathbf{a})]\, (\mathsf{N}^{-1})_{ij}\, [y_j - f(x_j; \mathbf{a})] = (\mathbf{y} - \mathbf{f})^{\mathrm{T}} \mathsf{N}^{-1} (\mathbf{y} - \mathbf{f}). \tag{27.90} $$

In the last equality, we have rewritten the expression in matrix notation by
defining the column vector $\mathbf{f}$ with elements $f_i = f(x_i; \mathbf{a})$. We note that in the
(common) special case in which the measurement errors $n_i$ are independent, their
covariance matrix takes the diagonal form $\mathsf{N} = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_N^2)$, where $\sigma_i$ is
the standard deviation of the measurement error $n_i$. In this case, the expression
(27.90) for $\chi^2$ reduces to

$$ \chi^2(\mathbf{a}) = \sum_{i=1}^{N} \left[ \frac{y_i - f(x_i; \mathbf{a})}{\sigma_i} \right]^2. $$


The least squares (LS) estimators $\hat{\mathbf{a}}_{\rm LS}$ of the parameter values are defined as
those that minimise the value of $\chi^2(\mathbf{a})$; they are usually determined by solving
the $M$ equations

$$ \left. \frac{\partial \chi^2}{\partial a_i} \right|_{\mathbf{a} = \hat{\mathbf{a}}_{\rm LS}} = 0 \quad \text{for } i = 1, 2, \ldots, M. \tag{27.91} $$

Clearly, if the measurement errors $n_i$ are indeed Gaussian distributed, as assumed
above, then the LS and ML estimators of the parameters $\mathbf{a}$ coincide. Because
of its relative simplicity, the method of least squares is often applied to cases in
which the $n_i$ are not Gaussian distributed. The resulting estimators $\hat{\mathbf{a}}_{\rm LS}$ are not the
ML estimators, and the best that can be said in justification is that the method is
an obviously sensible procedure for parameter estimation that has stood the test
of time.

Finally, we note that the method of least squares is easily extended to the case
in which each measurement $y_i$ depends on several variables, which we denote
by $\mathbf{x}_i$. For example, $y_i$ might represent the temperature measured at the
(three-dimensional) position $\mathbf{x}_i$ in a room. In this case, the data are modelled by a
function $y = f(\mathbf{x}; \mathbf{a})$, and the remainder of the above discussion carries through
unchanged.
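
The quadratic form (27.90) is straightforward to evaluate numerically. The fragment below is a minimal sketch, assuming NumPy is available; the function name `chi_squared` and the three data values are hypothetical and serve only to check that the general expression reduces to the sum of squared normalised residuals when the covariance matrix is diagonal.

```python
import numpy as np

def chi_squared(y, f, cov):
    """Evaluate (y - f)^T N^{-1} (y - f) for the covariance matrix N = cov."""
    r = np.asarray(y, dtype=float) - np.asarray(f, dtype=float)
    # Solve N z = r rather than forming the inverse of N explicitly.
    return float(r @ np.linalg.solve(cov, r))

# Hypothetical measurements, model values and independent errors.
y = np.array([1.2, 1.9, 3.3])
f = np.array([1.0, 2.0, 3.0])
sigma = np.array([0.1, 0.2, 0.3])

chi2_general = chi_squared(y, f, np.diag(sigma**2))     # form (27.90)
chi2_diagonal = float(np.sum(((y - f) / sigma) ** 2))   # independent-error form
```

For correlated errors one would pass the full (non-diagonal) covariance matrix as `cov`; nothing else in the function changes.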




27.6.1 Linear least squares 

We have so far made no restriction on the form of the function $f(x; \mathbf{a})$. It so
happens, however, that, for a model in which $f(x; \mathbf{a})$ is a linear function of the
parameters $a_1, a_2, \ldots, a_M$, one can always obtain analytic expressions for the LS
estimators $\hat{\mathbf{a}}_{\rm LS}$ and their variances. The general form of this kind of model is

$$ f(x; \mathbf{a}) = \sum_{i=1}^{M} a_i h_i(x), \tag{27.92} $$

where $h_1(x), h_2(x), \ldots, h_M(x)$ are some set of linearly independent fixed functions
of $x$, often called the basis functions. Note that the functions $h_i(x)$ themselves may
be highly non-linear functions of $x$. The 'linear' nature of the model (27.92) refers
only to its dependence on the parameters $a_i$. Furthermore, in this case, it may
be shown that the LS estimators $\hat{a}_i$ have zero bias and are minimum-variance,
irrespective of the probability density function from which the measurement errors
$n_i$ are drawn.

In order to obtain analytic expressions for the LS estimators $\hat{\mathbf{a}}_{\rm LS}$, it is convenient
to write (27.92) in the form

$$ f(x_i; \mathbf{a}) = \sum_{j=1}^{M} R_{ij} a_j, \tag{27.93} $$

where $R_{ij} = h_j(x_i)$ is an element of the response matrix $\mathsf{R}$ of the experiment. The
expression for $\chi^2$ given in (27.90) can then be written, in matrix notation, as

$$ \chi^2(\mathbf{a}) = (\mathbf{y} - \mathsf{R}\mathbf{a})^{\mathrm{T}} \mathsf{N}^{-1} (\mathbf{y} - \mathsf{R}\mathbf{a}). \tag{27.94} $$

The LS estimates of the parameters $\mathbf{a}$ are now found, as shown in (27.91), by
differentiating (27.94) with respect to the $a_i$ and setting the resulting expressions
equal to zero. Denoting by $\nabla\chi^2$ the vector with elements $\partial\chi^2/\partial a_i$, we find

$$ \nabla\chi^2 = -2 \mathsf{R}^{\mathrm{T}} \mathsf{N}^{-1} (\mathbf{y} - \mathsf{R}\mathbf{a}). \tag{27.95} $$

This can be verified by writing out the expression (27.94) in component form and 
differentiating directly. 

► Verify result (27.95) by formulating the calculation in component form. 

To make the derivation less cumbersome, let us adopt the summation convention discussed 
in section 21.1, in which it is understood that any subscript that appears exactly twice in 
any term of an expression is to be summed over all the values that a subscript in that 
position can take. Thus, writing (27.94) in component form, we have 

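$$ \chi^2(\mathbf{a}) = (y_i - R_{ik} a_k)(\mathsf{N}^{-1})_{ij}(y_j - R_{jl} a_l). $$

Differentiating with respect to $a_p$ gives

$$ \frac{\partial \chi^2}{\partial a_p} = -R_{ik}\delta_{kp}(\mathsf{N}^{-1})_{ij}(y_j - R_{jl} a_l) + (y_i - R_{ik} a_k)(\mathsf{N}^{-1})_{ij}(-R_{jl}\delta_{lp}) $$

$$ \phantom{\frac{\partial \chi^2}{\partial a_p}} = -R_{ip}(\mathsf{N}^{-1})_{ij}(y_j - R_{jl} a_l) - (y_i - R_{ik} a_k)(\mathsf{N}^{-1})_{ij} R_{jp}, \tag{27.96} $$

where $\delta_{ij}$ is the Kronecker delta symbol discussed in section 21.1. By swapping the indices
$i$ and $j$ in the second term on the RHS of (27.96) and using the fact that the matrix $\mathsf{N}^{-1}$
is symmetric, we obtain

$$ \frac{\partial \chi^2}{\partial a_p} = -2 R_{ip}(\mathsf{N}^{-1})_{ij}(y_j - R_{jk} a_k) = -2 (\mathsf{R}^{\mathrm{T}})_{pi}(\mathsf{N}^{-1})_{ij}(y_j - R_{jk} a_k). \tag{27.97} $$

If we denote the vector with components $\partial\chi^2/\partial a_p$, $p = 1, 2, \ldots, M$, by $\nabla\chi^2$ and write the
RHS of (27.97) in matrix notation, we recover the result (27.95). ◄

Setting the expression (27.95) equal to zero at $\mathbf{a} = \hat{\mathbf{a}}$, we find

$$ -2 \mathsf{R}^{\mathrm{T}} \mathsf{N}^{-1} \mathbf{y} + 2 \mathsf{R}^{\mathrm{T}} \mathsf{N}^{-1} \mathsf{R} \hat{\mathbf{a}} = 0. $$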

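Provided the matrix $\mathsf{R}^{\mathrm{T}} \mathsf{N}^{-1} \mathsf{R}$ is not singular, we may solve this equation for $\hat{\mathbf{a}}$ to
obtain

$$ \hat{\mathbf{a}} = (\mathsf{R}^{\mathrm{T}} \mathsf{N}^{-1} \mathsf{R})^{-1} \mathsf{R}^{\mathrm{T}} \mathsf{N}^{-1} \mathbf{y} \equiv \mathsf{S}\mathbf{y}, \tag{27.98} $$

thus defining the $M \times N$ matrix $\mathsf{S}$. It follows that the LS estimates $\hat{a}_i$, $i = 1, 2, \ldots, M$,
are linear functions of the original measurements $y_j$, $j = 1, 2, \ldots, N$. Moreover,
using the error propagation formula (26.141) derived in subsection 26.12.3, we
find that the covariance matrix of the estimators $\hat{a}_i$ is given by

$$ \mathsf{V} \equiv \mathrm{Cov}[\hat{a}_i, \hat{a}_j] = \mathsf{S}\mathsf{N}\mathsf{S}^{\mathrm{T}} = (\mathsf{R}^{\mathrm{T}} \mathsf{N}^{-1} \mathsf{R})^{-1}. \tag{27.99} $$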

The two equations (27.98) and (27.99) contain the complete method of least 
squares. In particular, we note that, if one calculates the LS estimates using 
(27.98) then one has already obtained their covariance matrix (27.99). 
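
The two results (27.98) and (27.99) translate almost line for line into code. The following is a minimal sketch, assuming NumPy; the function name `linear_least_squares`, the quadratic basis and the noise-free test data are illustrative assumptions rather than anything specified in the text.

```python
import numpy as np

def linear_least_squares(R, y, cov):
    """Return the LS estimates (27.98) and their covariance matrix (27.99)."""
    Ninv_R = np.linalg.solve(cov, R)       # N^{-1} R
    Ninv_y = np.linalg.solve(cov, y)       # N^{-1} y
    V = np.linalg.inv(R.T @ Ninv_R)        # (R^T N^{-1} R)^{-1}
    a_hat = V @ (R.T @ Ninv_y)             # V R^T N^{-1} y
    return a_hat, V

# Hypothetical example: basis functions 1, x, x^2 and noise-free data
# generated from known parameters, so the fit should recover them.
x = np.linspace(0.0, 2.0, 7)
R = np.column_stack([np.ones_like(x), x, x**2])
a_true = np.array([1.0, -2.0, 0.5])
y = R @ a_true
cov = np.diag(np.full(x.size, 0.1**2))

a_hat, V = linear_least_squares(R, y, cov)
```

Note that the covariance matrix `V` is computed on the way to the estimates, as remarked above. For badly conditioned problems a dedicated solver such as `numpy.linalg.lstsq` is usually preferable to forming the inverse explicitly.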

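► Prove result (27.99).

Using the definition of $\mathsf{S}$ given in (27.98), the covariance matrix (27.99) becomes

$$ \mathsf{V} = \mathsf{S}\mathsf{N}\mathsf{S}^{\mathrm{T}} = [(\mathsf{R}^{\mathrm{T}} \mathsf{N}^{-1} \mathsf{R})^{-1} \mathsf{R}^{\mathrm{T}} \mathsf{N}^{-1}]\, \mathsf{N}\, [(\mathsf{R}^{\mathrm{T}} \mathsf{N}^{-1} \mathsf{R})^{-1} \mathsf{R}^{\mathrm{T}} \mathsf{N}^{-1}]^{\mathrm{T}}. $$

Using the result $(\mathsf{A}\mathsf{B} \cdots \mathsf{C})^{\mathrm{T}} = \mathsf{C}^{\mathrm{T}} \cdots \mathsf{B}^{\mathrm{T}} \mathsf{A}^{\mathrm{T}}$ for the transpose of a product of matrices and
noting that, for any non-singular matrix, $(\mathsf{A}^{-1})^{\mathrm{T}} = (\mathsf{A}^{\mathrm{T}})^{-1}$, we find

$$ \mathsf{V} = (\mathsf{R}^{\mathrm{T}} \mathsf{N}^{-1} \mathsf{R})^{-1} \mathsf{R}^{\mathrm{T}} \mathsf{N}^{-1} \mathsf{N} (\mathsf{N}^{\mathrm{T}})^{-1} \mathsf{R} [(\mathsf{R}^{\mathrm{T}} \mathsf{N}^{-1} \mathsf{R})^{\mathrm{T}}]^{-1} $$

$$ \phantom{\mathsf{V}} = (\mathsf{R}^{\mathrm{T}} \mathsf{N}^{-1} \mathsf{R})^{-1} \mathsf{R}^{\mathrm{T}} \mathsf{N}^{-1} \mathsf{R} (\mathsf{R}^{\mathrm{T}} \mathsf{N}^{-1} \mathsf{R})^{-1} $$

$$ \phantom{\mathsf{V}} = (\mathsf{R}^{\mathrm{T}} \mathsf{N}^{-1} \mathsf{R})^{-1}, $$

where we have also used the fact that $\mathsf{N}$ is symmetric and so $\mathsf{N}^{\mathrm{T}} = \mathsf{N}$. ◄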

It is worth noting that one may also write the elements of the (inverse)
covariance matrix as

$$ (\mathsf{V}^{-1})_{ij} = \frac{1}{2} \left( \frac{\partial^2 \chi^2}{\partial a_i \partial a_j} \right)_{\mathbf{a} = \hat{\mathbf{a}}}, $$

which is the same as the Fisher matrix (27.36) in cases where the measurement
errors are Gaussian distributed (and so the log-likelihood is $\ln L = -\chi^2/2$). This
proves, at least for this case, our earlier statement that the LS estimators are
minimum-variance. In fact, since $f(x; \mathbf{a})$ is linear in the parameters $\mathbf{a}$, one can
write $\chi^2$ exactly as

$$ \chi^2(\mathbf{a}) = \chi^2(\hat{\mathbf{a}}) + \frac{1}{2} \sum_{i,j=1}^{M} \left( \frac{\partial^2 \chi^2}{\partial a_i \partial a_j} \right)_{\mathbf{a} = \hat{\mathbf{a}}} (a_i - \hat{a}_i)(a_j - \hat{a}_j) = \chi^2(\hat{\mathbf{a}}) + (\mathbf{a} - \hat{\mathbf{a}})^{\mathrm{T}} \mathsf{V}^{-1} (\mathbf{a} - \hat{\mathbf{a}}), $$

which is quadratic in the parameters $a_i$. Hence the likelihood function $L \propto \exp(-\chi^2/2)$
is Gaussian. From the discussions of sections 27.3.6 and 27.5.6, it
follows that the 'surfaces' $\chi^2(\mathbf{a}) = c$, where $c$ is a constant, bound ellipsoidal
confidence regions for the parameters $a_i$. The relationship between the value of
the constant $c$ and the confidence level is given by (27.39).

Figure 27.9 A set of data points with error bars indicating the uncertainty
$\sigma = 0.5$ on the $y$-values. The straight line is $y = \hat{m}x + \hat{c}$, where $\hat{m}$ and $\hat{c}$ are
the least squares estimates of the slope and intercept.


► An experiment produces the following data sample pairs $(x_i, y_i)$:

$x_i$: 1.85 2.72 2.81 3.06 3.42 3.76 4.31 4.47 4.64 4.99

$y_i$: 2.26 3.10 3.80 4.11 4.74 4.31 5.24 4.03 5.69 6.57

where the $x_i$-values are known exactly but each $y_i$-value is measured only to an accuracy
of $\sigma = 0.5$. Assuming the underlying model for the data to be a straight line $y = mx + c$,
find the LS estimates of the slope $m$ and intercept $c$ and quote the standard error on each
estimate.


The data are plotted in figure 27.9, together with error bars indicating the uncertainty in
the $y_i$-values. Our model of the data is a straight line, and so we have

$$ f(x; c, m) = c + mx. $$

In the language of (27.92), our basis functions are $h_1(x) = 1$ and $h_2(x) = x$ and our model
parameters are $a_1 = c$ and $a_2 = m$. From (27.93) the elements of the response matrix are
$R_{ij} = h_j(x_i)$, so that

$$ \mathsf{R} = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix}, \tag{27.100} $$

where $x_i$ are the data values and $N = 10$ in our case. Further, since the standard deviation
on each measurement error is $\sigma$, we have $\mathsf{N} = \sigma^2 \mathsf{I}$, where $\mathsf{I}$ is the $N \times N$ identity matrix.
Because of this simple form for $\mathsf{N}$, the expression (27.98) for the LS estimates reduces to

$$ \hat{\mathbf{a}} = \sigma^2 (\mathsf{R}^{\mathrm{T}} \mathsf{R})^{-1} \frac{1}{\sigma^2} \mathsf{R}^{\mathrm{T}} \mathbf{y} = (\mathsf{R}^{\mathrm{T}} \mathsf{R})^{-1} \mathsf{R}^{\mathrm{T}} \mathbf{y}. \tag{27.101} $$
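Note that we cannot expand the inverse in the last line, since $\mathsf{R}$ itself is not square and
hence does not possess an inverse. Inserting the form for $\mathsf{R}$ in (27.100) into the expression
(27.101), we find

$$ \begin{pmatrix} \hat{c} \\ \hat{m} \end{pmatrix} = \begin{pmatrix} \sum_i 1 & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum_i y_i \\ \sum_i x_i y_i \end{pmatrix} = \frac{1}{N(\overline{x^2} - \bar{x}^2)} \begin{pmatrix} \overline{x^2} & -\bar{x} \\ -\bar{x} & 1 \end{pmatrix} \begin{pmatrix} N\bar{y} \\ N\overline{xy} \end{pmatrix}. $$

We thus obtain the LS estimates

$$ \hat{m} = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2} \quad \text{and} \quad \hat{c} = \frac{\overline{x^2}\,\bar{y} - \bar{x}\,\overline{xy}}{\overline{x^2} - \bar{x}^2} = \bar{y} - \hat{m}\bar{x}, \tag{27.102} $$

where the last expression for $\hat{c}$ shows that the best-fit line passes through the 'centre
of mass' $(\bar{x}, \bar{y})$ of the data sample. To find the standard errors on our results, we must
calculate the covariance matrix of the estimators. This is given by (27.99), which in our
case reduces to

$$ \mathsf{V} = \sigma^2 (\mathsf{R}^{\mathrm{T}} \mathsf{R})^{-1} = \frac{\sigma^2}{N(\overline{x^2} - \bar{x}^2)} \begin{pmatrix} \overline{x^2} & -\bar{x} \\ -\bar{x} & 1 \end{pmatrix}. \tag{27.103} $$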


The standard error on each estimator is simply the positive square root of the corresponding
diagonal element, i.e. $\sigma_{\hat{c}} = \sqrt{V_{11}}$ and $\sigma_{\hat{m}} = \sqrt{V_{22}}$, and the covariance of the estimators $\hat{m}$
and $\hat{c}$ is given by $\mathrm{Cov}[\hat{c}, \hat{m}] = V_{12} = V_{21}$. Inserting the data sample averages and moments
into (27.102) and (27.103), we find

$$ c = \hat{c} \pm \sigma_{\hat{c}} = 0.40 \pm 0.62 \quad \text{and} \quad m = \hat{m} \pm \sigma_{\hat{m}} = 1.11 \pm 0.17. $$

The 'best-fit' straight line $y = \hat{m}x + \hat{c}$ is plotted in figure 27.9. For comparison, the true
values used to create the data were $m = 1$ and $c = 1$. ◄
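
The straight-line fit of this example can be checked numerically. The sketch below, which assumes NumPy, builds the response matrix (27.100) from the tabulated data and applies (27.101) and (27.103) directly; the variable names are illustrative. The results should agree with the quoted values to within rounding in the last digit.

```python
import numpy as np

x = np.array([1.85, 2.72, 2.81, 3.06, 3.42, 3.76, 4.31, 4.47, 4.64, 4.99])
y = np.array([2.26, 3.10, 3.80, 4.11, 4.74, 4.31, 5.24, 4.03, 5.69, 6.57])
sigma = 0.5

# Response matrix with basis functions h_1(x) = 1 and h_2(x) = x.
R = np.column_stack([np.ones_like(x), x])

# Covariance matrix V = sigma^2 (R^T R)^{-1}, cf. (27.103).
V = sigma**2 * np.linalg.inv(R.T @ R)

# LS estimates (R^T R)^{-1} R^T y, cf. (27.101), ordered as (c, m).
c_hat, m_hat = np.linalg.solve(R.T @ R, R.T @ y)
sigma_c, sigma_m = np.sqrt(np.diag(V))

print(f"c = {c_hat:.2f} +/- {sigma_c:.2f}")
print(f"m = {m_hat:.2f} +/- {sigma_m:.2f}")
```

The off-diagonal element `V[0, 1]` gives the covariance of the two estimators; it is negative here because all the $x_i$ are positive, so an overestimate of the slope tends to accompany an underestimate of the intercept.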


The extension to fitting the data with a higher-order polynomial, such as
$f(x; \mathbf{a}) = a_1 + a_2 x + a_3 x^2$, is obvious. Nevertheless, as the order of the polynomial
increases the matrix inversions become rather complicated. Indeed, even when the
matrices are inverted numerically, the inversion is prone to numerical instabilities.
A better approach is to replace the basis functions $h_m(x) = x^{m-1}$, $m = 1, 2, \ldots, M$,
with a set of polynomials that are 'orthogonal over the data', i.e. such that

$$ \sum_{i=1}^{N} h_l(x_i) h_m(x_i) = 0 \quad \text{for } l \neq m. $$

Such a set of polynomial basis functions can always be found by using the Gram-Schmidt
orthogonalisation procedure presented in section 17.1. The details of this
approach are beyond the scope of our discussion but we note that, in this case,
the matrix $\mathsf{R}^{\mathrm{T}} \mathsf{R}$ is diagonal and may be inverted easily.
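
One way to see the benefit of a basis that is orthogonal over the data is sketched below. Instead of carrying out the Gram-Schmidt procedure by hand, the fragment uses NumPy's QR factorisation of the monomial response matrix, which produces (up to signs) the same orthonormal columns; this substitution, the choice of a cubic, and the reuse of the $x$-values from the example above are assumptions made for illustration.

```python
import numpy as np

x = np.array([1.85, 2.72, 2.81, 3.06, 3.42, 3.76, 4.31, 4.47, 4.64, 4.99])
M = 4  # number of basis functions: 1, x, x^2, x^3

# Columns of R are the monomials evaluated at the data points.
R = np.vander(x, M, increasing=True)

# Reduced QR factorisation R = Q T with T upper triangular: column k of Q is
# a combination of the first k monomial columns, i.e. a polynomial of degree
# k - 1 sampled at the data, and the columns are orthonormal over the data.
Q, T = np.linalg.qr(R)

gram_monomial = R.T @ R       # full matrix, typically badly conditioned
gram_orthogonal = Q.T @ Q     # the identity matrix, up to rounding error

cond_monomial = np.linalg.cond(gram_monomial)
cond_orthogonal = np.linalg.cond(gram_orthogonal)
```

With the orthogonal basis the matrix to be inverted is diagonal (here the identity), so the coefficients in that basis are simply `Q.T @ y` when all measurements share the same error.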


27.6.2 Non-linear least squares 

If the function $f(x; \mathbf{a})$ is not linear in the parameters $\mathbf{a}$ then, in general, it is
not possible to obtain an explicit expression for the LS estimates $\hat{\mathbf{a}}$. Instead, one
must use an iterative (numerical) procedure, which we now outline. In practice,
however, such problems are best solved using one of the many commercially
available software packages.

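One begins by making a first guess $\mathbf{a}^0$ for the values of the parameters. At this
point in parameter space, the components of the gradient $\nabla\chi^2$ will not, in general,
be equal to zero (unless one makes a very lucky guess!). Thus, for at least some
values of $i$, we have

$$ \left. \frac{\partial \chi^2}{\partial a_i} \right|_{\mathbf{a} = \mathbf{a}^0} \neq 0. $$

Our aim is to find a small increment $\delta\mathbf{a}$ in the values of the parameters, such that

$$ \left. \frac{\partial \chi^2}{\partial a_i} \right|_{\mathbf{a} = \mathbf{a}^0 + \delta\mathbf{a}} = 0 \quad \text{for all } i. \tag{27.104} $$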

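If our first guess $\mathbf{a}^0$ were sufficiently close to the true (local) minimum of $\chi^2$,
we could find the required increment $\delta\mathbf{a}$ by expanding the LHS of (27.104) as a
Taylor series about $\mathbf{a} = \mathbf{a}^0$, keeping only the zeroth-order and first-order terms:

$$ \left. \frac{\partial \chi^2}{\partial a_i} \right|_{\mathbf{a} = \mathbf{a}^0 + \delta\mathbf{a}} \approx \left. \frac{\partial \chi^2}{\partial a_i} \right|_{\mathbf{a} = \mathbf{a}^0} + \sum_{j=1}^{M} \left. \frac{\partial^2 \chi^2}{\partial a_i \partial a_j} \right|_{\mathbf{a} = \mathbf{a}^0} \delta a_j. \tag{27.105} $$

Setting this expression to zero, we find that the increments $\delta a_j$ may be found by
solving the set of $M$ linear equations

$$ \sum_{j=1}^{M} \left. \frac{\partial^2 \chi^2}{\partial a_i \partial a_j} \right|_{\mathbf{a} = \mathbf{a}^0} \delta a_j = - \left. \frac{\partial \chi^2}{\partial a_i} \right|_{\mathbf{a} = \mathbf{a}^0}. $$

In most cases, however, our first guess $\mathbf{a}^0$ will not be sufficiently close to the true
minimum for (27.105) to be an accurate approximation, and consequently (27.104)
will not be satisfied. In this case, $\mathbf{a}^1 = \mathbf{a}^0 + \delta\mathbf{a}$ is (hopefully) an improved guess
at the parameter values; the whole process is then repeated until convergence is
achieved.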

It is worth noting that, when one is estimating several parameters $\mathbf{a}$, the
function $\chi^2(\mathbf{a})$ may be very complicated. In particular, it may possess numerous
local extrema. The procedure outlined above will converge to the local extremum
'nearest' to the first guess $\mathbf{a}^0$. Since, in fact, we are interested only in the local
minimum that has the absolute lowest value of $\chi^2(\mathbf{a})$, it is clear that a large part
of solving the problem is to make a 'good' first guess.
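
The iteration described above can be sketched in a few lines. The fragment below assumes NumPy and estimates the gradient and second-derivative matrix of $\chi^2$ by central finite differences, which is a convenience adopted here rather than part of the method in the text; the function name `newton_minimise`, the exponential model, the noise-free data and the starting guess are all hypothetical.

```python
import numpy as np

def newton_minimise(chi2, a0, h=1e-4, tol=1e-6, max_iter=50):
    """Repeat a -> a + da, where da solves H da = -grad (cf. (27.105))."""
    a = np.asarray(a0, dtype=float)
    M = a.size
    E = h * np.eye(M)  # finite-difference offsets along each parameter
    for _ in range(max_iter):
        grad = np.array([(chi2(a + E[i]) - chi2(a - E[i])) / (2 * h)
                         for i in range(M)])
        hess = np.array([[(chi2(a + E[i] + E[j]) - chi2(a + E[i] - E[j])
                           - chi2(a - E[i] + E[j]) + chi2(a - E[i] - E[j]))
                          / (4 * h * h)
                          for j in range(M)] for i in range(M)])
        da = np.linalg.solve(hess, -grad)  # the M linear equations for da
        a = a + da
        if np.linalg.norm(da) < tol:
            break
    return a

# Hypothetical model f(x; a) = a_1 exp(-a_2 x), noise-free data, unit errors.
x = np.linspace(0.0, 4.0, 9)
a_true = np.array([2.0, 0.7])
y = a_true[0] * np.exp(-a_true[1] * x)

def chi2(a):
    return float(np.sum((y - a[0] * np.exp(-a[1] * x)) ** 2))

a_hat = newton_minimise(chi2, a0=[1.8, 0.6])
```

Started far from the minimum, this bare iteration can diverge or settle on a different extremum, which is one reason library routines add safeguards such as step-size control.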


27.7 Hypothesis testing 

So far we have concentrated on using a data sample to obtain a number or a set 
of numbers. These numbers may be estimated values for the moments or central 
moments of the population from which the sample was drawn or, more generally, 
the values of some parameters a in an assumed model for the data. Sometimes,
however, one wishes to use the data to give a ‘yes’ or ‘no’ answer to a particular
question. For example, one might wish to know whether some assumed model 
does, in fact, provide a good fit to the data, or whether two parameters have the 
same value. 


27.7.1 Simple and composite hypotheses 

In order to use data to answer questions of this sort, the question must be
posed precisely. This is done by first asserting that some hypothesis is true.
The hypothesis under consideration is traditionally called the null hypothesis
and is denoted by $H_0$. In particular, this usually specifies some form $P(\mathbf{x}|H_0)$
for the probability density function from which the data $\mathbf{x}$ are drawn. If the
hypothesis determines the PDF uniquely, then it is said to be a simple hypothesis.
If, however, the hypothesis determines the functional form of the PDF but not the
values of certain parameters $\mathbf{a}$ on which it depends then it is called a composite
hypothesis.

One decides whether to accept or reject the null hypothesis $H_0$ by performing
some statistical test, as described below in subsection 27.7.2. In fact, formally
one uses a statistical test to decide between the null hypothesis $H_0$ and the
alternative hypothesis $H_1$. We define the latter to be the complement $\bar{H}_0$ of the
null hypothesis within some restricted hypothesis space known (or assumed) in
advance. Hence, rejection of $H_0$ implies acceptance of $H_1$, and vice versa.

As an example, let us consider the case in which a sample $\mathbf{x}$ is drawn from a
Gaussian distribution with a known variance $\sigma^2$ but with an unknown mean $\mu$.
If one adopts the null hypothesis that $\mu = 0$, which we write as $H_0 : \mu = 0$,
then the corresponding alternative hypothesis must be $H_1 : \mu \neq 0$. Note that,
in this case, $H_0$ is a simple hypothesis whereas $H_1$ is a composite hypothesis.
If, however, one adopted the null hypothesis $H_0 : \mu < 0$ then the alternative
hypothesis would be $H_1 : \mu \geq 0$, so that both $H_0$ and $H_1$ would be composite
hypotheses. Very occasionally both $H_0$ and $H_1$ will be simple hypotheses. In our
illustration, this would occur, for example, if one knew in advance that the mean
$\mu$ of the Gaussian distribution were equal to either zero or unity. In this case, if
one adopted the null hypothesis $H_0 : \mu = 0$ then the alternative hypothesis would
be $H_1 : \mu = 1$.


27.7.2 Statistical tests 

In our discussion of hypothesis testing we will restrict our attention to cases in
which the null hypothesis $H_0$ is simple (see above). We begin by constructing a
test statistic $t(\mathbf{x})$ from the data sample. Although, in general, the test statistic need
not be just a (scalar) number, and could be a multi-dimensional (vector) quantity,
we will restrict our attention to the former case. Like any statistic, $t(\mathbf{x})$ will be a
random variable. Moreover, given the simple null hypothesis $H_0$ concerning the
PDF from which the sample was drawn, we may determine (in principle) the
sampling distribution $P(t|H_0)$ of the test statistic. A typical example of such a
sampling distribution is shown in figure 27.10. One defines for $t$ a rejection region
containing some fraction $\alpha$ of the total probability. For example, the (one-tailed)
rejection region could consist of values of $t$ greater than some value $t_{\rm crit}$, for
which

$$ \Pr(t > t_{\rm crit}|H_0) = \int_{t_{\rm crit}}^{\infty} P(t|H_0)\, dt = \alpha; \tag{27.106} $$

this is indicated by the shaded region in the upper half of figure 27.10. Equally,
a (one-tailed) rejection region could consist of values of $t$ less than some value
$t_{\rm crit}$. Alternatively, one could define a (two-tailed) rejection region by two values
$t_1$ and $t_2$ such that $\Pr(t_1 < t < t_2|H_0) = 1 - \alpha$. In all cases, if the observed value of $t$
lies in the rejection region then $H_0$ is rejected at significance level $\alpha$; otherwise $H_0$
is accepted at this same level.

Figure 27.10 The sampling distributions $P(t|H_0)$ and $P(t|H_1)$ of a test statistic
$t$. The shaded areas indicate the (one-tailed) regions for which $\Pr(t > t_{\rm crit}|H_0) = \alpha$
and $\Pr(t < t_{\rm crit}|H_1) = \beta$ respectively.

It is clear that there is a probability $\alpha$ of rejecting the null hypothesis $H_0$
even if it is true. This is called an error of the first kind. Conversely, an error
of the second kind occurs when the hypothesis $H_0$ is accepted even though it is
false (in which case $H_1$ is true). The probability $\beta$ (say) that such an error will
occur is, in general, difficult to calculate, since the alternative hypothesis $H_1$ is
often composite. Nevertheless, in the case where $H_1$ is a simple hypothesis, it is
straightforward (in principle) to calculate $\beta$. Denoting the corresponding sampling
distribution of $t$ by $P(t|H_1)$, the probability $\beta$ is the integral of $P(t|H_1)$ over the
complement of the rejection region, called the acceptance region. For example, in
the case corresponding to (27.106) this probability is given by

$$ \beta = \int_{-\infty}^{t_{\rm crit}} P(t|H_1)\, dt. $$

This is illustrated in figure 27.10. The quantity $1 - \beta$ is called the power of the
statistical test to reject the wrong hypothesis.
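
When both sampling distributions are known, the two error probabilities follow directly from their cumulative distribution functions. The sketch below uses `NormalDist` from Python's standard `statistics` module; the choice of unit-variance Gaussians centred on 0 and 3 for $P(t|H_0)$ and $P(t|H_1)$, and the 5% significance level, are assumptions for illustration only.

```python
from statistics import NormalDist

# Hypothetical sampling distributions of the test statistic t.
dist_H0 = NormalDist(mu=0.0, sigma=1.0)   # P(t|H0)
dist_H1 = NormalDist(mu=3.0, sigma=1.0)   # P(t|H1)

alpha = 0.05
# One-tailed rejection region t > t_crit with Pr(t > t_crit | H0) = alpha.
t_crit = dist_H0.inv_cdf(1.0 - alpha)

# Error of the second kind: H0 is accepted although H1 is true.
beta = dist_H1.cdf(t_crit)
power = 1.0 - beta
```

Lowering `alpha` moves `t_crit` to the right and so increases `beta`: for a fixed test statistic, the two kinds of error can only be traded against each other.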


27.7.3 The Neyman-Pearson test 

In the case where $H_0$ and $H_1$ are both simple hypotheses, the Neyman-Pearson
lemma (which we shall not prove) allows one to determine the 'best' rejection
region and test statistic to use.

We consider first the choice of rejection region. Even in the general case, in
which the test statistic $\mathbf{t}$ is a multi-dimensional (vector) quantity, the Neyman-Pearson
lemma states that, for a given significance level $\alpha$, the rejection region for
$H_0$ giving the highest power for the test is the region of $\mathbf{t}$-space for which

$$ \frac{P(\mathbf{t}|H_0)}{P(\mathbf{t}|H_1)} < c, \tag{27.107} $$

where $c$ is some constant determined by the required significance level.

In the case where the test statistic $t$ is a simple scalar quantity, the Neyman-Pearson
lemma is also useful in deciding which such statistic is the 'best' in
the sense of having the maximum power for a given significance level $\alpha$. From
(27.107), we can see that the best statistic is given by the likelihood ratio

$$ t(\mathbf{x}) = \frac{P(\mathbf{x}|H_0)}{P(\mathbf{x}|H_1)}, \tag{27.108} $$

and that the corresponding rejection region for $H_0$ is given by $t < t_{\rm crit}$. In fact,
it is clear that any statistic $u = f(t)$ will be equally good, provided that $f(t)$ is a
monotonically increasing function of $t$. The rejection region is then $u < f(t_{\rm crit})$.
Alternatively, one may use any test statistic $v = g(t)$ where $g(t)$ is a monotonically
decreasing function of $t$; in this case the rejection region becomes $v > g(t_{\rm crit})$. To
construct such statistics, however, one must know $P(\mathbf{x}|H_0)$ and $P(\mathbf{x}|H_1)$ explicitly,
and such cases are rare.

► Ten independent sample values $x_i$, $i = 1, 2, \ldots, 10$, are drawn at random from a Gaussian
distribution with standard deviation $\sigma = 1$. The mean $\mu$ of the distribution is known to
equal either zero or unity. The sample values are as follows:

2.22 2.56 1.07 0.24 0.18 0.95 0.73 -0.79 2.09 1.81

Test the null hypothesis $H_0 : \mu = 0$ at the 10% significance level.


The restricted nature of the hypothesis space means that our null and alternative hypotheses
are $H_0 : \mu = 0$ and $H_1 : \mu = 1$ respectively. Since $H_0$ and $H_1$ are both simple hypotheses,
the best test statistic is given by the likelihood ratio (27.108). Thus, denoting the means
by $\mu_0$ and $\mu_1$, we have

$$ t(\mathbf{x}) = \frac{\exp\left[ -\frac{1}{2} \sum_i (x_i - \mu_0)^2 \right]}{\exp\left[ -\frac{1}{2} \sum_i (x_i - \mu_1)^2 \right]} = \frac{\exp\left[ -\frac{1}{2} \sum_i (x_i^2 - 2\mu_0 x_i + \mu_0^2) \right]}{\exp\left[ -\frac{1}{2} \sum_i (x_i^2 - 2\mu_1 x_i + \mu_1^2) \right]} $$

$$ \phantom{t(\mathbf{x})} = \exp\left[ (\mu_0 - \mu_1) \sum_i x_i - \tfrac{1}{2} N (\mu_0^2 - \mu_1^2) \right]. $$

Inserting the values $\mu_0 = 0$ and $\mu_1 = 1$ yields $t = \exp(-N\bar{x} + \frac{1}{2}N)$, where $\bar{x}$ is the
sample mean. Since $-\ln t$ is a monotonically decreasing function of $t$, however, we may
equivalently use as our test statistic

$$ v = -\frac{1}{N} \ln t + \frac{1}{2} = \bar{x}, $$

where we have divided by the sample size $N$ and added $\frac{1}{2}$ for convenience. Thus we
may take the sample mean as our test statistic. From (27.13), we know that the sampling
distribution of the sample mean under our null hypothesis $H_0$ is the Gaussian distribution
$N(\mu_0, \sigma^2/N)$, where $\mu_0 = 0$, $\sigma^2 = 1$ and $N = 10$. Thus $\bar{x} \sim N(0, 0.1)$.

Since $\bar{x}$ is a monotonically decreasing function of $t$, our best rejection region for a given
significance $\alpha$ is $\bar{x} > \bar{x}_{\rm crit}$, where $\bar{x}_{\rm crit}$ depends on $\alpha$. Thus, in our case, $\bar{x}_{\rm crit}$ is given by

$$ \alpha = 1 - \Phi\left( \frac{\bar{x}_{\rm crit} - \mu_0}{\sigma/\sqrt{N}} \right) = 1 - \Phi(\sqrt{10}\, \bar{x}_{\rm crit}), $$

where $\Phi(z)$ is the cumulative distribution function for the standard Gaussian. For a 10%
significance level we have $\alpha = 0.1$ and, from table 26.3 in subsection 26.9.1, we find
$\sqrt{10}\, \bar{x}_{\rm crit} = 1.28$, i.e. $\bar{x}_{\rm crit} = 0.405$. Thus the rejection region on $\bar{x}$ is

$$ \bar{x} > 0.405. $$

From the sample, we deduce that $\bar{x} = 1.11$, and so we can clearly reject the null hypothesis
$H_0 : \mu = 0$ at the 10% significance level. It can, in fact, be rejected at a much higher
significance level. As revealed on p. 1081, the data were generated using $\mu = 1$. ◄
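
The test in this example can be reproduced in a few lines using only Python's standard library. The sketch below takes the sample mean as the test statistic and computes the critical value directly from its sampling distribution $N(\mu_0, \sigma^2/N)$ under $H_0$; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean

sample = [2.22, 2.56, 1.07, 0.24, 0.18, 0.95, 0.73, -0.79, 2.09, 1.81]
N = len(sample)
sigma, mu0, alpha = 1.0, 0.0, 0.10

x_bar = mean(sample)

# Under H0 the sample mean is distributed as N(mu0, sigma^2 / N),
# i.e. with standard deviation sigma / sqrt(N).
sampling_dist = NormalDist(mu=mu0, sigma=sigma / sqrt(N))

# One-tailed rejection region: x_bar > x_crit with Pr(x_bar > x_crit | H0) = alpha.
x_crit = sampling_dist.inv_cdf(1.0 - alpha)
reject_H0 = x_bar > x_crit
```

The same object gives the smallest significance level at which $H_0$ would still be rejected, as `1.0 - sampling_dist.cdf(x_bar)`.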


27.7.4 The generalised likelihood-ratio test 

If the null hypothesis $H_0$ or the alternative hypothesis $H_1$ (or both) is composite
then the corresponding distributions $P(\mathbf{x}|H_0)$ and $P(\mathbf{x}|H_1)$ are not uniquely
determined, in general, and so we cannot use the Neyman-Pearson lemma to obtain
the 'best' test statistic $t$. Nevertheless, in many cases, there still exists a general
procedure for constructing a test statistic $t$ which has useful properties and which
reduces to the Neyman-Pearson statistic (27.108) in the special case where $H_0$
and $H_1$ are both simple hypotheses.

Consider the quite general, and commonly occurring, case in which the
data sample $\mathbf{x}$ is drawn from a population $P(\mathbf{x}|\mathbf{a})$ with a known (or
assumed) functional form but depends on the unknown values of some parameters
$a_1, a_2, \ldots, a_M$. Moreover, suppose we wish to test the null hypothesis $H_0$ that
the parameter values $\mathbf{a}$ lie in some subspace $S$ of the full parameter space
$A$. In other words, on the basis of the sample $\mathbf{x}$ it is desired to test the
null hypothesis $H_0$ : ($a_1, a_2, \ldots, a_M$ lies in $S$) against the alternative hypothesis
$H_1$ : ($a_1, a_2, \ldots, a_M$ lies in $\bar{S}$), where $\bar{S}$ is $A - S$.

Since the functional form of the population is known, we may write down the
likelihood function $L(\mathbf{x}; \mathbf{a})$ for the sample. Ordinarily, the likelihood will have
a maximum as the parameters $\mathbf{a}$ are varied over the entire parameter space $A$.
This is the usual maximum-likelihood estimate of the parameter values, which
we denote by $\hat{\mathbf{a}}$. If, however, the parameter values are allowed to vary only over
the subspace $S$ then the likelihood function will be maximised at the point $\hat{\mathbf{a}}_S$,
which may or may not coincide with the global maximum $\hat{\mathbf{a}}$. Now, let us take as
our test statistic the generalised likelihood ratio

$$ t(\mathbf{x}) = \frac{L(\mathbf{x}; \hat{\mathbf{a}}_S)}{L(\mathbf{x}; \hat{\mathbf{a}})}, \tag{27.109} $$

where $L(\mathbf{x}; \hat{\mathbf{a}}_S)$ is the maximum value of the likelihood function in the subspace
$S$ and $L(\mathbf{x}; \hat{\mathbf{a}})$ is its maximum value in the entire parameter space $A$. It is clear
that $t$ is a function of the sample values only and must lie between 0 and 1.

We will concentrate on the special case where $H_0$ is the simple hypothesis
$H_0 : \mathbf{a} = \mathbf{a}_0$. The subspace $S$ then consists of only the single point $\mathbf{a}_0$. Thus
(27.109) becomes

$$ t(\mathbf{x}) = \frac{L(\mathbf{x}; \mathbf{a}_0)}{L(\mathbf{x}; \hat{\mathbf{a}})}, \tag{27.110} $$

and the sampling distribution $P(t|H_0)$ can be determined (in principle). As in the
previous subsection, the best rejection region for a given significance $\alpha$ is simply
$t < t_{\rm crit}$, where the value $t_{\rm crit}$ depends on $\alpha$. Moreover, as before, an equivalent
procedure is to use as a test statistic $u = f(t)$, where $f(t)$ is any monotonically
increasing function of $t$; the corresponding rejection region is then $u < f(t_{\rm crit})$.
Similarly, one may use a test statistic $v = g(t)$, where $g(t)$ is any monotonically
decreasing function of $t$; the rejection region then becomes $v > g(t_{\rm crit})$. Finally,
we note that if $H_1$ is also a simple hypothesis $H_1 : \mathbf{a} = \mathbf{a}_1$, then (27.110) reduces
to the Neyman-Pearson test statistic (27.108).

► Ten independent sample values $x_i$ ($i = 1, 2, \ldots, 10$) are drawn at random from a Gaussian
distribution with standard deviation $\sigma = 1$. The sample values are as follows:

2.22 2.56 1.07 0.24 0.18 0.95 0.73 -0.79 2.09 1.81

Test the null hypothesis $H_0 : \mu = 0$ at the 10% significance level.


We must test the (simple) null hypothesis $H_0 : \mu = 0$ against the (composite) alternative
hypothesis $H_1 : \mu \neq 0$. Thus, the subspace $S$ is the single point $\mu = 0$, whereas $A$ is the
entire $\mu$-axis. The likelihood function is

$$ L(\mathbf{x}; \mu) = (2\pi)^{-N/2} \exp\left[ -\tfrac{1}{2} \sum_i (x_i - \mu)^2 \right], $$

which has its global maximum at $\mu = \bar{x}$. The test statistic $t$ is then given by

$$ t(\mathbf{x}) = \frac{L(\mathbf{x}; 0)}{L(\mathbf{x}; \bar{x})} = \frac{\exp\left[ -\frac{1}{2} \sum_i x_i^2 \right]}{\exp\left[ -\frac{1}{2} \sum_i (x_i - \bar{x})^2 \right]} = \exp\left( -\tfrac{1}{2} N \bar{x}^2 \right). $$

It is in fact more convenient to consider the test statistic

$$ v = -2 \ln t = N\bar{x}^2. $$

Since $-2 \ln t$ is a monotonically decreasing function of $t$, the rejection region now becomes
$v > v_{\rm crit}$, where

$$ \int_{v_{\rm crit}}^{\infty} P(v|H_0)\, dv = \alpha, \tag{27.111} $$

$\alpha$ being the significance level of the test. Thus it only remains to determine the sampling
distribution $P(v|H_0)$. Under the null hypothesis $H_0$, we expect $\bar{x}$ to be Gaussian distributed,
with mean zero and variance $1/N$. Thus, from subsection 26.9.4, $v$ will follow a chi-squared
distribution of order 1. Substituting the appropriate form for $P(v|H_0)$ in (27.111) and
setting $\alpha = 0.1$, we find by numerical integration (or from tables of the cumulative
chi-squared distribution) that $v_{\rm crit} = N\bar{x}^2_{\rm crit} = 2.71$. Since $N = 10$, the rejection region on $\bar{x}$ at
the 10% significance level is thus

$$ \bar{x} < -0.52 \quad \text{and} \quad \bar{x} > 0.52. $$

As noted before, for this sample $\bar{x} = 1.11$, and so we may reject the null hypothesis
$H_0 : \mu = 0$ at the 10% significance level. ◄
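
The numbers in this example can be checked with the standard library alone. The sketch below exploits the fact that a chi-squared variable of order 1 is the square of a standard Gaussian variable, so its critical value is the square of a Gaussian quantile; this shortcut avoids a chi-squared routine and is a choice made for this illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

sample = [2.22, 2.56, 1.07, 0.24, 0.18, 0.95, 0.73, -0.79, 2.09, 1.81]
N = len(sample)
alpha = 0.10

x_bar = mean(sample)
v = N * x_bar**2                 # test statistic v = -2 ln t

# Pr(v > v_crit) = alpha for chi-squared of order 1 is equivalent to
# Pr(|z| > sqrt(v_crit)) = alpha for a standard Gaussian variable z.
z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
v_crit = z_crit**2
x_crit = z_crit / sqrt(N)        # two-tailed region: |x_bar| > x_crit

reject_H0 = v > v_crit
```

Comparing `v` with `v_crit` and comparing `abs(x_bar)` with `x_crit` are the same test, since $v = N\bar{x}^2$.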


The above example illustrates the general situation that if the maximum-likelihood
estimates $\hat{\mathbf{a}}$ of the parameters fall in or near the subspace $S$ then the
sample will be considered consistent with $H_0$ and the value of $t$ will be near
unity. If $\hat{\mathbf{a}}$ is distant from $S$ then the sample will not be in accord with $H_0$ and
ordinarily $t$ will have a small (positive) value.

It is clear that in order to prescribe the rejection region for $t$, or for a related
statistic $u$ or $v$, it is necessary to know the sampling distribution $P(t|H_0)$. If $H_0$
is simple then one can in principle determine $P(t|H_0)$, although this may prove
difficult in practice. Moreover, if $H_0$ is composite, then it may not be possible
to obtain $P(t|H_0)$, even in principle. Nevertheless, a useful approximate form for
$P(t|H_0)$ exists in the large-sample limit. Consider the null hypothesis

$$ H_0 : (a_1 = a_1^0,\ a_2 = a_2^0,\ \ldots,\ a_R = a_R^0), \quad \text{where } R \leq M $$

and the $a_i^0$ are fixed numbers. (In fact, we may fix the values of any subset
containing $R$ of the $M$ parameters.) If $H_0$ is true then it follows from our
discussion in subsection 27.5.6 (although we shall not prove it) that, when the
sample size $N$ is large, the quantity $-2 \ln t$ follows approximately a chi-squared
distribution of order $R$.


27.7.5 Student’s t-test 


Student’s t-test is just a special case of the generalised likelihood ratio test applied
to a sample $x_1, x_2, \ldots, x_N$ drawn independently from a Gaussian distribution for
which both the mean $\mu$ and variance $\sigma^2$ are unknown, and for which one wishes
to distinguish between the hypotheses

$$ H_0 : \mu = \mu_0,\ 0 < \sigma^2 < \infty, \quad \text{and} \quad H_1 : \mu \neq \mu_0,\ 0 < \sigma^2 < \infty, $$

where $\mu_0$ is a given number. Here, the parameter space $A$ is the half-plane
$-\infty < \mu < \infty$, $0 < \sigma^2 < \infty$, whereas the subspace $S$ characterised by the null
hypothesis $H_0$ is the line $\mu = \mu_0$, $0 < \sigma^2 < \infty$.

The likelihood function for this situation is given by

$$ L(\mathbf{x}; \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[ -\frac{\sum_i (x_i - \mu)^2}{2\sigma^2} \right]. $$

On the one hand, as shown in subsection 27.5.1, the values of $\mu$ and $\sigma^2$ that
maximise $L$ in $A$ are $\mu = \bar{x}$ and $\sigma^2 = s^2$, where $\bar{x}$ is the sample mean and $s^2$ is
the sample variance. On the other hand, to maximise $L$ in the subspace $S$ we set
$\mu = \mu_0$, and the only remaining parameter is $\sigma^2$; the value of $\sigma^2$ that maximises
$L$ is then easily found to be

$$ \hat{\hat{\sigma}}^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_0)^2. $$


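To retain, in due course, the standard notation for Student’s t-test, in this section
we will denote the generalised likelihood ratio by $\lambda$ (rather than $t$); it is thus
given by

$$ \lambda(\mathbf{x}) = \frac{L(\mathbf{x}; \mu_0, \hat{\hat{\sigma}}^2)}{L(\mathbf{x}; \bar{x}, s^2)} = \frac{\left[ (2\pi/N) \sum_i (x_i - \mu_0)^2 \right]^{-N/2} \exp(-N/2)}{\left[ (2\pi/N) \sum_i (x_i - \bar{x})^2 \right]^{-N/2} \exp(-N/2)} = \left[ \frac{\sum_i (x_i - \bar{x})^2}{\sum_i (x_i - \mu_0)^2} \right]^{N/2}. \tag{27.112} $$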

Normally, our next step would be to find the sampling distribution of X under 
the assumption that H 0 were true. It is more conventional, however, to work in 
terms of a related test statistic f, which was first devised by William Gossett, who 
wrote under the pen name of ‘Student’. 
The sum of squares in the denominator of (27.112) may be put into the form

$$\sum_i (x_i-\mu_0)^2 = N(\bar{x}-\mu_0)^2 + \sum_i (x_i-\bar{x})^2.$$

Thus, on dividing the numerator and denominator in (27.112) by $\sum_i (x_i-\bar{x})^2$ and
rearranging, the generalised likelihood ratio $\lambda$ can be written

$$\lambda = \left(1 + \frac{t^2}{N-1}\right)^{-N/2},$$

where we have defined the new variable

$$t = \frac{\bar{x}-\mu_0}{s/\sqrt{N-1}}. \tag{27.113}$$
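As a numerical sanity check on this rearrangement, the following minimal sketch evaluates the likelihood ratio both directly from the ratio of sums of squares in (27.112) and from the statistic in (27.113). The sample values are those of the worked example later in this subsection; the variable names and the choice of hypothesised mean equal to zero are illustrative assumptions, not part of the text.

```python
import math

# Sample values taken from the worked example later in this subsection;
# mu0 = 0 is the hypothesised mean used there.
xs = [2.22, 2.56, 1.07, 0.24, 0.18, 0.95, 0.73, -0.79, 2.09, 1.81]
mu0 = 0.0

N = len(xs)
xbar = sum(xs) / N
s2 = sum((x - xbar) ** 2 for x in xs) / N  # sample variance (divisor N)

# Direct form: ratio of sums of squares raised to the power N/2, as in (27.112).
lam_direct = (sum((x - xbar) ** 2 for x in xs)
              / sum((x - mu0) ** 2 for x in xs)) ** (N / 2)

# Form in terms of t, with t defined as in (27.113).
t = (xbar - mu0) / (math.sqrt(s2) / math.sqrt(N - 1))
lam_from_t = (1 + t ** 2 / (N - 1)) ** (-N / 2)

print(lam_direct, lam_from_t)
```

The two printed numbers should agree to rounding error, which is all the algebra above claims.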


Since $t^2$ is a monotonically decreasing function of $\lambda$, the corresponding rejection
region is $t^2 > c$, where $c$ is a positive constant depending on the required
significance level $\alpha$. It is conventional, however, to use $t$ itself as our test statistic,
in which case our rejection region becomes two-tailed and is given by

$$t < -t_{\rm crit} \quad\text{and}\quad t > t_{\rm crit}, \tag{27.114}$$

where $t_{\rm crit}$ is the positive square root of the constant $c$.

The definition (27.113) and the rejection region (27.114) form the basis of
Student’s $t$-test. It only remains to determine the sampling distribution $P(t|H_0)$.
At the outset, it is worth noting that if we write the expression (27.113) for $t$
in terms of the standard estimator $\hat{\sigma} = \sqrt{Ns^2/(N-1)}$ of the standard deviation
then we obtain

$$t = \frac{\bar{x}-\mu_0}{\hat{\sigma}/\sqrt{N}}. \tag{27.115}$$


If, in fact, we knew the true value of $\sigma$ and used it in this expression for $t$ then
it is clear from our discussion in section 27.3 that $t$ would follow a Gaussian
distribution with mean 0 and variance 1, i.e. $t \sim N(0,1)$. When $\sigma$ is not known,
however, we have to use our estimate $\hat{\sigma}$ in (27.115), with the result that $t$ is
no longer distributed as the standard Gaussian. As one might expect from
the central limit theorem, however, the distribution of $t$ does tend towards the
standard Gaussian for large values of $N$.

As noted earlier, the exact distribution of $t$, valid for any value of $N$, was first
discovered by William Gosset. From (27.35), if the hypothesis $H_0$ is true then the
joint sampling distribution of $\bar{x}$ and $s$ is given by

$$P(\bar{x},s|H_0) = C s^{N-2} \exp\left(-\frac{Ns^2}{2\sigma^2}\right) \exp\left[-\frac{N(\bar{x}-\mu_0)^2}{2\sigma^2}\right], \tag{27.116}$$

where $C$ is a normalisation constant. We can use this result to obtain the joint
sampling distribution of $s$ and $t$ by demanding that

$$P(\bar{x},s|H_0)\,d\bar{x}\,ds = P(t,s|H_0)\,dt\,ds.$$
Using (27.113) to substitute for $\bar{x}-\mu_0$ in (27.116), and noting that $d\bar{x} =
(s/\sqrt{N-1})\,dt$, we find

$$P(\bar{x},s|H_0)\,d\bar{x}\,ds = A s^{N-1} \exp\left[-\frac{Ns^2}{2\sigma^2}\left(1 + \frac{t^2}{N-1}\right)\right] dt\,ds,$$


where $A$ is another normalisation constant. In order to obtain the sampling
distribution of $t$ alone, we must integrate $P(t,s|H_0)$ with respect to $s$ over its
allowed range, from 0 to $\infty$. Thus, the required distribution of $t$ alone is given by

$$P(t|H_0) = \int_0^\infty P(t,s|H_0)\,ds = A \int_0^\infty s^{N-1} \exp\left[-\frac{Ns^2}{2\sigma^2}\left(1 + \frac{t^2}{N-1}\right)\right] ds. \tag{27.117}$$


To perform this integration, we make the change of variable $y = s\{1 + [t^2/(N-1)]\}^{1/2}$,
which on substitution into (27.117) yields

$$P(t|H_0) = A \left(1 + \frac{t^2}{N-1}\right)^{-N/2} \int_0^\infty y^{N-1} \exp\left(-\frac{Ny^2}{2\sigma^2}\right) dy.$$

Since the integral over $y$ does not depend on $t$, it is simply a constant. We thus
find that the sampling distribution of the variable $t$ is


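$$P(t|H_0) = \frac{1}{\sqrt{(N-1)\pi}}\,\frac{\Gamma\left(\tfrac{1}{2}N\right)}{\Gamma\left(\tfrac{1}{2}(N-1)\right)} \left(1 + \frac{t^2}{N-1}\right)^{-N/2}, \tag{27.118}$$

where we have used the condition $\int_{-\infty}^{\infty} P(t|H_0)\,dt = 1$ to determine the
normalisation constant (see exercise 27.18).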

The distribution (27.118) is called Student’s $t$-distribution with $N-1$ degrees of
freedom. A plot of Student’s $t$-distribution is shown in figure 27.11 for various
values of $N$. For comparison, we also plot the standard Gaussian distribution,
to which the $t$-distribution tends for large $N$. As is clear from the figure, the
$t$-distribution is symmetric about $t = 0$. In table 27.2 we list some critical points
of the cumulative probability function $C_n(t)$ of the $t$-distribution, which is defined
by

$$C_n(t) = \int_{-\infty}^{t} P(t'|H_0)\,dt',$$

where $n = N-1$ is the number of degrees of freedom. Clearly, $C_n(t)$ is analogous
to the cumulative probability function $\Phi(z)$ of the Gaussian distribution, discussed
in subsection 26.9.1. For comparison purposes, we also list the critical points of
$\Phi(z)$, which corresponds to the $t$-distribution for $N = \infty$.
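The density (27.118) and its cumulative function can be evaluated numerically without special tables. The sketch below is one possible approach, not the method used to build table 27.2: it writes the density in terms of $n = N - 1$ and integrates it with a composite Simpson rule, using the symmetry about $t = 0$. The function names `t_pdf` and `t_cdf` and the step count are illustrative choices.

```python
import math

def t_pdf(t, n):
    """Student's t density with n degrees of freedom, i.e. (27.118) with N = n + 1."""
    norm = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    return norm * (1 + t * t / n) ** (-(n + 1) / 2)

def t_cdf(t, n, steps=2000):
    """C_n(t) by composite Simpson's rule on [0, |t|]; steps must be even."""
    b = abs(t)
    if b == 0.0:
        return 0.5
    h = b / steps
    total = t_pdf(0.0, n) + t_pdf(b, n)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, n)
    half = total * h / 3          # probability between 0 and |t|
    return 0.5 + half if t > 0 else 0.5 - half

# Compare with two entries of table 27.2.
print(round(t_cdf(0.92, 5), 3))   # table lists C_5(0.92) = 0.8
print(round(t_cdf(1.83, 9), 3))   # table lists C_9(1.83) = 0.950
```

Because the tabulated critical points are rounded to two decimal places, agreement to about three significant figures is all that should be expected.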
Figure 27.11 Student’s $t$-distribution $P(t|H_0)$ for various values of $N$. The broken
curve shows the standard Gaussian distribution for comparison.


► Ten independent sample values $x_i$, $i = 1, 2, \ldots, 10$, are drawn at random from a Gaussian
distribution with unknown mean $\mu$ and unknown standard deviation $\sigma$. The sample values
are as follows:

2.22 2.56 1.07 0.24 0.18 0.95 0.73 -0.79 2.09 1.81

Test the null hypothesis $H_0 : \mu = 0$ at the 10% significance level.


For our null hypothesis $\mu_0 = 0$. Since for this sample $\bar{x} = 1.11$, $s = 1.01$ and $N = 10$, it
follows from (27.113) that

$$t = \frac{\bar{x}}{s/\sqrt{N-1}} = 3.30.$$

The rejection region for $t$ is given by (27.114) where $t_{\rm crit}$ is such that

$$C_{N-1}(t_{\rm crit}) = 1 - \alpha/2,$$

and $\alpha$ is the required significance of the test. In our case $\alpha = 0.1$ and $N = 10$, and from
table 27.2 we find $t_{\rm crit} = 1.83$. Thus our rejection region for $H_0$ at the 10% significance
level is

$$t < -1.83 \quad\text{and}\quad t > 1.83.$$

For our sample $t = 3.30$ and so we can clearly reject the null hypothesis $H_0 : \mu = 0$ at this
level. ◄
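The arithmetic of this example is easily reproduced. The following minimal sketch computes the sample mean, the sample standard deviation (with divisor $N$, as used throughout this chapter) and the statistic (27.113); the critical value 1.83 is copied from table 27.2 rather than computed, and the variable names are illustrative.

```python
import math

xs = [2.22, 2.56, 1.07, 0.24, 0.18, 0.95, 0.73, -0.79, 2.09, 1.81]
mu0 = 0.0        # value of the mean under the null hypothesis
t_crit = 1.83    # table 27.2: n = 9, C_n(t) = 0.950 (10% two-tailed test)

N = len(xs)
xbar = sum(xs) / N
s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / N)   # divisor N, not N - 1

t = (xbar - mu0) / (s / math.sqrt(N - 1))             # equation (27.113)
reject = abs(t) > t_crit                              # two-tailed region (27.114)

print(f"xbar = {xbar:.3f}, s = {s:.3f}, t = {t:.2f}, reject H0: {reject}")
```

Working with unrounded intermediate values gives a statistic of roughly 3.29, marginally below the 3.30 quoted at the end of the example from the rounded values of the mean and standard deviation; the conclusion is unaffected.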


It is worth noting the connection between the $t$-test and the classical confidence
interval on the mean $\mu$. The central confidence interval on $\mu$ at the confidence
level $1-\alpha$ is the set of values for which

$$-t_{\rm crit} < \frac{\bar{x}-\mu}{s/\sqrt{N-1}} < t_{\rm crit},$$
C_n(t)     0.5    0.6    0.7    0.8    0.9    0.950  0.975  0.990  0.995  0.999

n = 1      0.00   0.33   0.73   1.38   3.08   6.31   12.7   31.8   63.7   318.3
    2      0.00   0.29   0.62   1.06   1.89   2.92   4.30   6.97   9.93   22.3
    3      0.00   0.28   0.58   0.98   1.64   2.35   3.18   4.54   5.84   10.2
    4      0.00   0.27   0.57   0.94   1.53   2.13   2.78   3.75   4.60   7.17
    5      0.00   0.27   0.56   0.92   1.48   2.02   2.57   3.37   4.03   5.89
    6      0.00   0.27   0.55   0.91   1.44   1.94   2.45   3.14   3.71   5.21
    7      0.00   0.26   0.55   0.90   1.42   1.90   2.37   3.00   3.50   4.79
    8      0.00   0.26   0.55   0.89   1.40   1.86   2.31   2.90   3.36   4.50
    9      0.00   0.26   0.54   0.88   1.38   1.83   2.26   2.82   3.25   4.30
   10      0.00   0.26   0.54   0.88   1.37   1.81   2.23   2.76   3.17   4.14
   11      0.00   0.26   0.54   0.88   1.36   1.80   2.20   2.72   3.11   4.03
   12      0.00   0.26   0.54   0.87   1.36   1.78   2.18   2.68   3.06   3.93
   13      0.00   0.26   0.54   0.87   1.35   1.77   2.16   2.65   3.01   3.85
   14      0.00   0.26   0.54   0.87   1.35   1.76   2.15   2.62   2.98   3.79
   15      0.00   0.26   0.54   0.87   1.34   1.75   2.13   2.60   2.95   3.73
   16      0.00   0.26   0.54   0.87   1.34   1.75   2.12   2.58   2.92   3.69
   17      0.00   0.26   0.53   0.86   1.33   1.74   2.11   2.57   2.90   3.65
   18      0.00   0.26   0.53   0.86   1.33   1.73   2.10   2.55   2.88   3.61
   19      0.00   0.26   0.53   0.86   1.33   1.73   2.09   2.54   2.86   3.58
   20      0.00   0.26   0.53   0.86   1.33   1.73   2.09   2.53   2.85   3.55
   25      0.00   0.26   0.53   0.86   1.32   1.71   2.06   2.49   2.79   3.46
   30      0.00   0.26   0.53   0.85   1.31   1.70   2.04   2.46   2.75   3.39
   40      0.00   0.26   0.53   0.85   1.30   1.68   2.02   2.42   2.70   3.31
   50      0.00   0.26   0.53   0.85   1.30   1.68   2.01   2.40   2.68   3.26
  100      0.00   0.25   0.53   0.85   1.29   1.66   1.98   2.37   2.63   3.17
  200      0.00   0.25   0.53   0.84   1.29   1.65   1.97   2.35   2.60   3.13
   ∞       0.00   0.25   0.52   0.84   1.28   1.65   1.96   2.33   2.58   3.09

Table 27.2 The confidence limits $t$ of the cumulative probability function
$C_n(t)$ for Student’s $t$-distribution with $n$ degrees of freedom. For example,
$C_5(0.92) = 0.8$. The $n = \infty$ row is also the corresponding result for the
standard Gaussian distribution.


where $t_{\rm crit}$ satisfies $C_{N-1}(t_{\rm crit}) = 1 - \alpha/2$. Thus the required confidence interval is

$$\bar{x} - \frac{t_{\rm crit}\,s}{\sqrt{N-1}} < \mu < \bar{x} + \frac{t_{\rm crit}\,s}{\sqrt{N-1}}.$$

Hence, in the above example, the 90% classical central confidence interval on $\mu$
is

$$0.49 < \mu < 1.73.$$
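The interval quoted above follows directly from the sample statistics and the tabulated critical value. A minimal sketch, assuming the same ten sample values as in the preceding example and taking 1.83 from table 27.2 (the variable names are illustrative):

```python
import math

xs = [2.22, 2.56, 1.07, 0.24, 0.18, 0.95, 0.73, -0.79, 2.09, 1.81]
t_crit = 1.83    # table 27.2: n = 9, C_n(t) = 0.950, i.e. a 90% central interval

N = len(xs)
xbar = sum(xs) / N
s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / N)   # divisor N

half_width = t_crit * s / math.sqrt(N - 1)
lower, upper = xbar - half_width, xbar + half_width

print(f"{lower:.2f} < mu < {upper:.2f}")
```

Small differences in the last digit relative to the quoted limits arise because the text rounds the mean and standard deviation before forming the interval.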

The $t$-distribution may also be used to compare different samples from Gaussian
distributions. In particular, let us consider the case where we have two independent
samples of sizes $N_1$ and $N_2$, drawn respectively from Gaussian distributions with
a common variance $\sigma^2$ but with possibly different means $\mu_1$ and $\mu_2$. On the basis
of the samples, one wishes to distinguish between the hypotheses

$$H_0 : \mu_1 = \mu_2, \quad 0 < \sigma^2 < \infty \qquad\text{and}\qquad H_1 : \mu_1 \neq \mu_2, \quad 0 < \sigma^2 < \infty.$$


In other words, we wish to test the null hypothesis that the samples are drawn
from populations having the same mean. Suppose that the measured sample
means and standard deviations are $\bar{x}_1$, $\bar{x}_2$ and $s_1$, $s_2$ respectively. In an analogous
way to that presented above, one may show that the generalised likelihood ratio
can be written as

$$\lambda = \left(1 + \frac{t^2}{N_1+N_2-2}\right)^{-(N_1+N_2)/2}.$$

In this case, the variable $t$ is given by

$$t = \frac{w-\omega}{\hat{\sigma}} \left(\frac{N_1N_2}{N_1+N_2}\right)^{1/2}, \tag{27.119}$$

where $w = \bar{x}_1 - \bar{x}_2$, $\omega = \mu_1 - \mu_2$ and

$$\hat{\sigma} = \left[\frac{N_1s_1^2 + N_2s_2^2}{N_1+N_2-2}\right]^{1/2}.$$

It is straightforward (albeit with complicated algebra) to show that the variable $t$
in (27.119) follows Student’s $t$-distribution with $N_1+N_2-2$ degrees of freedom,
and so we may use an appropriate form of Student’s $t$-test to investigate the null
hypothesis $H_0 : \mu_1 = \mu_2$ (or equivalently $H_0 : \omega = 0$). As above, the $t$-test can be
used to place a confidence interval on $\omega = \mu_1 - \mu_2$.


► Suppose that two classes of students take the same mathematics examination and the
following percentage marks are obtained:

Class 1: 66 62 34 55 77 80 55 60 69 47 50

Class 2: 64 90 76 56 81 72 70

Assuming that the two sets of examination marks are drawn from Gaussian distributions
with a common variance, test the hypothesis $H_0 : \mu_1 = \mu_2$ at the 5% significance level. Use
your result to obtain the 95% classical central confidence interval on $\omega = \mu_1 - \mu_2$.


We begin by calculating the mean and standard deviation of each sample. The number of
values in each sample is $N_1 = 11$ and $N_2 = 7$ respectively, and we find

$$\bar{x}_1 = 59.5, \quad s_1 = 12.8 \qquad\text{and}\qquad \bar{x}_2 = 72.7, \quad s_2 = 10.3,$$

leading to $w = \bar{x}_1 - \bar{x}_2 = -13.2$ and $\hat{\sigma} = 12.6$. Setting $\omega = 0$ in (27.119), we thus find
$t = -2.17$.

The rejection region for $H_0$ is given by (27.114), where $t_{\rm crit}$ satisfies

$$C_{N_1+N_2-2}(t_{\rm crit}) = 1 - \alpha/2, \tag{27.120}$$
where $\alpha$ is the required significance level of the test. In our case we set $\alpha = 0.05$, and from
table 27.2 with $n = 16$ we find that $t_{\rm crit} = 2.12$. The rejection region is therefore

$$t < -2.12 \quad\text{and}\quad t > 2.12.$$

Since $t = -2.17$ for our samples, we can reject the null hypothesis $H_0 : \mu_1 = \mu_2$, although
only by a small margin. (Indeed, it is easily shown that one cannot reject $H_0$ at the 2%
significance level.) The 95% central confidence interval on $\omega = \mu_1 - \mu_2$ is given by

$$w - \hat{\sigma}\,t_{\rm crit} \left(\frac{N_1+N_2}{N_1N_2}\right)^{1/2} < \omega < w + \hat{\sigma}\,t_{\rm crit} \left(\frac{N_1+N_2}{N_1N_2}\right)^{1/2},$$

where $t_{\rm crit}$ is given by (27.120). Thus, we find

$$-26.1 < \omega < -0.28,$$

which, as expected, does not (quite) contain $\omega = 0$. ◄
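The two-sample calculation can be checked in the same way. The sketch below forms the pooled estimate of the standard deviation, the statistic (27.119) with the difference of population means set to zero, and the corresponding 95% interval; the critical value 2.12 is copied from table 27.2, and the helper name `mean_and_sum_sq` is an illustrative choice.

```python
import math

class1 = [66, 62, 34, 55, 77, 80, 55, 60, 69, 47, 50]
class2 = [64, 90, 76, 56, 81, 72, 70]
t_crit = 2.12    # table 27.2: n = 16, C_n(t) = 0.975 (5% two-tailed test)

def mean_and_sum_sq(xs):
    """Return the sample mean and the sum of squared deviations, i.e. N * s^2."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs)

n1, n2 = len(class1), len(class2)
xbar1, ss1 = mean_and_sum_sq(class1)
xbar2, ss2 = mean_and_sum_sq(class2)

w = xbar1 - xbar2
sigma_hat = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))        # pooled estimate
t = (w / sigma_hat) * math.sqrt(n1 * n2 / (n1 + n2))      # (27.119) with omega = 0

half_width = sigma_hat * t_crit * math.sqrt((n1 + n2) / (n1 * n2))
lower, upper = w - half_width, w + half_width

print(f"w = {w:.1f}, sigma_hat = {sigma_hat:.1f}, t = {t:.2f}")
print(f"{lower:.1f} < omega < {upper:.2f}")
```

With unrounded intermediate values the statistic comes out at about -2.16 rather than -2.17, and the upper limit of the interval moves slightly; the interval still excludes zero, so the conclusion of the example stands.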


In order to apply Student’s $t$-test in the above example, we had to make the
assumption that the samples were drawn from Gaussian distributions possessing a
common variance, which is clearly unjustified a priori. We can, however, perform
another test on the data to investigate whether the additional hypothesis $\sigma_1^2 = \sigma_2^2$
is reasonable; this test is discussed in the next subsection. If this additional test
shows that the hypothesis $\sigma_1^2 = \sigma_2^2$ may be accepted (at some suitable significance
level), then we may indeed use the analysis in the above example to infer that
the null hypothesis $H_0 : \mu_1 = \mu_2$ may be rejected at the 5% significance level.
If, however, we find that the additional hypothesis $\sigma_1^2 = \sigma_2^2$ must be rejected,
then we can only infer from the above example that the hypothesis that the two
samples were drawn from the same Gaussian distribution may be rejected at the
5% significance level.

Throughout the above discussion, we have assumed that samples are drawn
from a Gaussian distribution. Although this is true for many random variables,
in practice it is usually impossible to know a priori whether this is the case. It can
be shown, however, that Student’s $t$-test remains reasonably accurate even if the
sampled distribution(s) differ considerably from a Gaussian. Indeed, for sampled
distributions that differ only slightly from a Gaussian form, the accuracy of
the test is remarkably good. Nevertheless, when applying the $t$-test, it is always
important to remember that the assumption of a Gaussian parent population is
central to the method.


27.7.6 Fisher’s F-test 

Having concentrated on tests for the mean $\mu$ of a Gaussian distribution, we
now consider tests for its standard deviation $\sigma$. Before discussing Fisher’s $F$-test
for comparing the standard deviations of two samples, we begin by considering
the case when an independent sample $x_1, x_2, \ldots, x_N$ is drawn from a Gaussian
distribution with unknown $\mu$ and $\sigma$, and we wish to distinguish between the two
Figure 27.12 The sampling distribution $P(u|H_0)$ for $N = 10$; this is a chi-squared
distribution for $N-1$ degrees of freedom. (The vertical axis of the plot is labelled $\lambda(u)$.)


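hypotheses

$$H_0 : \sigma^2 = \sigma_0^2, \quad -\infty < \mu < \infty \qquad\text{and}\qquad H_1 : \sigma^2 \neq \sigma_0^2, \quad -\infty < \mu < \infty,$$

where $\sigma_0^2$ is a given number. Here, the parameter space $\mathcal{A}$ is the half-plane
$-\infty < \mu < \infty$, $0 < \sigma^2 < \infty$, whereas the subspace $\mathcal{S}$ characterised by the null
hypothesis $H_0$ is the line $\sigma^2 = \sigma_0^2$, $-\infty < \mu < \infty$.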

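The likelihood function for this situation is given by

$$L(\mathbf{x};\mu,\sigma^2) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\frac{\sum_i (x_i-\mu)^2}{2\sigma^2}\right].$$

The maximum of $L$ in $\mathcal{A}$ occurs at $\mu = \bar{x}$ and $\sigma^2 = s^2$, whereas the maximum of
$L$ in $\mathcal{S}$ is at $\mu = \bar{x}$ and $\sigma^2 = \sigma_0^2$. Thus, the generalised likelihood ratio is given by

$$\lambda(\mathbf{x}) = \frac{L(\mathbf{x};\bar{x},\sigma_0^2)}{L(\mathbf{x};\bar{x},s^2)} = \left(\frac{u}{N}\right)^{N/2} \exp\left[-\tfrac{1}{2}(u-N)\right],$$

where we have introduced the variable

$$u = \frac{Ns^2}{\sigma_0^2} = \frac{\sum_i (x_i-\bar{x})^2}{\sigma_0^2}. \tag{27.121}$$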


An example of this distribution is plotted in figure 27.12 for $N = 10$. From
the figure, we see that the rejection region $\lambda < \lambda_{\rm crit}$ corresponds to a two-tailed
rejection region on $u$ given by

$$0 < u < a \quad\text{and}\quad b < u < \infty,$$

where $a$ and $b$ are such that $\lambda(a) = \lambda(b) = \lambda_{\rm crit}$, as shown in figure 27.12. In practice,
however, it is difficult to determine $a$ and $b$ for a given significance level $\alpha$, so a
slightly different rejection region, which we now describe, is usually adopted.

The sampling distribution $P(u|H_0)$ may be found straightforwardly from the
sampling distribution of $s$ given in (27.35). Let us first determine $P(s^2|H_0)$ by
demanding that

$$P(s|H_0)\,ds = P(s^2|H_0)\,d(s^2),$$

from which we find

$$P(s^2|H_0) = \frac{P(s|H_0)}{2s} = \frac{(N/2\sigma_0^2)^{(N-1)/2}}{\Gamma\left(\tfrac{1}{2}(N-1)\right)}\,(s^2)^{(N-3)/2} \exp\left(-\frac{Ns^2}{2\sigma_0^2}\right). \tag{27.122}$$

Thus, the sampling distribution of $u = Ns^2/\sigma_0^2$ is given by

$$P(u|H_0) = \frac{1}{2^{(N-1)/2}\,\Gamma\left(\tfrac{1}{2}(N-1)\right)}\,u^{(N-3)/2} \exp\left(-\tfrac{1}{2}u\right).$$


We note, in passing, that the distribution of $u$ is precisely that of an $(N-1)$th-order
chi-squared variable (see subsection 26.9.4), i.e. $u \sim \chi^2_{N-1}$. Although it does
not give quite the best test, one then takes the rejection region to be

$$0 < u < a \quad\text{and}\quad b < u < \infty,$$

with $a$ and $b$ chosen such that the two tails have equal areas; the advantage of
this choice is that tabulations of the chi-squared distribution make the size of this
region relatively easy to estimate. Thus, for a given significance level $\alpha$, we have

$$\int_0^a P(u|H_0)\,du = \alpha/2 \qquad\text{and}\qquad \int_b^\infty P(u|H_0)\,du = \alpha/2.$$


► Ten independent sample values $x_i$, $i = 1, 2, \ldots, 10$, are drawn at random from a Gaussian
distribution with unknown mean $\mu$ and standard deviation $\sigma$. The sample values are as
follows:

2.22 2.56 1.07 0.24 0.18 0.95 0.73 -0.79 2.09 1.81

Test the null hypothesis $H_0 : \sigma^2 = 2$ at the 10% significance level.


For our null hypothesis $\sigma_0^2 = 2$. Since for this sample $s = 1.01$ and $N = 10$, from (27.121)
we have $u = 5.10$. For $\alpha = 0.1$ we find, either numerically or using tables, that $a = 3.33$
and $b = 16.92$. Thus, our rejection region is

$$0 < u < 3.33 \quad\text{and}\quad 16.92 < u < \infty.$$

The value $u = 5.10$ from our sample does not lie in the rejection region, and so we cannot
reject the null hypothesis $H_0 : \sigma^2 = 2$. ◄
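The phrase "either numerically or using tables" can be made concrete. The sketch below is one possible numerical route, not necessarily the one the authors had in mind: it integrates the chi-squared density for $N - 1$ degrees of freedom with Simpson's rule and locates the equal-tail limits by bisection. The function names, the step count and the bracketing interval from 0 to 100 are illustrative assumptions.

```python
import math

def chi2_pdf(u, n):
    """Chi-squared density with n degrees of freedom (n >= 2, so finite at u = 0)."""
    return u ** (n / 2 - 1) * math.exp(-u / 2) / (2 ** (n / 2) * math.gamma(n / 2))

def chi2_cdf(u, n, steps=1000):
    """Probability of [0, u] by composite Simpson's rule; steps must be even."""
    h = u / steps
    total = chi2_pdf(0.0, n) + chi2_pdf(u, n)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * chi2_pdf(k * h, n)
    return total * h / 3

def chi2_quantile(p, n, lo=0.0, hi=100.0):
    """Solve chi2_cdf(u, n) = p for u by bisection on [lo, hi]."""
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

xs = [2.22, 2.56, 1.07, 0.24, 0.18, 0.95, 0.73, -0.79, 2.09, 1.81]
sigma0_sq = 2.0   # variance under the null hypothesis
alpha = 0.1

N = len(xs)
xbar = sum(xs) / N
u = sum((x - xbar) ** 2 for x in xs) / sigma0_sq   # equation (27.121)

a = chi2_quantile(alpha / 2, N - 1)        # lower tail has area alpha/2
b = chi2_quantile(1 - alpha / 2, N - 1)    # upper tail has area alpha/2
reject = u < a or u > b

print(f"u = {u:.2f}, a = {a:.2f}, b = {b:.2f}, reject H0: {reject}")
```

Bisection is used here only because it needs nothing beyond the cumulative function; any root finder would serve equally well.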
We now turn to Fisher’s $F$-test. Let us suppose that two independent samples
of sizes $N_1$ and $N_2$ are drawn from Gaussian distributions with means and
variances $\mu_1, \sigma_1^2$ and $\mu_2, \sigma_2^2$ respectively, and we wish to distinguish between the
two hypotheses

$$H_0 : \sigma_1^2 = \sigma_2^2 \qquad\text{and}\qquad H_1 : \sigma_1^2 \neq \sigma_2^2.$$


In this case, the generalised likelihood ratio is found to be

$$\lambda = \frac{(N_1+N_2)^{(N_1+N_2)/2}}{N_1^{N_1/2}\,N_2^{N_2/2}}\;\frac{[F(N_1-1)/(N_2-1)]^{N_1/2}}{[1 + F(N_1-1)/(N_2-1)]^{(N_1+N_2)/2}},$$

where $F$ is given by the variance ratio

$$F = \frac{N_1s_1^2/(N_1-1)}{N_2s_2^2/(N_2-1)} \equiv \frac{u^2}{v^2}, \tag{27.123}$$

and $s_1$ and $s_2$ are the standard deviations of the two samples. On plotting $\lambda$ as a
function of $F$, it is apparent that the rejection region $\lambda < \lambda_{\rm crit}$ corresponds to a
two-tailed test on $F$. Nevertheless, as we shall see below, by defining the fraction
(27.123) appropriately, it is customary to make a one-tailed test on $F$.

The distribution of $F$ may be obtained in a reasonably straightforward manner
by making use of the distribution of the sample variance $s^2$ given in (27.122).
Under our null hypothesis $H_0$, the two Gaussian distributions share a common
variance, which we denote by $\sigma^2$. Changing the variable in (27.122) from $s^2$ to $u^2$
we find that $u^2$ has the sampling distribution

$$P(u^2|H_0) = \left(\frac{N-1}{2\sigma^2}\right)^{(N-1)/2} \frac{1}{\Gamma\left(\tfrac{1}{2}(N-1)\right)}\,(u^2)^{(N-3)/2} \exp\left[-\frac{(N-1)u^2}{2\sigma^2}\right].$$

Since $u^2$ and $v^2$ are independent, their joint distribution is simply the product of
their individual distributions and is given by

$$P(u^2|H_0)P(v^2|H_0) = A\,(u^2)^{(N_1-3)/2}(v^2)^{(N_2-3)/2} \exp\left[-\frac{(N_1-1)u^2 + (N_2-1)v^2}{2\sigma^2}\right],$$

where the constant $A$ is given by

$$A = \frac{(N_1-1)^{(N_1-1)/2}\,(N_2-1)^{(N_2-1)/2}}{2^{(N_1+N_2-2)/2}\,\sigma^{(N_1+N_2-2)}\,\Gamma\left(\tfrac{1}{2}(N_1-1)\right)\Gamma\left(\tfrac{1}{2}(N_2-1)\right)}. \tag{27.124}$$

Now, for fixed $v$ we have $u^2 = Fv^2$ and $d(u^2) = v^2\,dF$. Thus, the joint sampling
distribution $P(v^2,F|H_0)$ is obtained by requiring that

$$P(v^2,F|H_0)\,d(v^2)\,dF = P(u^2|H_0)P(v^2|H_0)\,d(u^2)\,d(v^2). \tag{27.125}$$

In order to find the distribution of $F$ alone, we now integrate $P(v^2,F|H_0)$ with
respect to $v^2$ from 0 to $\infty$, from which we obtain

$$P(F|H_0) = \left(\frac{N_1-1}{N_2-1}\right)^{(N_1-1)/2} \frac{F^{(N_1-3)/2}}{B\left(\tfrac{1}{2}(N_1-1),\tfrac{1}{2}(N_2-1)\right)} \left(1 + \frac{N_1-1}{N_2-1}\,F\right)^{-(N_1+N_2-2)/2}, \tag{27.126}$$

where $B\left(\tfrac{1}{2}(N_1-1),\tfrac{1}{2}(N_2-1)\right)$ is the beta function defined in the Appendix.
$P(F|H_0)$ is called the $F$-distribution (or occasionally the Fisher distribution) with
$(N_1-1, N_2-1)$ degrees of freedom.


► Evaluate the integral $\int_0^\infty P(v^2,F|H_0)\,d(v^2)$ to obtain result (27.126).


From (27.125), we have

$$P(F|H_0) = A F^{(N_1-3)/2} \int_0^\infty (v^2)^{(N_1+N_2-4)/2} \exp\left\{-\frac{[(N_1-1)F + (N_2-1)]v^2}{2\sigma^2}\right\} d(v^2).$$

Making the substitution $x = [(N_1-1)F + (N_2-1)]v^2/(2\sigma^2)$, we obtain

$$P(F|H_0) = A \left[\frac{2\sigma^2}{(N_1-1)F + (N_2-1)}\right]^{(N_1+N_2-2)/2} F^{(N_1-3)/2} \int_0^\infty x^{(N_1+N_2-4)/2} e^{-x}\,dx$$

$$\phantom{P(F|H_0)} = A \left[\frac{2\sigma^2}{(N_1-1)F + (N_2-1)}\right]^{(N_1+N_2-2)/2} F^{(N_1-3)/2}\,\Gamma\left(\tfrac{1}{2}(N_1+N_2-2)\right),$$

where in the last line we have used the definition of the gamma function given in the
Appendix. Using the further result (A11), which expresses the beta function in terms of
the gamma function, and the expression for $A$ given in (27.124), we see that $P(F|H_0)$ is
indeed given by (27.126). ◄


As it does not matter whether the ratio $F$ given in (27.123) is defined as $u^2/v^2$
or as $v^2/u^2$, it is conventional to put the larger sample variance on the top, so
that $F$ is always greater than or equal to unity. A large value of $F$ indicates that
the sample variances $u^2$ and $v^2$ are very different whereas a value of $F$ close to
unity means that they are very similar. Therefore, for a given significance $\alpha$, it is
C_{n1,n2}(F)   n1 = 1    2      3      4      5      6      7      8

n2 = 1         161     200    216    225    230    234    237    239
     2         18.5    19.0   19.2   19.2   19.3   19.3   19.4   19.4
     3         10.1    9.55   9.28   9.12   9.01   8.94   8.89   8.85
     4         7.71    6.94   6.59   6.39   6.26   6.16   6.09   6.04
     5         6.61    5.79   5.41   5.19   5.05   4.95   4.88   4.82
     6         5.99    5.14   4.76   4.53   4.39   4.28   4.21   4.15
     7         5.59    4.74   4.35   4.12   3.97   3.87   3.79   3.73
     8         5.32    4.46   4.07   3.84   3.69   3.58   3.50   3.44
     9         5.12    4.26   3.86   3.63   3.48   3.37   3.29   3.23
    10         4.96    4.10   3.71   3.48   3.33   3.22   3.14   3.07
    20         4.35    3.49   3.10   2.87   2.71   2.60   2.51   2.45
    30         4.17    3.32   2.92   2.69   2.53   2.42   2.33   2.27
    40         4.08    3.23   2.84   2.61   2.45   2.34   2.25   2.18
    50         4.03    3.18   2.79   2.56   2.40   2.29   2.20   2.13
   100         3.94    3.09   2.70   2.46   2.31   2.19   2.10   2.03
    ∞          3.84    3.00   2.60   2.37   2.21   2.10   2.01   1.94

C_{n1,n2}(F)   n1 = 9    10     20     30     40     50     100    ∞

n2 = 1         241     242    248    250    251    252    253    254
     2         19.4    19.4   19.4   19.5   19.5   19.5   19.5   19.5
     3         8.81    8.79   8.66   8.62   8.59   8.58   8.55   8.53
     4         6.00    5.96   5.80   5.75   5.72   5.70   5.66   5.63
     5         4.77    4.74   4.56   4.50   4.46   4.44   4.41   4.37
     6         4.10    4.06   3.87   3.81   3.77   3.75   3.71   3.67
     7         3.68    3.64   3.44   3.38   3.34   3.32   3.27   3.23
     8         3.39    3.35   3.15   3.08   3.04   3.02   2.97   2.93
     9         3.18    3.14   2.94   2.86   2.83   2.80   2.76   2.71
    10         3.02    2.98   2.77   2.70   2.66   2.64   2.59   2.54
    20         2.39    2.35   2.12   2.04   1.99   1.97   1.91   1.84
    30         2.21    2.16   1.93   1.84   1.79   1.76   1.70   1.62
    40         2.12    2.08   1.84   1.74   1.69   1.66   1.59   1.51
    50         2.07    2.03   1.78   1.69   1.63   1.60   1.52   1.44
   100         1.97    1.93   1.68   1.57   1.52   1.48   1.39   1.28
    ∞          1.88    1.83   1.57   1.46   1.39   1.35   1.24   1.00

Table 27.3 Values of $F$ for which the cumulative probability function $C_{n_1,n_2}(F)$
of the $F$-distribution with $(n_1,n_2)$ degrees of freedom has the value 0.95. For
example, for $n_1 = 10$ and $n_2 = 6$, $C_{n_1,n_2}(4.06) = 0.95$.


customary to define the rejection region on $F$ as $F > F_{\rm crit}$, where

$$C_{n_1,n_2}(F_{\rm crit}) = \int_0^{F_{\rm crit}} P(F|H_0)\,dF = 1 - \alpha,$$

and $n_1 = N_1-1$ and $n_2 = N_2-1$ are the numbers of degrees of freedom.
Table 27.3 lists values of $F_{\rm crit}$ corresponding to the 5% significance level (i.e.
$\alpha = 0.05$) for various values of $n_1$ and $n_2$.
► Suppose that two classes of students take the same mathematics examination and the
following percentage marks are obtained:

Class 1: 66 62 34 55 77 80 55 60 69 47 50

Class 2: 64 90 76 56 81 72 70

Assuming that the two sets of examination marks are drawn from Gaussian distributions,
test the hypothesis $H_0 : \sigma_1^2 = \sigma_2^2$ at the 5% significance level.


The variances of the two samples are $s_1^2 = (12.8)^2$ and $s_2^2 = (10.3)^2$ and the sample sizes
are $N_1 = 11$ and $N_2 = 7$. Thus, we have

$$u^2 = \frac{N_1s_1^2}{N_1-1} = 180.2 \qquad\text{and}\qquad v^2 = \frac{N_2s_2^2}{N_2-1} = 123.8,$$

where we have taken $u^2$ to be the larger value. Thus, $F = u^2/v^2 = 1.46$ to two decimal
places. Since the first sample contains eleven values and the second contains seven values,
we take $n_1 = 10$ and $n_2 = 6$. Consulting table 27.3, we see that, at the 5% significance
level, $F_{\rm crit} = 4.06$. Since our value lies comfortably below this, we conclude that there is
no statistical evidence for rejecting the hypothesis that the two samples were drawn from
Gaussian distributions with a common variance. ◄
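The variance ratio in this example can be reproduced with a few lines of code. The sketch below computes the two estimates $Ns^2/(N-1)$, places the larger on top as the convention requires, and compares the ratio with the critical value 4.06 copied from table 27.3; the helper name `scaled_variance` is an illustrative choice.

```python
class1 = [66, 62, 34, 55, 77, 80, 55, 60, 69, 47, 50]
class2 = [64, 90, 76, 56, 81, 72, 70]
F_crit = 4.06    # table 27.3: n1 = 10, n2 = 6, 5% significance level

def scaled_variance(xs):
    """Return N * s^2 / (N - 1), the sum of squared deviations over N - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

var1, var2 = scaled_variance(class1), scaled_variance(class2)

# Convention: the larger estimate goes in the numerator so that F >= 1.
u2, v2 = max(var1, var2), min(var1, var2)
F = u2 / v2
reject = F > F_crit

print(f"u^2 = {u2:.1f}, v^2 = {v2:.1f}, F = {F:.2f}, reject H0: {reject}")
```

Here the first class has the larger estimate, which is why the degrees of freedom are read from the table in the order (10, 6).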


It is also common to define the variable $z = \tfrac{1}{2}\ln F$, the distribution of which
can be found straightforwardly from (27.126). This is a useful change of variable
since it can be shown that, for large values of $n_1$ and $n_2$, the variable $z$ is
distributed approximately as a Gaussian with mean $\tfrac{1}{2}(n_2^{-1} - n_1^{-1})$ and variance
$\tfrac{1}{2}(n_2^{-1} + n_1^{-1})$.


27.7.7 Goodness of fit in least squares problems 

We conclude our discussion of hypothesis testing with an example of a goodness
of fit test. In section 27.6, we discussed the use of the method of least squares in
estimating the best-fit values of a set of parameters $\mathbf{a}$ in a given model $y = f(x;\mathbf{a})$
for a data set $(x_i, y_i)$, $i = 1, 2, \ldots, N$. We have not addressed, however, the question
of whether the best-fit model $y = f(x;\hat{\mathbf{a}})$ does, in fact, provide a good fit of the
data. In other words, we have not considered thus far how to verify that the
functional form $f$ of our assumed model is indeed correct. In the language of
hypothesis testing, we wish to distinguish between the two hypotheses

$H_0$ : model is correct and $H_1$ : model is incorrect.

Given the vague nature of the alternative hypothesis $H_1$, we clearly cannot use
the generalised likelihood-ratio test. Nevertheless, it is still possible to test the
null hypothesis $H_0$ at a given significance level $\alpha$.

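The least squares estimates of the parameters $a_1, a_2, \ldots, a_M$, as discussed in
section 27.6, are those values that minimise the quantity

$$\chi^2(\mathbf{a}) = \sum_{i,j=1}^{N} [y_i - f(x_i;\mathbf{a})]\,(\mathsf{N}^{-1})_{ij}\,[y_j - f(x_j;\mathbf{a})] = (\mathbf{y}-\mathbf{f})^{\rm T}\,\mathsf{N}^{-1}\,(\mathbf{y}-\mathbf{f}).$$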
In the last equality, we rewrote the expression in matrix notation by defining the
column vector $\mathbf{f}$ with elements $f_i = f(x_i;\mathbf{a})$. The value $\chi^2(\hat{\mathbf{a}})$ at this minimum can
be used as a statistic to test the null hypothesis $H_0$, as follows. The $N$ quantities
$y_i - f(x_i;\mathbf{a})$ are Gaussian distributed. However, provided the function $f(x_i;\mathbf{a})$ is
linear in the parameters $\mathbf{a}$, the equations (27.98) that determine the least squares
estimate $\hat{\mathbf{a}}$ constitute a set of $M$ linear constraints on these $N$ quantities. Thus,
as discussed in subsection 26.15.2, the sampling distribution of the quantity $\chi^2(\hat{\mathbf{a}})$
will be a chi-squared distribution with $N-M$ degrees of freedom (d.o.f.), which has
the expectation value and variance

$$E[\chi^2(\hat{\mathbf{a}})] = N-M \qquad\text{and}\qquad V[\chi^2(\hat{\mathbf{a}})] = 2(N-M).$$

Thus we would expect the value of $\chi^2(\hat{\mathbf{a}})$ to lie typically in the range $(N-M) \pm
\sqrt{2(N-M)}$. A value lying outside this range may suggest that the assumed model
for the data is incorrect. A very small value of $\chi^2(\hat{\mathbf{a}})$ is usually an indication that
the model has too many free parameters and has ‘over-fitted’ the data. More
commonly, the assumed model is simply incorrect, and this usually results in a
value of $\chi^2(\hat{\mathbf{a}})$ that is larger than expected.

One can choose to perform either a one-tailed or a two-tailed test on the
value of $\chi^2(\hat{\mathbf{a}})$. It is usual, for a given significance level $\alpha$, to define the one-tailed
rejection region to be $\chi^2(\hat{\mathbf{a}}) > c$, where the constant $c$ satisfies

$$\int_c^\infty P(\chi^2_n)\,d\chi^2_n = \alpha \tag{27.127}$$

and $P(\chi^2_n)$ is the PDF of the chi-squared distribution with $n = N-M$ degrees of
freedom (see subsection 26.9.4).


► An experiment produces the following data sample pairs $(x_i, y_i)$:

x_i: 1.85 2.72 2.81 3.06 3.42 3.76 4.31 4.47 4.64 4.99

y_i: 2.26 3.10 3.80 4.11 4.74 4.31 5.24 4.03 5.69 6.57

where the $x_i$-values are known exactly but each $y_i$-value is measured only to an accuracy
of $\sigma = 0.5$. At the one-tailed 5% significance level, test the null hypothesis $H_0$ that the
underlying model for the data is a straight line $y = mx + c$.


These data are the same as those investigated in section 27.6 and plotted in figure 27.9. As
shown previously, the least squares estimates of the slope $m$ and intercept $c$ are given by

$$\hat{m} = 1.11 \qquad\text{and}\qquad \hat{c} = 0.4. \tag{27.128}$$

Since the error on each $y_i$-value is drawn independently from a Gaussian distribution with
standard deviation $\sigma$, we have

$$\chi^2(\mathbf{a}) = \sum_{i=1}^{N} \left[\frac{y_i - f(x_i;\mathbf{a})}{\sigma}\right]^2. \tag{27.129}$$

Inserting the values (27.128) into (27.129), we obtain $\chi^2(\hat{m},\hat{c}) = 11.5$. In our case, the
number of data points is $N = 10$ and the number of fitted parameters is $M = 2$. Thus, the
number of degrees of freedom is $n = N-M = 8$. Setting $n = 8$ and $\alpha = 0.05$ in (27.127)
we find, either numerically or from tables, that $c = 15.51$. Hence our rejection region is

$$\chi^2(\hat{m},\hat{c}) > 15.51.$$

Since we found that $\chi^2(\hat{m},\hat{c}) = 11.5$, we cannot reject the null hypothesis that the underlying
model for the data is a straight line $y = mx + c$. ◄
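The value of the statistic in this example is straightforward to verify. The following minimal sketch evaluates the sum (27.129) for the quoted slope and intercept, and also prints the typical range $(N-M) \pm \sqrt{2(N-M)}$ mentioned earlier; the fitted values and the critical value 15.51 are taken from the text rather than recomputed, and the variable names are illustrative.

```python
import math

xs = [1.85, 2.72, 2.81, 3.06, 3.42, 3.76, 4.31, 4.47, 4.64, 4.99]
ys = [2.26, 3.10, 3.80, 4.11, 4.74, 4.31, 5.24, 4.03, 5.69, 6.57]
sigma = 0.5                  # common measurement error on each y-value
m_hat, c_hat = 1.11, 0.4     # least squares estimates quoted in (27.128)
c_crit = 15.51               # critical value quoted in the text for n = 8, alpha = 0.05

# Equation (27.129): sum of squared, error-normalised residuals.
chi2 = sum(((y - (m_hat * x + c_hat)) / sigma) ** 2 for x, y in zip(xs, ys))

dof = len(xs) - 2            # N - M with M = 2 fitted parameters
spread = math.sqrt(2 * dof)  # one standard deviation of the chi-squared variable
reject = chi2 > c_crit

print(f"chi2 = {chi2:.1f}, typical range = {dof - spread:.1f} to {dof + spread:.1f}")
print(f"reject H0: {reject}")
```

The statistic lies inside the typical range of 4 to 12, consistent with the decision not to reject the straight-line model.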

As mentioned above, our analysis is only valid if the function $f(x;\mathbf{a})$ is linear
in the parameters $\mathbf{a}$. Nevertheless, it is so convenient that it is sometimes applied
in non-linear cases, provided the non-linearity is not too severe.


27.8 Exercises 

27.1 A group of students uses a pendulum experiment to measure $g$, the acceleration
of free fall, and obtain the following values (in m s$^{-2}$): 9.80, 9.84, 9.72, 9.74,
9.87, 9.77, 9.28, 9.86, 9.81, 9.79, 9.82. What would you give as the best value and
standard error for $g$ as measured by the group?

27.2 Measurements of a certain quantity gave the following values: 296, 316, 307, 278, 
312, 317, 314, 307, 313, 306, 320, 309. Within what limits would you say there is 
a 50% chance that the correct value lies? 

27.3 The following are the values obtained by a class of 14 students when measuring 
a physical quantity x: 53.8, 53.1, 56.9, 54.7, 58.2, 54.1, 56.4, 54.8, 57.3, 51.0, 55.1, 
55.0, 54.2, 56.6. 

(a) Display these results as a histogram and state what you would give as the 
best value for x. 

(b) Without calculation estimate how much reliance could be placed upon your 
answer to (a). 

(c) Databooks give the value of x as 53.6 with negligible error. Are the data 
obtained by the students in conflict with this? 

27.4 Two physical quantities x and y are connected by the equation 


and measured pairs of values for x and y are as follows: 

x: 10 12 16 20 

y: 409 196 114 94. 

Determine the best values for $a$ and $b$ by graphical means and (either by hand
or by using a built-in calculator routine) by a least squares fit to an appropriate
straight line.

27.5 Measured quantities $x$ and $y$ are known to be connected by the formula

$$y = \frac{ax}{x^2 + b},$$

where $a$ and $b$ are constants. Pairs of values obtained experimentally are

x: 2.0 3.0 4.0 5.0 6.0

y: 0.32 0.29 0.25 0.21 0.18.

Use these data to make best estimates of the values of $y$ that would be obtained
for (a) $x = 7.0$, and (b) $x = -3.5$. As measured by fractional error, which estimate
is likely to be the more accurate?
27.6 Prove that the sample mean is the best linear unbiased estimator of the population
mean $\mu$ as follows.

(a) If the real numbers $a_1, a_2, \ldots, a_n$ satisfy the constraint $\sum_{i=1}^n a_i = C$, where $C$
is a given constant, show that $\sum_{i=1}^n a_i^2$ is minimised by $a_i = C/n$ for all $i$.

(b) Consider the linear estimator $\hat{\mu} = \sum_{i=1}^n a_i x_i$. Impose the conditions (i) that it
is unbiased, and (ii) that it is as efficient as possible.

27.7 A population contains individuals of $k$ types in equal proportions. A quantity $X$
has mean $\mu_i$ amongst individuals of type $i$, and variance $\sigma^2$, which has the same
value for all types. In order to estimate the mean of $X$ over the whole population,
two schemes are considered; each involves a total sample size of $nk$. In the first
the sample is drawn randomly from the whole population, whilst in the second
(stratified sampling) $n$ individuals are randomly selected from each of the $k$ types.

Show that in both cases the estimate has expectation

$$\mu = \frac{1}{k}\sum_{i=1}^{k} \mu_i,$$

but that the variance of the first scheme exceeds that of the second by an amount

$$\frac{1}{k^2 n}\sum_{i=1}^{k} (\mu_i - \mu)^2.$$

27.8 Carry through the following proofs of statements made in subsections 27.5.2 and
27.5.3 about the ML estimators $\hat{\tau}$ and $\hat{\lambda}$.

(a) Find the expectation values of the ML estimators $\hat{\tau}$ and $\hat{\lambda}$ given respectively
in (27.71) and (27.75). Hence verify equations (27.76) which show that, even
though an ML estimator is unbiased, it does not follow that functions of it
are also unbiased.

(b) Show that $E[\hat{\tau}^2] = (N+1)\tau^2/N$ and hence prove that $\hat{\tau}$ is a minimum-variance
estimator of $\tau$.

27.9 An experiment consists of a large, but unknown, number n ($\gg 1$) of trials in
each of which the probability of success p is the same, but also unknown. In the
ith trial, i = 1, 2, ..., N, the total number of successes is $x_i$ ($\gg 1$). Determine the
log-likelihood function.

Using Stirling's approximation to $\ln (n - x)!$, show that

\[ \frac{d \ln (n - x)!}{dn} \approx \frac{1}{2(n - x)} + \ln(n - x), \]

and hence evaluate $\partial(\ln {}^{n}C_{x})/\partial n$.

By finding the (coupled) equations determining the ML estimators $\hat{p}$ and $\hat{n}$, show
that, to order $n^{-1}$, they must satisfy the simultaneous ‘arithmetic’ and ‘geometric’
mean constraints

\[ \hat{n}\hat{p} = \frac{1}{N} \sum_{i=1}^{N} x_i \quad \text{and} \quad (1 - \hat{p})^N = \prod_{i=1}^{N} \left( 1 - \frac{x_i}{\hat{n}} \right). \]

27.10 This exercise is intended to illustrate the dangers of applying formalised estimator
techniques to distributions that are not well behaved in a statistical sense.

The following are five sets of 10 values, all drawn from the same Cauchy
distribution with parameter a.


(i)      4.81    -1.24     1.30    -0.23     2.98
        -1.13    -8.32     2.62    -0.79    -2.85
(ii)     0.07     1.54     0.38    -2.76    -8.82
         1.86    -4.75     4.81     1.14    -0.66
(iii)    0.72     4.57     0.86    -3.86     0.30
        -2.00     2.65   -17.44    -2.26    -8.83
(iv)    -0.15   202.76    -0.21    -0.58    -0.14
         0.36     0.44     3.36    -2.96     5.51
(v)      0.24    -3.33    -1.30     3.05     3.99
         1.59    -7.76     0.91     2.80    -6.46

Ignoring the fact that the Cauchy distribution does not have a finite variance (or
even a formal mean), show that $\hat{a}$, the ML estimator of a, has to satisfy

\[ s(\hat{a}) = \sum_{i=1}^{10} \frac{1}{1 + x_i^2/\hat{a}^2} = 5. \qquad (*) \]

Using a programmable calculator, spreadsheet or computer, find the value of
$\hat{a}$ that satisfies (*) for each of the data sets and compare it with the value
a = 1.6 used to generate the data. Form an opinion regarding the variance of the
estimator.

Show further that if it is assumed that $(E[\hat{a}])^2 = E[\hat{a}^2]$ then $E[\hat{a}] = \nu_2^{1/2}$, where
$\nu_2$ is the second (central) moment of the distribution, which for the Cauchy
distribution is infinite!

27.11 According to a particular theory, two dimensionless quantities X and Y have
equal values. Nine measurements of X gave values of 22, 11, 19, 19, 14, 27, 8,
24 and 18, whilst seven measured values of Y were 11, 14, 17, 14, 19, 16 and
14. Assuming that the measurements of both quantities are Gaussian distributed
with a common variance, are they consistent with the theory? An alternative
theory predicts that $Y^2 = \pi^2 X$; is the data consistent with this proposal?

27.12 On a certain (testing) steeplechase course there are 12 fences to be jumped and
any horse that falls is not allowed to continue in the race. In a season of racing 
a total of 500 horses started the course and the following numbers fell at each
fence:

Fence:   1   2   3   4   5   6   7   8   9  10  11  12
Falls:  62  75  49  29  33  25  30  17  19  11  15  12

Use this data to determine the overall probability of a horse falling at a fence,
and test the hypothesis that it is the same for all horses and fences as follows.

(a) draw up a table of the expected number of falls at each fence on the basis 
of the hypothesis; 

(b) consider for each fence i the standardised variable

\[ z_i = \frac{\text{estimated falls} - \text{actual falls}}{\text{standard deviation of estimated falls}}, \]

and use it in an appropriate $\chi^2$ test;

(c) show that the data indicates that the odds against all fences being equally
testing are about 40 to 1. Identify the fences that are significantly easier or
harder than the average.

27.13 A similar technique to that employed in exercise 27.12 can be used to test
correlations between characteristics of sampled data. To illustrate this consider
the following problem.

During an investigation into possible links between mathematics and classical
music, pupils at a school were asked whether they had preferences (a) between
mathematics and English, and (b) between classical and pop music. The results
are given below.



               Classical    None    Pop
Mathematics        23        13      14
None               17        17      36
English            30        10      40

By computing tables of expected numbers, based on the assumption that no
correlations exist, and calculating the relevant values of $\chi^2$, determine whether
there is any evidence for

(a) a link between academic and musical tastes, and

(b) a claim that pupils either had preferences in both areas or had no preference.

You will need to consider the appropriate value for the number of degrees of
freedom to use when applying the $\chi^2$ test.

27.14 Three candidates X, Y and Z were standing for election to a vacant seat on their
college’s Student Committee. The members of the electorate (current first-year
students, consisting of 150 men and 105 women) were each allowed to cross
out the name of the candidate they least wished to be elected, the other two
candidates then being credited with one vote each. The following data are known.

(a) X received 100 votes from men, whilst Y received 65 votes from women.

(b) Z received five more votes from men than X received from women.

(c) The total votes cast for X and Y were equal.

Analyse this data in such a way that a $\chi^2$ test can be used to determine whether
voting was other than random (i) amongst men, and (ii) amongst women.

27.15 A particle detector consisting of a shielded scintillator is being tested by placing 
it near a particle source of controlled intensity (by the use of absorbers). It might 
register counts even in the absence of particles from the source because of the 
cosmic ray background. 

The number of counts n registered in a fixed time interval as a function of the
source strength s is given in the following table:

Source strength s:   0    1    2    3    4    5    6
Counts n:            6   11   20   42   44   62   61

At any given source strength the number of counts is expected to be Poisson 
distributed with mean 

n = a + bs, 

where a and b are constants. Analyse the data for a fit to this relationship and 
obtain the best values for a and b together with their standard errors. 

(a) How well is the cosmic ray background determined? 

(b) What is the value of the correlation coefficient between a and b? Is this 
consistent with what would happen if the cosmic ray background were 
imagined to be negligible? 

(c) Does the data fit the expected relationship well? Is there any evidence that
the reported data ‘is too good a fit’?

27.16 The function y(x) is known to be a quadratic function of x. The following table
gives the measured values and uncorrelated standard errors of y measured at
various values of x (in which there is negligible error):

x       1            2            3            4            5
y(x)    3.5 ± 0.5    2.0 ± 0.5    3.0 ± 0.5    6.5 ± 1.0    10.5 ± 1.0

Construct the response matrix R using as basis functions 1, x, $x^2$. Calculate the
matrix $R^{T} N^{-1} R$ and show that its inverse, the covariance matrix V, has the form

\[ V = \frac{1}{9184} \begin{pmatrix} 12\,592 & -9708 & 1580 \\ -9708 & 8413 & -1461 \\ 1580 & -1461 & 269 \end{pmatrix}. \]

Use this matrix to find the best values, and their uncertainties, for the coefficients
of the quadratic form for y(x).

27.17 The following are the values and standard errors of a physical quantity $f(\theta)$
measured at various values of $\theta$ (in which there is negligible error):

θ       0              π/6            π/4             π/3
f(θ)    3.72 ± 0.2     1.98 ± 0.1     -0.06 ± 0.1     -2.05 ± 0.1

θ       π/2            2π/3           3π/4            π
f(θ)    -2.83 ± 0.2    1.15 ± 0.1     3.99 ± 0.2      9.71 ± 0.4

Theory suggests that f should be of the form $a_1 + a_2 \cos\theta + a_3 \cos 2\theta$. Show that
the normal equations for the coefficients $a_i$ are

\[ \begin{aligned} 481.3 a_1 + 158.4 a_2 - 43.8 a_3 &= 284.7, \\ 158.4 a_1 + 218.8 a_2 + 62.1 a_3 &= -31.1, \\ -43.8 a_1 + 62.1 a_2 + 131.3 a_3 &= 368.4. \end{aligned} \]

(a) If you have matrix inversion routines available on a computer, determine the
best values and variances for the coefficients $a_i$ and the correlation between
the coefficients $a_1$ and $a_2$.

(b) If you have only a calculator available, solve for the values using Gauss-Seidel
iteration starting from the approximate solution $a_1 = 2$, $a_2 = -2$, $a_3 = 4$.

27.18 Prove that the expression given for the Student's t-distribution in equation (27.118)
is correctly normalised.

27.19 Verify that the F-distribution P(F) given explicitly in equation (27.126) is symmetric
between the two data samples, i.e. that it retains the same form but with $N_1$
and $N_2$ interchanged, if F is replaced by $F' = F^{-1}$. Symbolically, if $P'(F')$ is the
distribution of $F'$ and $P(F) = \eta(F, N_1, N_2)$, then $P'(F') = \eta(F', N_2, N_1)$.

27.20 It is claimed that the two following sets of values were obtained (a) by randomly
drawing from a normal distribution that is N(0, 1) and then (b) randomly
assigning each reading to one of the two sets A and B.

Set A   -0.314    0.603   -0.551   -0.537   -0.160   -1.635    0.719
         0.610    0.482   -1.757    0.058

Set B   -0.691    1.515   -1.642   -1.736    1.224    1.423    1.165

Make tests, including t- and F-tests, to establish whether there is any evidence
that either claim is, or both claims are, false.
27.9 Hints and answers

27.1 Note that the reading of 9.28 m s$^{-2}$ is clearly in error and should not be used in
the calculation; 9.80 ± 0.02 m s$^{-2}$.

27.2 The reading of 278 should probably be rejected. The other readings do not look
as though they are Gaussian distributed and the best estimate is probably given
by the inter-quartile range of the remaining 11 readings, i.e. 307-316.

27.3 (a) 55.1. (b) Note that two thirds of the readings lie within ±2 of the mean and
that 14 readings are being used. This gives a standard error in the mean ≈ 0.6.
(c) Student's t has a value of about 2.5 for 13 d.o.f. (degrees of freedom), and
therefore it is likely at the 3% significance level that the data are in conflict with
the accepted value.

27.4 Plot either $xy^{-1/2}$ versus $x^{1/2}$ or $(x/y)^{1/2}$ versus $x^{-1/2}$; a = 1.20, b = -3.29.

27.5 Plot or calculate a least squares fit of either $x^2$ versus x/y or xy versus y/x
to obtain a ≈ 1.19 and b ≈ 3.4. (a) 0.16; (b) -0.27. Estimate (b) is the more
accurate because, using the fact that y(-x) = -y(x), it is effectively obtained by
interpolation rather than extrapolation.

27.6 (a) Use Lagrange multipliers. (b) Write $x_i$ as $\mu + z_i$, where $\langle z_i^2 \rangle = \sigma^2$ for all i, and
use the result of (a) to evaluate $E[(\hat{\mu} - \mu)^2]$.

27.7 Recall that, because of the equal proportions of each type, the expected number
of each type in the first scheme is n. Show that the variance of the estimator for
the second scheme is $\sigma^2/(kn)$. When calculating that for the first scheme, recall
that $\langle x_i^2 \rangle = \mu_i^2 + \sigma^2$ and note that $\mu_i^2$ can be written as $(\mu_i - \mu + \mu)^2$.

27.8 (a) Note that $\int P(\mathbf{x}|\tau) \prod_{j \neq i} dx_j = \tau^{-1} \exp(-x_i/\tau)$.

With $E[\hat{\lambda}] = \int N (\sum_i x_i)^{-1} \lambda^N \exp(-\lambda \sum_i x_i)\, d\mathbf{x}$, find the first-order differential
equation involving $dE[\hat{\lambda}]/d\lambda$. The relevant integrating factor is $\lambda^{-N}$.

(b) Denoting $\tau^{-1} \int x^r \exp(-x/\tau)\, dx$ by $J_r$, show that $E[\hat{\tau}^2] = N^{-1}[J_2 + (N - 1)J_1^2]$.

27.9 The log-likelihood function is

\[ \ln L = \sum_{i=1}^{N} \ln {}^{n}C_{x_i} + \sum_{i=1}^{N} x_i \ln p + \left( Nn - \sum_{i=1}^{N} x_i \right) \ln(1 - p); \]

\[ \frac{\partial (\ln {}^{n}C_{x})}{\partial n} \approx \ln\left( \frac{n}{n - x} \right) - \frac{x}{2n(n - x)}. \]

Ignore the second term on the RHS of the above to obtain

\[ \sum_{i=1}^{N} \ln\left( \frac{n}{n - x_i} \right) + N \ln(1 - p) = 0. \]

27.10 Remember that a appears in the normalisation constant of the Cauchy distribution.

(i) 1.85; (ii) 1.66; (iii) 2.46; (iv) 0.68; (v) 2.44. Although the estimates have the
correct order of magnitude, there is clearly a very large (perhaps infinite) sampling
variance. Even when all 50 samples are combined the estimated value of
1.84 is still 0.24 different from that used to generate the data.

27.11 X = 18.0 ± 2.2, Y = 15.0 ± 1.1. $\hat{\sigma}$ = 4.92 giving t = 1.21 for 14 d.o.f., and
is significant only at the 75% level. Thus there is no significant disagreement
between the data and the theory. For the second theory, only the mean values can
be tested as $Y^2$ will not be Gaussian distributed. $Y^2 - \pi^2 X = 47 \pm 38$ and is not
significantly different from zero. Again the data is consistent with the proposed
theory.

27.12 Whilst the distribution of falls at each fence is formally binomial, it can be
approximated for the purposes of the question by a Poisson distribution. The
total number of falls is 377. The total number of attempted jumps = 3202.
Overall probability of a fall is 0.1177.

(a) 58.9, 51.6, 42.7, 37.0, 33.6, 29.7, 26.7, 23.2, 21.2, 19.0, 17.7, 15.9. (b) $\chi^2$ = 21.2
for 11 d.o.f. (c) The $\chi^2$ value is close to the 97.5% confidence limit. Fence 2 is
much harder than the rest, fence 10 is easier, and fences 4 and 8 are somewhat
easier than the average.
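
The arithmetic behind the 27.12 answer can be checked with a short Python sketch. It is not part of the original answer, and it assumes, as the hint suggests, that the number of falls at each fence is approximately Poisson distributed, so that the variance of each expected count is taken to be the count itself. The variable names (`falls`, `starters`, `expected`, `chi2`) are illustrative only.

```python
# Falls recorded at each of the 12 fences (data of exercise 27.12).
falls = [62, 75, 49, 29, 33, 25, 30, 17, 19, 11, 15, 12]

# Horses attempting each fence: 500 start, and every faller drops out.
starters = []
remaining = 500
for fallen in falls:
    starters.append(remaining)
    remaining -= fallen

total_falls = sum(falls)
total_jumps = sum(starters)
p_fall = total_falls / total_jumps  # overall probability of a fall

# Expected falls at each fence if every jump has the same probability p_fall.
expected = [p_fall * s for s in starters]

# Poisson approximation (an assumption): variance of a count equals the count.
z = [(e - f) / e ** 0.5 for e, f in zip(expected, falls)]
chi2 = sum(zi ** 2 for zi in z)

print(total_falls, total_jumps, round(p_fall, 4))
print([round(e, 1) for e in expected])
print(round(chi2, 1))
```

Because the single parameter `p_fall` is estimated from the same twelve counts, the statistic is compared with a chi-squared distribution for 12 - 1 = 11 degrees of freedom, as stated in the answer.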

27.13 Consider how many entries may be chosen freely in the table if all row and
column totals are to match the observed values. It should be clear that for an
m × n table the number of degrees of freedom is (m - 1)(n - 1).

(a) In order to make the fractions expressing each preference or lack of preference
correct, the expected distribution, if there were no correlation, is

               Classical    None    Pop
Mathematics       17.5       10     22.5
None              24.5       14     31.5
English           28         16     36

This gives a $\chi^2$ of 12.3 for 4 d.o.f., making it less than 2% likely that no
correlation exists.

(b) The expected distribution, if there were no correlation, is

                          Music preference    No music preference
Academic preference             104                   26
No academic preference           56                   14

This gives a $\chi^2$ of 1.2 for one d.o.f. and no evidence for the claim.

27.14 The votes are not statistically independent and, once established, must be converted
to a table of deletions before any $\chi^2$ test is applied. This table is

         Not X    Not Y    Not Z
Men        50       35       65
Women      25       40       40

The relevant values of $\chi^2$ are (i) 9.0 and (ii) 4.3, both for 2 d.o.f., suggesting that
the voting by men was almost certainly not random but that by women may
have been.

27.15 As the distribution at each value of s is Poisson, the best estimate of the
measurement error is the square root of the number of counts, i.e. $\sqrt{n(s)}$. Linear
regression gives a = 4.3 ± 2.1 and b = 10.06 ± 0.94.

(a) The cosmic ray background must be present, since $n(0) \neq 0$, but its value of
about 4 is uncertain to within a factor of 2.

(b) The correlation coefficient between a and b is -0.63. Yes; if a were reduced
towards zero then b would have to be increased to compensate.

(c) Yes, $\chi^2$ = 4.9 for 5 d.o.f., which is almost exactly the ‘expected’ value, neither
too good nor too bad.

27.16 The matrix $R^{T} N^{-1} R$ has entries 14, 33, 97; 33, 97, 333; 97, 333, 1273, whilst
$R^{T} N^{-1} b$, where b is the data vector, has entries 51, 144.5, 520.5.

\[ y(x) = (6.73 \pm 1.17) - (4.34 \pm 0.96)x + (1.03 \pm 0.17)x^2. \]
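
The quoted entries and coefficients for 27.16 can be reproduced with a short Python sketch. It is offered only as a check under stated assumptions: the noise matrix N is taken to be diagonal with the squared standard errors on its diagonal, exact rational arithmetic is used so that the fraction 1/9184 appears explicitly, and `inverse_3x3` is a hypothetical helper written for this illustration rather than a library routine.

```python
from fractions import Fraction

# Data of exercise 27.16: x values, measured y values and standard errors.
xs = [1, 2, 3, 4, 5]
ys = [Fraction(7, 2), Fraction(2), Fraction(3), Fraction(13, 2), Fraction(21, 2)]
sigmas = [Fraction(1, 2), Fraction(1, 2), Fraction(1, 2), Fraction(1), Fraction(1)]

# Response matrix R for the basis functions 1, x, x^2 (one row per data point).
R = [[Fraction(x) ** k for k in range(3)] for x in xs]
weights = [1 / s ** 2 for s in sigmas]  # diagonal of N^{-1} (assumed diagonal)

# Normal matrix M = R^T N^{-1} R and right-hand side v = R^T N^{-1} b.
M = [[sum(w * r[i] * r[j] for w, r in zip(weights, R)) for j in range(3)]
     for i in range(3)]
v = [sum(w * r[i] * y for w, r, y in zip(weights, R, ys)) for i in range(3)]


def inverse_3x3(m):
    """Invert a 3x3 matrix exactly via its signed cofactors and determinant."""
    cof = [[(m[(i + 1) % 3][(j + 1) % 3] * m[(i + 2) % 3][(j + 2) % 3]
             - m[(i + 1) % 3][(j + 2) % 3] * m[(i + 2) % 3][(j + 1) % 3])
            for j in range(3)] for i in range(3)]
    det = sum(m[0][j] * cof[0][j] for j in range(3))
    # The inverse is the transposed cofactor matrix divided by the determinant.
    return [[cof[j][i] / det for j in range(3)] for i in range(3)]


V = inverse_3x3(M)  # covariance matrix of the fitted coefficients
coeffs = [sum(V[i][j] * v[j] for j in range(3)) for i in range(3)]
errors = [float(V[i][i]) ** 0.5 for i in range(3)]

for c, e in zip(coeffs, errors):
    print(f"{float(c):.2f} +/- {e:.2f}")
```

The standard errors are the square roots of the diagonal elements of V; the off-diagonal elements carry the correlations between the coefficients.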

27.17 $a_1 = 2.02 \pm 0.06$, $a_2 = -2.99 \pm 0.09$, $a_3 = 4.90 \pm 0.10$; $r_{12} = -0.60$.

27.18 Make the substitution $t = \sqrt{N - 1}\,\tan\theta$ to reduce the integral of the t-dependent
part of (27.118) to $2\sqrt{N - 1} \int_0^{\pi/2} \cos^{N-2}\theta \, d\theta$. Relate this to the beta function
$B(\tfrac{1}{2}, \tfrac{1}{2}(N - 1))$, and hence to the gamma functions, using the relationships given
in the Appendix. Note that $\Gamma(\tfrac{1}{2}) = \sqrt{\pi}$.

27.19 Note that $|dF| = |dF'/F'^2|$ and write

\[ 1 + \frac{N_1 - 1}{(N_2 - 1)F'} \quad \text{as} \quad \frac{N_1 - 1}{(N_2 - 1)F'} \left[ 1 + \frac{(N_2 - 1)F'}{N_1 - 1} \right]. \]

27.20 (a) The mean and variance of the whole sample are -0.068 and 1.180, which are
obviously compatible with N(0, 1) without the need for statistical tests.

(b) The means and variances of the two sets are: A, -0.226 and 0.815; B, 0.180
and 2.554. The value of t for the difference between the two means is 0.692; for
16 degrees of freedom, this or a greater value of t can be expected in marginally
more than half of all cases. The value of F is 3.13. For $n_1 = 6$ and $n_2 = 10$, this
value is very close to the 95% confidence limit of 3.22. Thus it is rather unlikely
that the allocation between the two groups was made at random: set B has
significantly more readings that are more than one standard deviation from the
mean for an N(0, 1) distribution than it should.
Numerical methods


It happens frequently that the end product of a calculation or piece of analysis 
is one or more algebraic or differential equations, or an integral that cannot be 
evaluated in closed form or in terms of tabulated or pre-programmed functions. 
From the point of view of the physical scientist or engineer, who needs numerical 
values for prediction or comparison with experiment, the calculation or analysis 
is thus incomplete. 

With the ready availability of standard packages on powerful computers for 
the numerical solution of equations, both algebraic and differential, and for the 
evaluation of integrals, in principle there is no need for the investigator to do 
other than turn to them. However, it should be a part of every engineer’s or 
scientist’s competence to have some understanding of the kinds of procedure that 
are being put into practice within those packages. The present chapter indicates 
(at a simple level) some of the ways in which analytically intractable problems 
can be tackled using numerical methods. 

In the restricted space available in a book of this nature it is clearly not 
possible to give anything like a full discussion, even of the elementary points that 
will be made in this chapter. The limited objective adopted is that of explaining 
and illustrating by simple examples some of the basic principles involved. In 
many cases, the examples used can be solved in closed form anyway, but this 
‘obviousness’ of the answers should not detract from their illustrative usefulness, 
and it is hoped that their transparency will help the reader to appreciate some of 
the inner workings of the methods described. 

The student who proposes to study complicated sets of equations or make
repeated use of the same procedures by, for example, writing computer programs
to carry out the computations, will find it essential to acquire a good understanding
of topics hardly mentioned here. Amongst these are the sensitivity of
the adopted procedures to errors introduced by the limited accuracy with which
a numerical value can be stored in a computer (rounding errors) and to the
errors introduced as a result of approximations made in setting up the numerical
procedures (truncation errors). For this scale of application, books specifically
devoted to numerical analysis, data analysis and computer programming should
be consulted.

So far as is possible, the method of presentation here is that of indicating 
and discussing in a qualitative way the main steps in the procedure, and then 
of following this with an elementary worked example. The examples have been 
restricted in complexity to a level at which they can be carried out with a pocket 
calculator. Naturally it will not be possible for the student to check all the 
numerical values presented unless he or she has a programmable calculator or 
computer readily available, and even then it might be tedious to do so. However, 
it is advisable to check the initial step and at least one step in the middle of 
each repetitive calculation given in the text, so that how the symbolic equations 
are used with actual numbers is understood. Clearly the intermediate step should 
be chosen to be at a point in the calculation at which the changes are still 
sufficiently large that they can be detected by whatever calculating device is 
used. 

Where alternative methods for solving the same type of problem are discussed, 
for example in finding the roots of a polynomial equation, we have usually 
taken the same example to illustrate each method. This could give the mistaken 
impression that the methods are very restricted in applicability, but it is felt by 
the authors that using the same examples repeatedly has sufficient advantages, in 
terms of illustrating the relative characteristics of competing methods, to justify 
doing so. Once the principles are clear, little is to be gained by using new examples 
each time and, in fact, having some prior knowledge of the ‘correct answer’ should 
allow the reader to judge the efficiency and dangers of particular methods as the 
successive steps are followed through. 

One other point remains to be mentioned. Here, in contrast with every other 
chapter of this book, the value of a large selection of exercises is not clear cut. 
The reader with sufficient computing resources to tackle them can easily devise 
algebraic or differential equations to be solved, or functions to be integrated 
(which perhaps have arisen in other contexts). Further, the solutions of these 
problems will be self-checking, for the most part. Consequently, although a 
number of exercises are included, no attempt has been made to test the full range 
of ideas treated in this chapter. 


28.1 Algebraic and transcendental equations 

The problem of finding the real roots of an equation of the form f(x) = 0, where
f(x) is an algebraic or transcendental function of x, is one that can sometimes
be treated numerically even if explicit solutions in closed form are not feasible.
Examples of the types of equation mentioned are the quartic equation

\[ ax^4 + bx + c = 0, \]

and the transcendental equation

\[ x - 3\tanh x = 0. \]

The latter type is characterised by the fact that it contains in effect a polynomial
of infinite order on the left-hand side.

Figure 28.1 A graph of the function $f(x) = x^5 - 2x^2 - 3$ for x in the range
0 < x < 1.9.

We will discuss four methods that, in various circumstances, can be used to
obtain the real roots of equations of the above types. In all cases we will take as
the specific equation to be solved the fifth-order polynomial equation

\[ f(x) = x^5 - 2x^2 - 3 = 0. \qquad (28.1) \]

The reasons for using the same equation each time were discussed in the introduction
to this chapter.

For future reference and so that the reader may follow some of the calculations
leading to the evaluation of the real root of (28.1), a graph of f(x) in the range
0 < x < 1.9 is shown in figure 28.1.

Equation (28.1) is one for which no solution can be found in closed form, that
is in the form x = a where a does not explicitly contain x. The general scheme to
be employed will be an iterative one in which successive approximations to a real
root of (28.1) will be obtained, each approximation, it is to be hoped, being better
than the preceding one; certainly, we require that the approximations converge
and that they have as their limit the sought-for root. Let us denote the required
root by $\xi$ and the values of successive approximations by $x_1, x_2, \ldots, x_n, \ldots$. Then
for any particular method to be successful,

\[ \lim_{n \to \infty} x_n = \xi \quad \text{where} \quad f(\xi) = 0. \qquad (28.2) \]

However, success as defined here is not the only criterion. Since, in practice,
only a finite number of iterations will be possible, it is important that the values
of $x_n$ be close to that of $\xi$ for all n > N, where N is a relatively low number;
exactly how low it is naturally depends on the computing resources available and
the accuracy required in the final answer.

So that the reader may assess the progress of the calculations that follow, we
record that to nine significant figures the real root of (28.1) has the value

\[ \xi = 1.495\,106\,40. \qquad (28.3) \]

We now consider in turn four methods for determining the value of this root.


28.1.1 Rearrangement of the equation 

If equation (28.1), f(x) = 0, can be recast into the form

\[ x = \phi(x) \qquad (28.4) \]

where $\phi(x)$ is a slowly varying function of x then an iteration scheme

\[ x_{n+1} = \phi(x_n) \qquad (28.5) \]

will often produce a fair approximation to the root $\xi$ after a few iterations, as
follows. Clearly $\xi = \phi(\xi)$ since $f(\xi) = 0$; thus when $x_n$ is close to $\xi$ the next
approximation, $x_{n+1}$, will differ little from $x_n$, the actual size of the difference giving
an order-of-magnitude indication of the inaccuracy in $x_{n+1}$ (when compared
with $\xi$).

In the present case the equation can be written

\[ x = (2x^2 + 3)^{1/5}. \qquad (28.6) \]

Because of the presence of the one-fifth power, the RHS is rather insensitive
to the value of x used to compute it, and so the form (28.6) fits the general
requirements for the method to work satisfactorily. It remains only to choose a
starting approximation. It is easy to see from figure 28.1 that the value x = 1.5
would be a good starting point but, so that the behaviour of the procedure at
values some way from the actual root can be studied, we will make a poorer
choice, $x_1 = 1.7$.

With this starting value and the general recurrence relationship

\[ x_{n+1} = (2x_n^2 + 3)^{1/5}, \qquad (28.7) \]

n    x_n          f(x_n)
1    1.7          5.42
2    1.544 18     1.01
3    1.506 86     2.28 × 10^{-1}
4    1.497 92     5.37 × 10^{-2}
5    1.495 78     1.28 × 10^{-2}
6    1.495 27     3.11 × 10^{-3}
7    1.495 14     7.34 × 10^{-4}
8    1.495 12     1.76 × 10^{-4}

Table 28.1 Successive approximations to the root of (28.1) using the rearrangement
method.


n    A_n       f(A_n)     B_n    f(B_n)    x_n       f(x_n)
1    1.0       -4.0000    1.7    5.4186    1.2973    -2.6916
2    1.2973    -2.6916    1.7    5.4186    1.4310    -1.0957
3    1.4310    -1.0957    1.7    5.4186    1.4762    -0.3482
4    1.4762    -0.3482    1.7    5.4186    1.4897    -0.1016
5    1.4897    -0.1016    1.7    5.4186    1.4936    -0.0289
6    1.4936    -0.0289    1.7    5.4186    1.4947    -0.0082

Table 28.2 Successive approximations to the root of (28.1) using linear
interpolation.


successive values can be found. These are recorded in table 28.1. Although not
strictly necessary, the value of $f(x_n) = x_n^5 - 2x_n^2 - 3$ is also shown at each stage.

It will be seen that $x_7$ and all later $x_n$ agree with the precise answer (28.3)
to within one part in $10^4$. However, $f(x_n)$ and $x_n - \xi$ are both reduced by a
factor of only about 4 for each iteration; thus a large number of iterations would
be needed to produce a very accurate answer. The factor 4 is of course specific
to this particular problem and would be different for a different equation. The
successive values of $x_n$ are shown in graph (a) of figure 28.2.
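
For readers with a computer to hand, the rearrangement iteration can be sketched in a few lines of Python. This is an illustrative sketch rather than part of the original text; the function names `f`, `phi` and `rearrangement_iterates` are assumptions made for the example.

```python
def f(x):
    """Left-hand side of equation (28.1)."""
    return x**5 - 2 * x**2 - 3


def phi(x):
    """Rearranged form (28.6): x = (2x^2 + 3)^(1/5)."""
    return (2 * x**2 + 3) ** 0.2


def rearrangement_iterates(x1, count):
    """Return [x_1, ..., x_count] generated by x_{n+1} = phi(x_n)."""
    xs = [x1]
    for _ in range(count - 1):
        xs.append(phi(xs[-1]))
    return xs


# Print n, x_n and f(x_n) for the starting value x_1 = 1.7.
for n, x in enumerate(rearrangement_iterates(1.7, 8), start=1):
    print(n, f"{x:.5f}", f"{f(x):.2e}")
```

Printing the difference between successive iterates alongside these values gives the order-of-magnitude error indication described in the text.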


28.1.2 Linear interpolation 

In this approach two values $A_1$ and $B_1$ of x are chosen with $A_1 < B_1$ and
such that $f(A_1)$ and $f(B_1)$ have opposite signs. The chord joining the two points
$(A_1, f(A_1))$ and $(B_1, f(B_1))$ is then notionally constructed, as illustrated in graph
(b) of figure 28.2, and the value $x_1$ at which the chord cuts the x-axis is determined
by the interpolation formula

\[ x_n = \frac{A_n f(B_n) - B_n f(A_n)}{f(B_n) - f(A_n)}, \qquad (28.8) \]

with n = 1. Next $f(x_1)$ is evaluated and the process repeated after replacing
either $A_1$ or $B_1$ by $x_1$, according to whether $f(x_1)$ has the same sign as $f(A_1)$ or $f(B_1)$
respectively. In figure 28.2(b), $A_1$ is the one replaced.

Figure 28.2 Graphical illustrations of the iteration methods discussed in
the text: (a) rearrangement; (b) linear interpolation; (c) binary chopping;
(d) Newton-Raphson.

As can be seen in the particular example that we are considering, with this
method there is a tendency, if the curvature of f(x) is of constant sign near
the root, for one of the two ends of the successive chords to remain unchanged.

Starting with the initial values $A_1 = 1$ and $B_1 = 1.7$, the results of the first
five iterations using (28.8) are given in table 28.2 and indicated in graph (b) of
figure 28.2. As with the rearrangement method, the improvement in accuracy,
as measured by $f(x_n)$ and $x_n - \xi$, is a fairly constant factor at each iteration
(approximately 3 in this case), and for our particular example there is little to
choose between the two. Both tend to their limiting value of $\xi$ monotonically,
from either higher or lower values, and this makes it difficult to estimate limits
within which $\xi$ can safely be presumed to lie. The next method to be described
gives at any stage a range of values within which $\xi$ is known to lie.
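
A minimal Python sketch of the linear interpolation scheme is given below, assuming the same test function (28.1) and the update rule (28.8). The function name `linear_interpolation` and its return format are choices made for this illustration only.

```python
def f(x):
    """Left-hand side of equation (28.1)."""
    return x**5 - 2 * x**2 - 3


def linear_interpolation(a, b, steps):
    """Apply (28.8) repeatedly; return a list of (A_n, B_n, x_n) triples."""
    if f(a) * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    rows = []
    for _ in range(steps):
        # Point at which the chord through (a, f(a)) and (b, f(b)) cuts the x-axis.
        x = (a * f(b) - b * f(a)) / (f(b) - f(a))
        rows.append((a, b, x))
        # Replace whichever end point has the same sign of f as the new point.
        if f(x) * f(a) > 0:
            a = x
        else:
            b = x
    return rows


for n, (a, b, x) in enumerate(linear_interpolation(1.0, 1.7, 6), start=1):
    print(n, f"{a:.4f}", f"{b:.4f}", f"{x:.4f}", f"{f(x):.4f}")
```

For this function the end point B stays fixed at 1.7 throughout, which is the one-sided behaviour noted above for a curve whose curvature has constant sign near the root.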
n    A_n       f(A_n)     B_n       f(B_n)    x_n       f(x_n)
1    1.0000    -4.0000    1.7000    5.4186    1.3500    -2.1610
2    1.3500    -2.1610    1.7000    5.4186    1.5250     0.5968
3    1.3500    -2.1610    1.5250    0.5968    1.4375    -0.9946
4    1.4375    -0.9946    1.5250    0.5968    1.4813    -0.2573
5    1.4813    -0.2573    1.5250    0.5968    1.5031     0.1544
6    1.4813    -0.2573    1.5031    0.1544    1.4922    -0.0552
7    1.4922    -0.0552    1.5031    0.1544    1.4977     0.0487
8    1.4922    -0.0552    1.4977    0.0487    1.4949    -0.0085

Table 28.3 Successive approximations to the root of (28.1) using binary
chopping.


28.1.3 Binary chopping 

Again two values of x, $A_1$ and $B_1$, that straddle the root are chosen, such that
$A_1 < B_1$ and $f(A_1)$ and $f(B_1)$ have opposite signs. The interval between them is
then halved by forming

\[ x_n = \tfrac{1}{2}(A_n + B_n), \qquad (28.9) \]

with n = 1, and $f(x_1)$ is evaluated. It should be noted that $x_1$ is determined
solely by $A_1$ and $B_1$, and not by the values of $f(A_1)$ and $f(B_1)$ as in the linear
interpolation method. Now $x_1$ is used to replace either $A_1$ or $B_1$, depending on
which of $f(A_1)$ or $f(B_1)$ has the same sign as $f(x_1)$, i.e. if $f(A_1)$ and $f(x_1)$ have the
same sign then $x_1$ replaces $A_1$. The process is then repeated to obtain $x_2$, $x_3$, etc.

This has been carried through in table 28.3 for our standard equation (28.1)
and is illustrated in figure 28.2(c). The entries have been rounded to four places
of decimals. It is suggested that the reader follows through the sequential replacements
of the $A_n$ and $B_n$ in the table and correlates the first few of these with
graph (c) of figure 28.2.

Clearly the accuracy with which $\xi$ is known in this approach increases by only
a factor of 2 at each step, but this accuracy is predictable at the outset of the
calculation and (unless f(x) has very violent behaviour near $x = \xi$) a range of x
in which $\xi$ lies can be safely stated at any stage. At the stage reached in the last
line of table 28.3 it may be stated that $1.4949 < \xi < 1.4977$. Thus binary chopping
gives a simple approximation method (it involves less multiplication than linear
interpolation, for example) that is predictable and relatively safe, although its
convergence is slow.
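
Binary chopping is equally short to program. The sketch below is an illustration under the same assumptions as before (the test function (28.1), and a helper name `binary_chop` invented for the example); it returns both the table rows and the final bracketing interval.

```python
def f(x):
    """Left-hand side of equation (28.1)."""
    return x**5 - 2 * x**2 - 3


def binary_chop(a, b, steps):
    """Halve the bracketing interval `steps` times using (28.9).

    Returns the list of (A_n, B_n, x_n) triples and the final bracket (a, b).
    """
    if f(a) * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    rows = []
    for _ in range(steps):
        x = 0.5 * (a + b)
        rows.append((a, b, x))
        # The midpoint replaces the end point at which f has the same sign.
        if f(x) * f(a) > 0:
            a = x
        else:
            b = x
    return rows, (a, b)


rows, (lower, upper) = binary_chop(1.0, 1.7, 8)
for n, (a, b, x) in enumerate(rows, start=1):
    print(n, f"{a:.4f}", f"{b:.4f}", f"{x:.4f}", f"{f(x):.4f}")
print(f"{lower:.4f} < root < {upper:.4f}")
```

After k steps the width of the bracket is the initial width divided by 2^k, which is the predictability referred to in the text.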


28.1.4 Newton-Raphson method 

The Newton-Raphson (NR) procedure is somewhat similar to the interpolation
method but, as will be seen, has one distinct advantage over the latter. Instead
of (notionally) constructing the chord between two points on the curve of f(x)
against x, the tangent to the curve is notionally constructed at each successive
value of $x_n$ and the next value $x_{n+1}$ taken as the point at which the tangent cuts
the axis f(x) = 0. This is illustrated in graph (d) of figure 28.2.

n    x_n              f(x_n)
1    1.7              5.42
2    1.545 01         1.03
3    1.498 87         7.20 × 10^{-2}
4    1.495 13         4.49 × 10^{-4}
5    1.495 106 40     2.6 × 10^{-8}
6    1.495 106 40     -

Table 28.4 Successive approximations to the root of (28.1) using the Newton-Raphson
method.

If the nth value is $x_n$, the tangent to the curve of f(x) at that point has slope
$f'(x_n)$ and passes through the point $x = x_n$, $y = f(x_n)$. Its equation is thus

\[ y(x) = (x - x_n) f'(x_n) + f(x_n). \qquad (28.10) \]

The value of x at which y = 0 is then taken as $x_{n+1}$; thus the condition $y(x_{n+1}) = 0$
yields from (28.10) the iteration scheme

\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}. \qquad (28.11) \]

This is the Newton-Raphson iteration formula. Clearly, if $x_n$ is close to $\xi$ then $x_{n+1}$
is close to $x_n$, as it should be. It is also apparent that if any of the $x_n$ comes close
to a stationary point of f, so that $f'(x_n)$ is close to zero, the scheme is not going
to work well.

For our standard example, (28.11) becomes

$$x_{n+1} = x_n - \frac{x_n^5 - 2x_n^2 - 3}{5x_n^4 - 4x_n} = \frac{4x_n^5 - 2x_n^2 + 3}{5x_n^4 - 4x_n}. \qquad (28.12)$$

Again taking a starting value of $x_1 = 1.7$ we obtain in succession the entries
in table 28.4. The different values are given to an increasing number of decimal
places as the calculation proceeds; $f(x_n)$ is also recorded.
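The Newton-Raphson iteration can be tried out with a few lines of code. The following is a minimal sketch, not a definitive implementation: the function name `newton_raphson`, the stopping tolerance and the iteration cap are illustrative assumptions, and the standard example is taken to be $f(x) = x^5 - 2x^2 - 3$, as implied by (28.12).

```python
def newton_raphson(f, fprime, x1, tol=1e-10, max_iter=50):
    """Return the list of Newton-Raphson iterates x_1, x_2, ... (hypothetical helper)."""
    xs = [x1]
    for _ in range(max_iter):
        x = xs[-1]
        slope = fprime(x)
        if slope == 0.0:
            # Tangent is horizontal: it never cuts the axis f(x) = 0.
            raise ZeroDivisionError("f'(x_n) vanished; the scheme cannot proceed")
        x_next = x - f(x) / slope      # one application of the NR formula
        xs.append(x_next)
        if abs(x_next - x) < tol:      # assumed stopping rule, not from the text
            break
    return xs


# Standard example: f(x) = x^5 - 2x^2 - 3, starting value x_1 = 1.7.
f = lambda x: x**5 - 2 * x**2 - 3
fprime = lambda x: 5 * x**4 - 4 * x

iterates = newton_raphson(f, fprime, 1.7)
for n, x in enumerate(iterates, start=1):
    print(n, f"{x:.8f}", f"{f(x):.2e}")
```

Printing $f(x_n)$ alongside each iterate makes the rapid final convergence easy to see; the guard against a vanishing slope reflects the warning about stationary points.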

It is apparent that this method is unlike the previous ones in that the increase
in accuracy of the answer is not constant throughout the iterations but improves
dramatically as the required root is approached. Away from the root the behaviour
of the sequence is less satisfactory and from its geometrical interpretation it can be
seen that if, for example, there were a maximum or minimum near the root then
the sequence could oscillate between values on either side of it (instead of ‘homing
in’ on the root). The reason for the good convergence near the root is discussed
in the next section.

Of the four methods mentioned, no single one is ideal and, in practice, some 
mixture of them is usually to be preferred. The particular combination of methods 
selected will depend a great deal on how easily the progress of the calculation 
may be monitored, but some combination of the first three methods mentioned, 
followed by the NR scheme if great accuracy were required, would be suitable 
for most situations. 


28.2 Convergence of iteration schemes 

For iteration schemes in which $x_{n+1}$ can be expressed as a differentiable function
of $x_n$, e.g. the rearrangement or NR methods of the previous section, a partial
analysis of the conditions necessary for a successful scheme can be made as
follows.

Suppose the general iteration formula is expressed as

$$x_{n+1} = F(x_n) \qquad (28.13)$$

((28.7) and (28.12) are examples). Then the sequence of values $x_1, x_2, \ldots, x_n, \ldots$ is
required to converge to the value $\xi$ that satisfies both

$$f(\xi) = 0 \quad\text{and}\quad \xi = F(\xi). \qquad (28.14)$$

If the error in the solution at the $n$th stage is $\epsilon_n$, i.e. $x_n = \xi + \epsilon_n$, then

$$\xi + \epsilon_{n+1} = x_{n+1} = F(x_n) = F(\xi + \epsilon_n). \qquad (28.15)$$

For the iteration process to converge, a decreasing error is required, i.e.
$|\epsilon_{n+1}| < |\epsilon_n|$. To see what this implies about $F$, we expand the right-hand term
of (28.15) by means of a Taylor series and use (28.14) to replace (28.15) by

$$\xi + \epsilon_{n+1} = \xi + \epsilon_n F'(\xi) + \tfrac{1}{2}\epsilon_n^2 F''(\xi) + \cdots. \qquad (28.16)$$

This shows that, for small $\epsilon_n$,

$$\epsilon_{n+1} \approx F'(\xi)\,\epsilon_n$$

and that a necessary (but not sufficient) condition for convergence is that

$$|F'(\xi)| < 1. \qquad (28.17)$$

It should be noticed that this is a condition on $F'(\xi)$ and not on $f'(\xi)$, which
may have any finite value. Figure 28.3 illustrates in a graphical way how the
convergence proceeds for the case $0 < F'(\xi) < 1$.
Figure 28.3 Illustration of the convergence of the iteration scheme $x_{n+1} =
F(x_n)$ when $0 < F'(\xi) < 1$, where $\xi = F(\xi)$. The line $y = x$ makes an angle
$\pi/4$ with the axes. The broken line makes an angle $\tan^{-1} F'(\xi)$ with the $x$-axis.


Equation (28.16) suggests that if $F(x)$ can be chosen so that $F'(\xi) = 0$ then the
ratio $|\epsilon_{n+1}/\epsilon_n|$ could be made very small, of order $\epsilon_n$ in fact. To go even further,
if it can be arranged that the first few derivatives of $F$ vanish at $x = \xi$ then the
convergence, once $x_n$ has become close to $\xi$, could be very rapid indeed. If the
first $N - 1$ derivatives of $F$ vanish at $x = \xi$, i.e.

$$F'(\xi) = F''(\xi) = \cdots = F^{(N-1)}(\xi) = 0 \qquad (28.18)$$

and consequently

$$\epsilon_{n+1} = O(\epsilon_n^N) \qquad (28.19)$$

then the scheme is said to have $N$th-order convergence.

This is the explanation of the significant difference in convergence between the
NR scheme and the others discussed (judged by reference to (28.19), so that the
differentiability of the function $F$ is not a prerequisite). The NR procedure has
second-order convergence, as is shown by the following analysis. Since

$$F(x) = x - \frac{f(x)}{f'(x)},$$

we have

$$F'(x) = 1 - \frac{f'(x)}{f'(x)} + \frac{f(x) f''(x)}{[f'(x)]^2} = \frac{f(x) f''(x)}{[f'(x)]^2}.$$

Now, provided $f'(\xi) \neq 0$, it follows that $F'(\xi) = 0$ because $f(x) = 0$ at $x = \xi$.
    n     $x_{n+1}$        $\epsilon_{n+1}$
    1     8.5              4.5
    2     5.191            1.19
    3     4.137            $1.4 \times 10^{-1}$
    4     4.002257         $2.3 \times 10^{-3}$
    5     4.000000637      $6.4 \times 10^{-7}$
    6     4                -

Table 28.5 Successive approximations to $\sqrt{16}$ using the iteration scheme
(28.20).


► The following is an iteration scheme for finding the square root of $X$:

$$x_{n+1} = \frac{1}{2}\left(x_n + \frac{X}{x_n}\right). \qquad (28.20)$$

Show that it has second-order convergence and illustrate its efficiency by finding, say, $\sqrt{16}$
starting with a very poor guess $\sqrt{16} = 1$.

If this scheme does converge to $\xi$ then $\xi$ will satisfy

$$\xi = \frac{1}{2}\left(\xi + \frac{X}{\xi}\right) \quad\Rightarrow\quad \xi^2 = X,$$

as required. The iteration function $F$ is given by

$$F(x) = \frac{1}{2}\left(x + \frac{X}{x}\right),$$

and so, since $\xi^2 = X$,

$$F'(\xi) = \frac{1}{2}\left(1 - \frac{X}{\xi^2}\right) = 0,$$

whilst

$$F''(\xi) = \frac{X}{\xi^3} = \frac{1}{\xi} \neq 0.$$

Thus the procedure has second-order, but not third-order, convergence.

We now show the procedure in action. Table 28.5 gives successive values of $x_n$ and of
$\epsilon_n$, the difference between $x_n$ and the true value, 4.

As we can see the scheme is crude initially, but once $x_n$ gets close to $\xi$, it homes in on
the true value extremely rapidly. ◄
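The second-order behaviour can be checked numerically. The sketch below is an illustration only (the helper name `sqrt_iterates` is an assumption): it applies the averaging scheme $x_{n+1} = \frac{1}{2}(x_n + X/x_n)$ to $X = 16$ from the poor guess 1 and forms the ratio of each error to the square of the previous one, which should settle near $F''(\xi)/2 = 1/(2\xi) = 0.125$ if the convergence is second order.

```python
def sqrt_iterates(X, x1, n_steps):
    """Iterate x_{n+1} = (x_n + X / x_n) / 2 and return all iterates (hypothetical helper)."""
    xs = [x1]
    for _ in range(n_steps):
        x = xs[-1]
        xs.append(0.5 * (x + X / x))
    return xs


xs = sqrt_iterates(16.0, 1.0, 6)           # x_1 = 1, then x_2, ..., x_7
errors = [x - 4.0 for x in xs]             # difference from the true root, 4

# For second-order convergence the ratio error_{n+1} / error_n^2 tends to a constant.
ratios = [errors[k + 1] / errors[k] ** 2
          for k in range(len(errors) - 1) if errors[k] != 0.0]

for n, (x, e) in enumerate(zip(xs, errors), start=1):
    print(n, f"{x:.9f}", f"{e:.2e}")
print("error ratios:", [f"{r:.5f}" for r in ratios])
```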


28.3 Simultaneous linear equations 

As we saw in chapter 8, many situations in physical science can be described 
approximately or exactly by a set of N simultaneous linear equations in N 
variables (unknowns), $x_i$, $i = 1, 2, \ldots, N$. The equations take the general form

$$\begin{aligned}
A_{11}x_1 + A_{12}x_2 + \cdots + A_{1N}x_N &= b_1,\\
A_{21}x_1 + A_{22}x_2 + \cdots + A_{2N}x_N &= b_2,\\
&\;\;\vdots\\
A_{N1}x_1 + A_{N2}x_2 + \cdots + A_{NN}x_N &= b_N,
\end{aligned} \qquad (28.21)$$

where the $A_{ij}$ are constants and form the elements of a square matrix $\mathsf{A}$. The $b_i$
are given and form a column matrix $\mathsf{b}$. If $\mathsf{A}$ is non-singular then (28.21) can be
solved for the $x_i$ using the inverse of $\mathsf{A}$, according to the formula

$$\mathsf{x} = \mathsf{A}^{-1}\mathsf{b}.$$

This approach was discussed at length in chapter 8 and will not be considered 
further here. 


28.3.1 Gaussian elimination 

We follow instead a continuation of one of the earliest techniques acquired by a
student of algebra, namely the solving of simultaneous equations (initially only
two in number) by the successive elimination of all the variables but one. This
(known as Gaussian elimination) is achieved by using, at each stage, one of the
equations to obtain an explicit expression for one of the remaining $x_i$ in terms
of the others and then substituting for that $x_i$ in all other remaining equations.
Eventually a single linear equation in just one of the unknowns is obtained. This
is then solved and the result re-substituted in previously derived equations (in
reverse order) to establish values for all the $x_i$.

This method is probably very familiar to the reader and so a specific example
to illustrate this alone seems unnecessary. Instead, we will show how a calculation
along such lines might be arranged so that the errors due to the inherent lack of
precision in any calculating equipment do not become excessive. This can happen
if the value of $N$ is large and particularly (and we will merely state this) if the
elements $A_{11}, A_{22}, \ldots, A_{NN}$ on the leading diagonal of the matrix in (28.21) are
small compared with the off-diagonal elements.

The process to be described is known as Gaussian elimination with interchange.
The only, but essential, difference from straightforward elimination is that before
each variable $x_i$ is eliminated, the equations are reordered to put the largest (in
modulus) remaining coefficient of $x_i$ on the leading diagonal.

We will take as an illustration a straightforward three-variable example, which
can in fact be solved perfectly well without any interchange since, with simple
numbers and only two eliminations to perform, rounding errors do not have
a chance to build up. However, the important thing is that the reader should
appreciate how this would apply in (say) a computer program for a 1000-variable
case, perhaps with unforeseeable zeroes or very small numbers appearing on the
leading diagonal.


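► Solve the simultaneous equations

$$\begin{aligned}
\text{(a)}\quad x_1 + 6x_2 - 4x_3 &= 8,\\
\text{(b)}\quad 3x_1 - 20x_2 + x_3 &= 12,\\
\text{(c)}\quad -x_1 + 3x_2 + 5x_3 &= 3.
\end{aligned} \qquad (28.22)$$

Firstly, we interchange rows (a) and (b) to bring the term $3x_1$ onto the leading diagonal. In
the following, we label the important equations (I), (II), (III), and the others alphabetically.

$$\begin{aligned}
\text{(I)}\quad 3x_1 - 20x_2 + x_3 &= 12,\\
\text{(d)}\quad x_1 + 6x_2 - 4x_3 &= 8,\\
\text{(e)}\quad -x_1 + 3x_2 + 5x_3 &= 3.
\end{aligned}$$

For (j) = (d) and (e), replace row (j) by

$$\text{row (j)} - \frac{a_{j1}}{3} \times \text{row (I)},$$

where $a_{j1}$ is the coefficient of $x_1$ in row (j), to give the two equations

$$\begin{aligned}
\text{(II)}\quad \left(6 + \tfrac{20}{3}\right)x_2 + \left(-4 - \tfrac{1}{3}\right)x_3 &= 8 - \tfrac{12}{3},\\
\text{(f)}\quad \left(3 - \tfrac{20}{3}\right)x_2 + \left(5 + \tfrac{1}{3}\right)x_3 &= 3 + \tfrac{12}{3}.
\end{aligned}$$

Now $|6 + \tfrac{20}{3}| > |3 - \tfrac{20}{3}|$ and so no interchange is needed before the next elimination. To
eliminate $x_2$, replace row (f) by

$$\text{row (f)} - \frac{\left(-\tfrac{11}{3}\right)}{\tfrac{38}{3}} \times \text{row (II)}.$$

This gives

$$\text{(III)}\quad \left[\tfrac{16}{3} + \tfrac{11}{38} \times \left(-\tfrac{13}{3}\right)\right]x_3 = 7 + \tfrac{11}{38} \times 4.$$

Collecting together and tidying up the final equations, we have

$$\begin{aligned}
\text{(I)}\quad 3x_1 - 20x_2 + x_3 &= 12,\\
\text{(II)}\quad 38x_2 - 13x_3 &= 12,\\
\text{(III)}\quad x_3 &= 2.
\end{aligned}$$

Starting with (III) and working backwards it is now a simple matter to obtain

$$x_1 = 10, \quad x_2 = 1, \quad x_3 = 2. \;◄$$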
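The procedure just carried out by hand can be sketched in code as follows. This is a minimal illustration under stated assumptions, not a production routine: the function name `gauss_eliminate` is hypothetical, ordinary floating-point arithmetic is used, and the interchange is done by swapping rows of an augmented matrix so that the largest (in modulus) remaining coefficient of the variable being eliminated sits on the leading diagonal.

```python
def gauss_eliminate(A, b):
    """Solve A x = b by Gaussian elimination with row interchange (illustrative sketch)."""
    n = len(b)
    # Work on an augmented copy [A | b] so the caller's data are untouched.
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]

    for k in range(n):
        # Interchange: bring the largest |coefficient of x_k| among rows k..n-1 to row k.
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        if M[p][k] == 0.0:
            raise ValueError("matrix is singular")
        M[k], M[p] = M[p], M[k]
        # Eliminate x_k from every later row.
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]

    # Back-substitution, starting from the last equation.
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x


A = [[1, 6, -4],
     [3, -20, 1],
     [-1, 3, 5]]
b = [8, 12, 3]
solution = gauss_eliminate(A, b)
print(solution)
```

For this small system the first interchange swaps the first two rows, exactly as in the hand calculation, and no further swap is needed.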


28.3.2 Gauss-Seidel iteration 

In the example considered in the previous subsection an explicit way of solving 
a set of simultaneous equations was given, the accuracy obtainable being limited 
only by the rounding errors in the calculating facilities available, and the
calculation was planned to minimise these. However, in some situations it may be that
only an approximate solution is needed. If, for a large number of variables, this is
the case then an iterative method may produce a satisfactory degree of precision
with less calculation. Such a method, known as Gauss-Seidel iteration, is based 
upon the following analysis. 

The problem is again that of finding the components of the column matrix $\mathsf{x}$
that satisfies

$$\mathsf{A}\mathsf{x} = \mathsf{b} \qquad (28.23)$$

when $\mathsf{A}$ and $\mathsf{b}$ are a given matrix and column matrix respectively.
The steps of the Gauss-Seidel scheme are as follows.

(i) Rearrange the equations (usually by simple division on both sides of each
equation) so that all diagonal elements of the new matrix $\mathsf{C}$ are unity, i.e.
(28.23) becomes

$$\mathsf{C}\mathsf{x} = \mathsf{d}, \qquad (28.24)$$

where $\mathsf{C} = \mathsf{I} - \mathsf{F}$, and $\mathsf{F}$ has zeroes as its diagonal elements.

(ii) Step (i) produces

$$\mathsf{F}\mathsf{x} + \mathsf{d} = \mathsf{I}\mathsf{x} = \mathsf{x}, \qquad (28.25)$$

and this forms the basis of an iteration scheme

$$\mathsf{x}_{n+1} = \mathsf{F}\mathsf{x}_n + \mathsf{d}, \qquad (28.26)$$

where $\mathsf{x}_n$ is the $n$th approximation to the required solution vector.

(iii) To improve the convergence, the matrix $\mathsf{F}$, which has zeroes on its leading
diagonal, can be written as the sum of two matrices $\mathsf{L}$ and $\mathsf{U}$ that have
non-zero elements only below and above the leading diagonal respectively:

$$L_{ij} = \begin{cases} F_{ij} & \text{if } i > j,\\ 0 & \text{otherwise,} \end{cases}
\qquad
U_{ij} = \begin{cases} F_{ij} & \text{if } i < j,\\ 0 & \text{otherwise.} \end{cases} \qquad (28.27)$$

This allows the latest values of the components of $\mathsf{x}$ to be used at each
stage and an improved form of (28.26) to be obtained,

$$\mathsf{x}_{n+1} = \mathsf{L}\mathsf{x}_{n+1} + \mathsf{U}\mathsf{x}_n + \mathsf{d}. \qquad (28.28)$$

To see why this is possible we note, for example, that when calculating,
say, the fourth component of $\mathsf{x}_{n+1}$, its first three components are already
known, and, because of the structure of $\mathsf{L}$, these are the only ones needed
to evaluate the fourth component of $\mathsf{L}\mathsf{x}_{n+1}$.
    n     $x_1$      $x_2$      $x_3$
    1     2          2          2
    2     4          0.1        1.34
    3     12.76      1.381      2.323
    4     9.008      0.867      1.881
    5     10.321     1.042      2.039
    6     9.902      0.987      1.988
    7     10.029     1.004      2.004

Table 28.6 Successive approximations to the solution of simultaneous
equations (28.29) using the Gauss-Seidel iteration method.


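► Obtain an approximate solution to the simultaneous equations

$$\begin{aligned}
x_1 + 6x_2 - 4x_3 &= 8,\\
3x_1 - 20x_2 + x_3 &= 12,\\
-x_1 + 3x_2 + 5x_3 &= 3.
\end{aligned} \qquad (28.29)$$

These are the same equations as were solved in subsection 28.3.1.

Divide the equations by 1, $-20$ and 5 respectively to give

$$\begin{aligned}
x_1 + 6x_2 - 4x_3 &= 8,\\
-0.15x_1 + x_2 - 0.05x_3 &= -0.6,\\
-0.2x_1 + 0.6x_2 + x_3 &= 0.6.
\end{aligned}$$

Thus, set out in matrix form, (28.28) is in this case

$$\begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix}_{n+1}
= \begin{pmatrix} 0 & 0 & 0\\ 0.15 & 0 & 0\\ 0.2 & -0.6 & 0 \end{pmatrix}
\begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix}_{n+1}
+ \begin{pmatrix} 0 & -6 & 4\\ 0 & 0 & 0.05\\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix}_{n}
+ \begin{pmatrix} 8\\ -0.6\\ 0.6 \end{pmatrix}.$$

Suppose initially ($n = 1$) we guess each component to have the value 2. Then the successive
sets of values of the three quantities generated by this scheme are as shown in table 28.6.
Even with the rather poor initial guess, a close approximation to the exact result $x_1 = 10$,
$x_2 = 1$, $x_3 = 2$ is obtained in only a few iterations. ◄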
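The Gauss-Seidel sweep is short enough to sketch directly. The code below is an illustration under assumptions (the name `gauss_seidel` and the fixed number of sweeps are choices made here, not taken from the text). Dividing each row by its diagonal element and using the newest components as soon as they are available is equivalent to the scheme $\mathsf{x}_{n+1} = \mathsf{L}\mathsf{x}_{n+1} + \mathsf{U}\mathsf{x}_n + \mathsf{d}$.

```python
def gauss_seidel(A, b, x0, n_sweeps):
    """Return the list of successive Gauss-Seidel approximations (illustrative sketch)."""
    n = len(b)
    x = [float(v) for v in x0]
    history = [tuple(x)]
    for _ in range(n_sweeps):
        for i in range(n):
            # Components j < i already hold their new values; j > i still hold old ones.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
        history.append(tuple(x))
    return history


A = [[1, 6, -4],
     [3, -20, 1],
     [-1, 3, 5]]
b = [8, 12, 3]

history = gauss_seidel(A, b, x0=[2, 2, 2], n_sweeps=40)
for n, approx in enumerate(history[:7], start=1):
    print(n, ["%.3f" % v for v in approx])
```

The first few printed rows can be compared with table 28.6. Convergence is not guaranteed for an arbitrary matrix, so in practice the change between sweeps would be monitored rather than a fixed number of sweeps assumed.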


28.3.3 Tridiagonal matrices 

Although for the solution of most matrix equations $\mathsf{A}\mathsf{x} = \mathsf{b}$ the number of
operations needed increases rapidly with the size $N \times N$ of the matrix (roughly as
$N^3$), for one particularly simple kind of matrix the computing required increases
only linearly with $N$. This type often occurs in physical situations in which objects
in an ordered set interact only with their nearest neighbours and is one in which
only the leading diagonal and the diagonals immediately above and below it
contain non-zero entries. Such matrices are known as tridiagonal matrices. They
may also be used in numerical approximations to the solutions of certain types
of differential equation.

A typical matrix equation involving a tridiagonal matrix is thus

$$\begin{pmatrix}
b_1 & c_1 & & & & 0\\
a_2 & b_2 & c_2 & & & \\
 & a_3 & b_3 & c_3 & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & a_{N-1} & b_{N-1} & c_{N-1}\\
0 & & & & a_N & b_N
\end{pmatrix}
\begin{pmatrix} x_1\\ x_2\\ x_3\\ \vdots\\ x_{N-1}\\ x_N \end{pmatrix}
=
\begin{pmatrix} y_1\\ y_2\\ y_3\\ \vdots\\ y_{N-1}\\ y_N \end{pmatrix}. \qquad (28.30)$$


So as to keep the entries in the matrix as free from subscripts as possible, we 
have used a, b and c to indicate subdiagonal, leading diagonal and superdiagonal 
elements respectively. As a consequence we have had to change the notation for 
the column matrix on the right-hand side from b to (say) y. 

In such an equation the first and last rows involve $x_1$ and $x_N$ respectively, and
so the solution could be found by letting $x_1$ be unknown and then solving in turn
each row of the equation in terms of $x_1$, and finally determining $x_1$ by requiring
the next-to-last line to generate for $x_N$ an equation compatible with that given by
the last line. However, if the matrix is large then this becomes a very cumbersome
operation, and a simpler method is to assume a form of solution

$$x_{i-1} = \theta_{i-1}x_i + \phi_{i-1}. \qquad (28.31)$$

Since the $i$th line of the matrix equation is

$$a_i x_{i-1} + b_i x_i + c_i x_{i+1} = y_i,$$

we must have, by substituting for $x_{i-1}$, that

$$(a_i\theta_{i-1} + b_i)x_i + c_i x_{i+1} = y_i - a_i\phi_{i-1}.$$

This is also in the form of (28.31), but with $i$ replaced by $i + 1$. Thus the recurrence
formulae for $\theta_i$ and $\phi_i$ are

$$\theta_i = \frac{-c_i}{a_i\theta_{i-1} + b_i}, \qquad
\phi_i = \frac{y_i - a_i\phi_{i-1}}{a_i\theta_{i-1} + b_i}, \qquad (28.32)$$

provided the denominator does not vanish for any $i$. From the first of the matrix
equations it follows that $\theta_1 = -c_1/b_1$ and $\phi_1 = y_1/b_1$. The equations may now
be solved for the $x_i$ in two stages without carrying through an unknown quantity.
First, all the $\theta_i$ and $\phi_i$ are generated using (28.32) and the values of $\theta_1$ and $\phi_1$
and then, as a second stage, (28.31) is used to evaluate the $x_i$, starting with $x_N$
($= \phi_N$) and working backwards.
► Solve the following tridiagonal matrix equation, in which only non-zero elements are
shown.

$$\begin{pmatrix}
1 & 2 & & & & \\
-1 & 2 & 1 & & & \\
 & 2 & -1 & 2 & & \\
 & & 3 & 1 & 1 & \\
 & & & 3 & 4 & 2\\
 & & & & -2 & 2
\end{pmatrix}
\begin{pmatrix} x_1\\ x_2\\ x_3\\ x_4\\ x_5\\ x_6 \end{pmatrix}
=
\begin{pmatrix} 4\\ 3\\ -3\\ 10\\ 7\\ -2 \end{pmatrix}. \qquad (28.33)$$

The solution is set out in table 28.7, in which the arrows indicate the general flow of the
calculation. First, the columns of $a_i$, $b_i$, $c_i$ and $y_i$ are filled in from the original equation
(28.33) and then the recurrence relations (28.32) are used to fill in the successive rows
starting from the top; on each row we work from left to right as far as and including the $\phi_i$
column. Finally, the bottom entry in the $x_i$ column is set equal to the bottom entry in the
completed $\phi_i$ column and the rest of the $x_i$ column completed by using (28.31) and working
up from the bottom. Thus the solution is $x_1 = 2$; $x_2 = 1$; $x_3 = 3$; $x_4 = -1$; $x_5 = 2$; $x_6 = 1$. ◄



    $a_i$   $b_i$   $c_i$        $a_i\theta_{i-1} + b_i$   $\theta_i$   $y_i$   $a_i\phi_{i-1}$   $\phi_i$   $x_i$
     0       1       2     →     1                         -2           4       0                 4          2
    -1       2       1     →     4                         -1/4         3       -4                7/4        1
     2      -1       2     →     -3/2                      4/3          -3      7/2               13/3       3
     3       1       1     →     5                         -1/5         10      13                -3/5       -1
     3       4       2     →     17/5                      -10/17       7       -9/5              44/17      2
    -2       2       0     →     54/17                     0            -2      -88/17            1          1

    ↓ rows are filled from the top downwards, left to right as far as $\phi_i$;
    ↑ the $x_i$ column is then filled from the bottom upwards.

Table 28.7 The solution of tridiagonal matrix equation (28.33). The arrows
indicate the general flow of the calculation, as described in the text.
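The two-stage procedure summarised in table 28.7 can be sketched in code as below. It is an illustration only: the function name `solve_tridiagonal` is hypothetical, the lists are indexed from 0, the unused entries $a_1$ and $c_N$ are assumed to be supplied as zero (as in the table), and exact rational arithmetic from the standard `fractions` module is used so that the intermediate $\theta_i$ and $\phi_i$ can be compared with the tabulated fractions.

```python
from fractions import Fraction


def solve_tridiagonal(a, b, c, y):
    """Solve a tridiagonal system via the theta/phi recurrences (illustrative sketch).

    a, b, c are the sub-, leading and super-diagonals, each of length N,
    with a[0] and c[N-1] unused (set to zero); y is the right-hand side.
    """
    n = len(b)
    theta = [None] * n
    phi = [None] * n

    # Forward stage: generate theta_i and phi_i, working down the rows.
    if b[0] == 0:
        raise ZeroDivisionError("b_1 vanishes")
    theta[0] = -c[0] / b[0]
    phi[0] = y[0] / b[0]
    for i in range(1, n):
        denom = a[i] * theta[i - 1] + b[i]
        if denom == 0:
            raise ZeroDivisionError("denominator vanishes in the recurrence")
        theta[i] = -c[i] / denom
        phi[i] = (y[i] - a[i] * phi[i - 1]) / denom

    # Backward stage: x_N = phi_N, then x_{i-1} = theta_{i-1} x_i + phi_{i-1}.
    x = [None] * n
    x[n - 1] = phi[n - 1]
    for i in range(n - 1, 0, -1):
        x[i - 1] = theta[i - 1] * x[i] + phi[i - 1]
    return theta, phi, x


F = Fraction
a = [F(v) for v in (0, -1, 2, 3, 3, -2)]
b = [F(v) for v in (1, 2, -1, 1, 4, 2)]
c = [F(v) for v in (2, 1, 2, 1, 2, 0)]
y = [F(v) for v in (4, 3, -3, 10, 7, -2)]

theta, phi, x = solve_tridiagonal(a, b, c, y)
print([str(v) for v in theta])
print([str(v) for v in phi])
print([str(v) for v in x])
```

Each stage is a single pass over the rows, which is why the work grows only linearly with the number of unknowns.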


28.4 Numerical integration 

As noted at the start of this chapter, with modern computers and computer 
packages - some of which will present solutions in algebraic form, where that 
is possible - the inability to find a closed-form expression for an integral no 
longer presents a problem. But, just as for the solution of algebraic equations, it 
is extremely important that scientists and engineers should have some idea of the 
procedures on which such packages are based. In this section we discuss some of 
the more elementary methods used to evaluate integrals numerically and at the 
same time indicate the basis of more sophisticated procedures. 

The standard integral evaluation has the form

$$I = \int_a^b f(x)\,dx, \qquad (28.34)$$

where the integrand $f(x)$ may be given in analytic or tabulated form, but for the
cases under consideration no closed-form expression for $I$ can be obtained. All
numerical evaluations of $I$ are based on regarding $I$ as the area under the curve
of $f(x)$ between the limits $x = a$ and $x = b$ and attempting to estimate that area.

Figure 28.4 (a) Definition of nomenclature. (b) The approximation in using
the trapezium rule; $f(x)$ is indicated by the broken curve. (c) Simpson’s rule
approximation; $f(x)$ is indicated by the broken curve. The solid curve is part
of the approximating parabola.

The simplest methods of doing this involve dividing up the interval $a \le x \le b$
into $N$ equal sections, each of length $h = (b - a)/N$. The dividing points are
labelled $x_i$ with $x_0 = a$, $x_N = b$, $i$ running from 0 to $N$. The point $x_i$ is a distance
$ih$ from $a$. The central value of $x$ in a strip ($x = x_i + h/2$) is denoted for brevity
by $x_{i+1/2}$, and for the same reason $f(x_i)$ is written as $f_i$. This nomenclature is
indicated graphically in figure 28.4(a).

So that we may compare later estimates of the area under the curve with the
true value, we next obtain an exact expression for $I$, even though we cannot
evaluate it. To do this we need to consider only one strip, say that between $x_i$
and $x_{i+1}$. For this strip the area is, using Taylor’s expansion,

$$\begin{aligned}
\int_{-h/2}^{h/2} f(x_{i+1/2} + y)\,dy
&= \int_{-h/2}^{h/2} \sum_{n=0}^{\infty} f^{(n)}(x_{i+1/2})\,\frac{y^n}{n!}\,dy\\
&= \sum_{n=0}^{\infty} f^{(n)}_{i+1/2} \int_{-h/2}^{h/2} \frac{y^n}{n!}\,dy\\
&= \sum_{n\ \text{even}} f^{(n)}_{i+1/2}\,\frac{2}{(n+1)!}\left(\frac{h}{2}\right)^{n+1}.
\end{aligned} \qquad (28.35)$$

It should be noticed that, in this exact expression, only the even derivatives
of $f$ survive the integration and all derivatives are evaluated at $x_{i+1/2}$. Clearly
other exact expressions are possible, e.g. the integral of $f(x_i + y)$ over the range
$0 \le y \le h$, but we will find (28.35) the most useful for our purposes.

We now turn to practical ways of approximating $I$, given the values of $f_i$, or a
means to calculate them, for $i = 0, 1, \ldots, N$.


28.4.1 Trapezium rule 

In this simple case the area shown in figure 28.4(a) is approximated as shown in
figure 28.4(b), i.e. by a trapezium. The area $A_i$ of the trapezium is

$$A_i = \tfrac{1}{2}(f_i + f_{i+1})h, \qquad (28.36)$$

and if such contributions from all strips are added together then the estimate of
the total, and hence of $I$, is

$$I(\text{estim.}) = \sum_{i=0}^{N-1} A_i = \frac{h}{2}\left(f_0 + 2f_1 + 2f_2 + \cdots + 2f_{N-1} + f_N\right). \qquad (28.37)$$

This provides a very simple expression for estimating integral (28.34); its accuracy
is limited only by the extent to which $h$ can be made very small (and hence $N$
very large) without making the calculation excessively long. Clearly the estimate
provided is only exact if $f(x)$ is a linear function of $x$.

The error made in calculating the area of the strip when the trapezium rule is
used may be estimated as follows. The values used are $f_i$ and $f_{i+1}$, as in (28.36).
These can be expressed accurately in terms of $f_{i+1/2}$ and its derivatives by the
Taylor series

$$f_{i+1/2 \pm 1/2} = f_{i+1/2} \pm \frac{h}{2} f'_{i+1/2} + \frac{1}{2!}\left(\frac{h}{2}\right)^2 f''_{i+1/2} \pm \frac{1}{3!}\left(\frac{h}{2}\right)^3 f^{(3)}_{i+1/2} + \cdots.$$

Thus

$$A_i(\text{estim.}) = \tfrac{1}{2}h(f_i + f_{i+1}) = h\left[f_{i+1/2} + \frac{1}{2!}\left(\frac{h}{2}\right)^2 f''_{i+1/2} + O(h^4)\right],$$

whilst, from the first few terms of the exact result (28.35),

$$A_i(\text{exact}) = h f_{i+1/2} + \frac{2}{3!}\left(\frac{h}{2}\right)^3 f''_{i+1/2} + O(h^5).$$

Thus the error $\Delta A_i = A_i(\text{estim.}) - A_i(\text{exact})$ is given by

$$\Delta A_i = \left(\tfrac{1}{8} - \tfrac{1}{24}\right)h^3 f''_{i+1/2} + O(h^5) \approx \tfrac{1}{12}h^3 f''_{i+1/2}.$$

The total error in $I(\text{estim.})$ is thus given approximately by

$$\Delta I(\text{estim.}) \approx \tfrac{1}{12}Nh^3\langle f''\rangle = \tfrac{1}{12}(b - a)h^2\langle f''\rangle, \qquad (28.38)$$

where $\langle f''\rangle$ represents an average value for the second derivative of $f$ over the
interval $a$ to $b$.

► Use the trapezium rule with $h = 0.5$ to evaluate

$$I = \int_0^2 (x^2 - 3x + 4)\,dx,$$

and, by evaluating the integral exactly, examine how well (28.38) estimates the error.

With $h = 0.5$, we will need five values of $f(x) = x^2 - 3x + 4$ for use in formula (28.37).
They are $f(0) = 4$, $f(0.5) = 2.75$, $f(1) = 2$, $f(1.5) = 1.75$ and $f(2) = 2$. Putting these into
(28.37) gives

$$I(\text{estim.}) = \frac{0.5}{2}\left(4 + 2 \times 2.75 + 2 \times 2 + 2 \times 1.75 + 2\right) = 4.75.$$

The exact value is

$$I(\text{exact}) = \left[\frac{x^3}{3} - \frac{3x^2}{2} + 4x\right]_0^2 = 4\tfrac{2}{3}.$$

The difference between the estimate of the integral and the exact answer is 1/12. Equation
(28.38) estimates this error as $2 \times 0.25 \times \langle f''\rangle/12$. Our (deliberately chosen!) integrand is
one for which $\langle f''\rangle$ can be evaluated trivially. Because $f(x)$ is a quadratic function of $x$,
its second derivative is constant, and equal to 2 in this case. Thus $\langle f''\rangle$ has value 2 and
(28.38) estimates the error as 1/12; that the estimate is exactly right should be no surprise
since the Taylor expansion for a quadratic polynomial about any point always terminates
after three terms and so no higher-order terms in $h$ have been ignored in (28.38). ◄
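The worked example can be reproduced with a short routine. The sketch below is illustrative (the function name `trapezium` is an assumption): it sums the end values once and the interior values twice, multiplies by $h/2$, and then compares the actual error with the estimate $\frac{1}{12}(b - a)h^2\langle f''\rangle$ for the quadratic integrand, for which $\langle f''\rangle = 2$.

```python
def trapezium(f, a, b, N):
    """Trapezium-rule estimate of the integral of f from a to b using N equal strips."""
    h = (b - a) / N
    total = f(a) + f(b)                 # end points carry weight 1
    for i in range(1, N):
        total += 2.0 * f(a + i * h)     # interior points carry weight 2
    return 0.5 * h * total


f = lambda x: x**2 - 3 * x + 4

estimate = trapezium(f, 0.0, 2.0, 4)    # h = 0.5
exact = 14.0 / 3.0                      # value of the integral found analytically
predicted_error = (2.0 - 0.0) * 0.5**2 * 2.0 / 12.0   # (b - a) h^2 <f''> / 12

print(estimate, estimate - exact, predicted_error)
```

Halving $h$ should reduce the error by a factor of about four, which gives a simple practical check on any implementation.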

28.4.2 Simpson’s rule 

Whereas the trapezium rule makes a linear interpolation of $f$, Simpson’s rule
effectively mimics the local variation of $f(x)$ using parabolas. The strips are
treated two at a time (figure 28.4(c)) and therefore their number, $N$, should be
made even.

In the neighbourhood of $x_i$, for $i$ odd, it is supposed that $f(x)$ can be adequately
represented by a quadratic form,

$$f(x_i + y) = f_i + ay + by^2. \qquad (28.39)$$

In particular, applying this to $y = \pm h$ yields two expressions involving $b$,

$$\begin{aligned}
f_{i+1} &= f(x_i + h) = f_i + ah + bh^2,\\
f_{i-1} &= f(x_i - h) = f_i - ah + bh^2;
\end{aligned}$$

thus

$$bh^2 = \tfrac{1}{2}(f_{i+1} + f_{i-1} - 2f_i).$$

Now, in the representation (28.39), the area of the double strip from $x_{i-1}$ to
$x_{i+1}$ is given by

$$A_i(\text{estim.}) = \int_{-h}^{h} (f_i + ay + by^2)\,dy = 2hf_i + \tfrac{2}{3}bh^3.$$

Substituting for $bh^2$ then yields for the estimated area

$$\begin{aligned}
A_i(\text{estim.}) &= 2hf_i + \tfrac{2}{3}h \times \tfrac{1}{2}(f_{i+1} + f_{i-1} - 2f_i)\\
&= \tfrac{1}{3}h(4f_i + f_{i+1} + f_{i-1}),
\end{aligned}$$

an expression involving only given quantities. It should be noted that the values
of neither $b$ nor $a$ need be calculated.

For the full integral

$$I(\text{estim.}) = \tfrac{1}{3}h\Big(f_0 + f_N + 4\sum_{m\ \text{odd}} f_m + 2\sum_{m\ \text{even}} f_m\Big). \qquad (28.40)$$

It can be shown, by following the same procedure as in the trapezium rule case,
that the error in the estimated area is approximately

$$\Delta I(\text{estim.}) \approx \frac{(b - a)}{180}h^4\langle f^{(4)}\rangle.$$
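Simpson's rule translates into code almost term by term. The following is a minimal sketch (the function name `simpson` and the two test integrands are choices made here for illustration); the sum over even $m$ is taken over interior points only, since $f_0$ and $f_N$ are counted separately.

```python
def simpson(f, a, b, N):
    """Simpson's-rule estimate of the integral of f from a to b; N must be even."""
    if N <= 0 or N % 2 != 0:
        raise ValueError("N must be a positive even number of strips")
    h = (b - a) / N
    odd_sum = sum(f(a + m * h) for m in range(1, N, 2))    # m = 1, 3, ..., N-1
    even_sum = sum(f(a + m * h) for m in range(2, N, 2))   # m = 2, 4, ..., N-2
    return h / 3.0 * (f(a) + f(b) + 4.0 * odd_sum + 2.0 * even_sum)


# A cubic is integrated exactly, because the error involves the fourth derivative.
cubic = simpson(lambda x: x**3, 0.0, 2.0, 2)

# A smooth non-polynomial integrand on [0, 1] with just two strips.
two_strip = simpson(lambda x: 1.0 / (1.0 + x**2), 0.0, 1.0, 2)

print(cubic, two_strip)
```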


28.4.3 Gaussian integration 

In the cases considered in the previous two subsections, the function $f$ was
mimicked by linear and quadratic functions. These yield exact answers if $f$
itself is a linear or quadratic function (respectively) of $x$. This process could
be continued by increasing the order of the polynomial mimicking-function so
as to increase the accuracy with which more complicated functions $f$ could be
numerically integrated; but the same effect can be achieved with less effort by
not insisting upon equally spaced points $x_i$.

The detailed analysis of such methods of numerical integration, in which the
integration points are not equally spaced and the weightings given to the values
at each point do not fall into a few simple groups, is too long to be given here.
The reader is referred to books devoted specifically to the theory of numerical
analysis, where details of the integration points and weights for many schemes
will be found.†

† The points and weights may be found in, e.g. Abramowitz and Stegun, Handbook of Mathematical
Functions (Dover, 1965).

We will content ourselves here with describing Gaussian integration, which
is based upon the orthogonality properties, in the interval $-1 \le x \le 1$, of the
Legendre polynomials $P_\ell(x)$, discussed in subsection 16.6.2. In order to use these
properties, the integral between limits $a$ and $b$ in (28.34) has to be changed to
one between the limits $-1$ and $+1$. This is easily done with a change of variable
from $x$ to $z$ given by

$$z = \frac{2x - b - a}{b - a},$$

so that $I$ becomes

$$I = \frac{b - a}{2}\int_{-1}^{1} g(z)\,dz, \qquad (28.41)$$

in which $g(z) = f(x)$.

The $n$ integration points $x_i$ for an $n$-point Gaussian integration are given by
the zeroes of $P_n(x)$, i.e. the $x_i$ are such that $P_n(x_i) = 0$. The integrand $g(x)$ is
mimicked by the $(n - 1)$th-degree polynomial

$$G(x) = \sum_{i=1}^{n} \frac{P_n(x)}{(x - x_i)P_n'(x_i)}\,g(x_i),$$

which coincides with $g(x)$ at each of the points $x_i$, $i = 1, 2, \ldots, n$. To see this it
should be noted that

$$\lim_{x \to x_k} \frac{P_n(x)}{(x - x_i)P_n'(x_i)} = \delta_{ik}.$$

It then follows, to the extent that $g(x)$ is well reproduced by $G(x)$, that

$$\int_{-1}^{1} g(x)\,dx \approx \sum_{i=1}^{n} \frac{g(x_i)}{P_n'(x_i)} \int_{-1}^{1} \frac{P_n(x)}{x - x_i}\,dx. \qquad (28.42)$$

The expression

$$w_i \equiv \frac{1}{P_n'(x_i)} \int_{-1}^{1} \frac{P_n(x)}{x - x_i}\,dx$$

can be shown, using the properties of Legendre polynomials, to be equal to

$$w_i = \frac{2}{(1 - x_i^2)\,|P_n'(x_i)|^2},$$

and is thus the weighting to be attached to the factor $g(x_i)$ in the sum (28.42),
which becomes

$$\int_{-1}^{1} g(x)\,dx \approx \sum_{i=1}^{n} w_i\,g(x_i). \qquad (28.43)$$

In fact, because of the particular properties of Legendre polynomials, it can be
shown that (28.43) integrates exactly any polynomial of degree up to $2n - 1$. The
error in the approximate equality is of the order of the $2n$th derivative of $g$ and
so, provided $g(x)$ is a reasonably smooth function, the approximation is a good
one.
As an example, for a three-point integration, the three $x_i$ are the zeroes of
$P_3(x) = \tfrac{1}{2}(5x^3 - 3x)$, namely 0 and $\pm 0.774\,60$, and the corresponding weights are

$$\frac{2}{1 \times \left(-\tfrac{3}{2}\right)^2} = \frac{8}{9} \quad\text{and}\quad \frac{2}{(1 - 0.6) \times \left(\tfrac{6}{2}\right)^2} = \frac{5}{9}.$$

For other forms of integrand, formulae based on other sets of orthogonal
functions give better results. For example, integrals over finite ranges involving
factors of the form $(1 - x^2)^{\pm 1/2}$ in the integrand are best treated using formulae
based on Chebyshev polynomials, whilst infinite integrals containing $e^{-x}$ ($0 \le
x < \infty$) or $e^{-x^2}$ ($-\infty < x < \infty$) are best handled using schemes based on Laguerre
or Hermite polynomials respectively.


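► Using a three-point formula in each case, evaluate the integral

$$I = \int_0^1 \frac{1}{1 + x^2}\,dx,$$

(i) using the trapezium rule, (ii) using Simpson’s rule, (iii) using Gaussian integration.
Also evaluate the integral analytically and compare the results.

(i) Using the trapezium rule, we obtain

$$I = \tfrac{1}{2} \times \tfrac{1}{2}\left[f(0) + 2f(\tfrac{1}{2}) + f(1)\right]
= \tfrac{1}{4}\left[1 + \tfrac{8}{5} + \tfrac{1}{2}\right] = 0.7750.$$

(ii) Using Simpson’s rule, we obtain

$$I = \tfrac{1}{3} \times \tfrac{1}{2}\left[f(0) + 4f(\tfrac{1}{2}) + f(1)\right]
= \tfrac{1}{6}\left[1 + \tfrac{16}{5} + \tfrac{1}{2}\right] = 0.7833.$$

(iii) Using Gaussian integration, with $g(z)$ denoting the integrand expressed in terms of $z$
as in (28.41), we obtain

$$\begin{aligned}
I &= \frac{1 - 0}{2}\int_{-1}^{1} \frac{dz}{1 + \tfrac{1}{4}(z + 1)^2}\\
&= \tfrac{1}{2}\left\{0.555\,56\left[g(-0.774\,60) + g(0.774\,60)\right] + 0.888\,89\,g(0)\right\}\\
&= \tfrac{1}{2}\left\{0.555\,56\left[0.987\,458 + 0.559\,503\right] + 0.888\,89 \times 0.8\right\}\\
&= 0.785\,27.
\end{aligned}$$

(iv) Exact evaluation gives

$$I = \int_0^1 \frac{dx}{1 + x^2} = \left[\tan^{-1} x\right]_0^1 = \frac{\pi}{4} = 0.785\,40.$$

In practice, a compromise has to be struck between the accuracy of the result achieved
and the calculational labour that goes into obtaining it. ◄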
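The three-point Gaussian calculation can be checked with a few lines of code. This is a sketch under stated assumptions: the function name `gauss3` is hypothetical, the points $0, \pm\sqrt{0.6}$ and weights $8/9, 5/9$ are those quoted for $P_3$, and the mapping from $[a, b]$ to $[-1, 1]$ is the linear change of variable of (28.41) written in the form $x = \frac{1}{2}(b + a) + \frac{1}{2}(b - a)z$.

```python
import math

# Zeroes of P_3 and the corresponding weights.
NODES = (-math.sqrt(0.6), 0.0, math.sqrt(0.6))
WEIGHTS = (5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0)


def gauss3(f, a, b):
    """Three-point Gauss-Legendre estimate of the integral of f from a to b."""
    half = 0.5 * (b - a)     # factor (b - a)/2 multiplying the integral over z
    mid = 0.5 * (b + a)
    return half * sum(w * f(mid + half * z) for z, w in zip(NODES, WEIGHTS))


f = lambda x: 1.0 / (1.0 + x**2)

estimate = gauss3(f, 0.0, 1.0)
print(estimate, math.pi / 4.0, estimate - math.pi / 4.0)

# With n = 3 points, polynomials up to degree 2n - 1 = 5 should be integrated exactly.
quartic = gauss3(lambda x: x**4, -1.0, 1.0)    # exact value 2/5
print(quartic)
```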

28.4.4 Monte Carlo methods 

Surprising as it may at first seem, random numbers may be used to carry out
numerical integration. The random element comes in principally when selecting
the points at which the integrand is evaluated, and naturally does not extend to
the actual values of the integrand!

For the most part we will continue to use as our model one-dimensional 
integrals between finite limits, as typified by equation (28.34). Extensions to cover 
infinite or multidimensional integrals will be indicated briefly at the end of the 
section. It should be noted here, however, that Monte Carlo methods - the name 
has become attached to methods based on randomly generated numbers - in 
many ways come into their own when used on multidimensional integrals over 
regions with complicated boundaries. 

It goes without saying that in order to use random numbers for calculational 
purposes a supply of them must be available. There was a time when they 
were provided in book form as a two-dimensional array of random digits in 
the range 0 to 9, and the user could generate a random number of any desired 
length by selecting the positions in the table of its successive digits in any 
predetermined and systematic way. Nowadays all computers and nearly all pocket 
calculators offer a function which supplies a sequence of decimal numbers $\xi$ that,
for all practical purposes, are randomly and uniformly chosen in the range
$0 \le \xi < 1$. The maximum number of significant figures available in each random
number depends on the precision of the generating device. We will defer the 
details of how these numbers are produced to a later subsection, where it will 
also be shown how random numbers distributed in a prescribed way can be 
generated. 

All integrals of the general form shown in equation (28.34) can, by a suitable
change of variable, be brought to the form

$$\theta = \int_0^1 f(x)\,dx, \qquad (28.44)$$

and we will use this as our standard model.

All approaches to integral evaluation based on random numbers proceed by
estimating a quantity whose expectation value is equal to the sought-for value $\theta$.
The estimator $t$ must be unbiased, i.e. we must have $E[t] = \theta$, and the method
must provide some measure of the likely error in the result. The latter will appear
generally as the variance of the estimate, with its usual statistical interpretation,
and not as a band in which the true answer is known to lie with certainty.

The various approaches really differ from each other only in the degree of
sophistication employed to keep the variance of the estimate of $\theta$ small. The
overall efficiency of any particular method has to take into account not only the
variance of the estimate but also the computing and book-keeping effort required
to achieve it.

We do not have the space to describe even the more elementary methods in 
full detail, but the main thrust of each approach should be apparent to the reader 
from the brief descriptions that follow. 
Crude Monte Carlo

The most straightforward application is one in which the random numbers are
used to pick sample points at which $f(x)$ is evaluated. These values are then
averaged:

$$t = \frac{1}{n}\sum_{i=1}^{n} f(\xi_i). \qquad (28.45)$$
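Crude Monte Carlo evaluation amounts to averaging the integrand at uniformly chosen points. The sketch below is illustrative only: the function name `crude_monte_carlo`, the test integrand $x^2$, the seed and the sample size are all assumptions made here, and the standard error is estimated from the sample variance of the $f(\xi_i)$, which is one common way of quantifying the likely error.

```python
import random


def crude_monte_carlo(f, n, rng):
    """Average f at n uniform random points on [0, 1); return (estimate, standard error)."""
    values = [f(rng.random()) for _ in range(n)]
    t = sum(values) / n
    # Sample variance of f, then the variance of the mean is var_f / n.
    var_f = sum((v - t) ** 2 for v in values) / (n - 1)
    return t, (var_f / n) ** 0.5


rng = random.Random(12345)             # fixed seed so the run is repeatable
t, std_err = crude_monte_carlo(lambda x: x**2, 100_000, rng)
print(t, std_err)                      # the exact integral of x^2 over [0, 1] is 1/3
```

The standard error falls only as $n^{-1/2}$, which is why the more refined sampling schemes aim to reduce the variance rather than simply to increase $n$.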


Stratified sampling 

Here the range of x is broken up into k subranges, 

\[ 0 = \alpha_0 < \alpha_1 < \cdots < \alpha_k = 1, \]

and crude Monte Carlo evaluation is carried out in each subrange. The estimate 
t of $\theta$ is then calculated as 

\[ t = \sum_{j=1}^{k} \sum_{i=1}^{n_j} \frac{\alpha_j - \alpha_{j-1}}{n_j}\, f\bigl(\alpha_{j-1} + \xi_{ij}(\alpha_j - \alpha_{j-1})\bigr). \tag{28.46} \]

This is an unbiased estimator of $\theta$ with variance 

\[ \sigma_t^2 = \sum_{j=1}^{k} \frac{\alpha_j - \alpha_{j-1}}{n_j} \int_{\alpha_{j-1}}^{\alpha_j} [f(x)]^2\,dx - \sum_{j=1}^{k} \frac{1}{n_j} \left[ \int_{\alpha_{j-1}}^{\alpha_j} f(x)\,dx \right]^2 . \]

This variance can be made less than that for crude Monte Carlo, whilst using 
the same total number of random numbers, $n = \sum_j n_j$, if the differences between 
the average values of f(x) in the various subranges are significantly greater than 
the variations in f within each subrange. It is easier administratively to make all 
subranges equal in length but better, if it can be managed, to make them such 
that the variations in f are approximately equal in all the individual subranges. 
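Equation (28.46) translates almost line by line into code. In the sketch below the subrange edges, the numbers of points per subrange and the integrand are all arbitrary illustrative choices, not values taken from the text.

```python
import math
import random


def stratified_estimate(f, edges, counts, rng):
    """Estimate the integral of f over (0, 1) using equation (28.46).

    edges  : alpha_0 = 0 < alpha_1 < ... < alpha_k = 1
    counts : n_j, the number of sample points used in the j-th subrange
    """
    t = 0.0
    for j, n_j in enumerate(counts, start=1):
        width = edges[j] - edges[j - 1]
        for _ in range(n_j):
            x = edges[j - 1] + rng.random() * width
            t += width / n_j * f(x)
    return t


def f(x):
    return math.sqrt(math.atan(x))  # illustrative integrand


rng = random.Random(2024)
edges = [0.0, 0.25, 0.5, 0.75, 1.0]  # four equal subranges
counts = [2500, 2500, 2500, 2500]    # n = 10 000 sample points in total
print(stratified_estimate(f, edges, counts, rng))
```

Equal subranges are used purely for convenience; as noted above, subranges chosen to equalise the variation of f would normally do better.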


Importance sampling 

Although we cannot integrate f(x) analytically - we would not be using Monte 
Carlo methods if we could - if we can find another function g(x) that can be 
integrated analytically and mimics the shape of f then the variance in the estimate 
of $\theta$ can be reduced significantly compared with that resulting from the use of 
crude Monte Carlo evaluation. 

Firstly, if necessary the function g must be renormalised, so that 
$G(x) = \int_0^x g(y)\,dy$ has the property G(1) = 1. Clearly, it also has the 
property G(0) = 0. Then, since 

\[ \theta = \int_0^1 \frac{f(x)}{g(x)}\,dG(x), \]

it follows that finding the expectation value of $f(\eta)/g(\eta)$ using a random number 
$\eta$, distributed in such a way that $\xi = G(\eta)$ is uniformly distributed on (0, 1), is 
equivalent to estimating $\theta$. This involves being able to find the inverse function 
of G; a discussion of how to do this is given in a later subsection. If $g(\eta)$ mimics 
$f(\eta)$ well, $f(\eta)/g(\eta)$ will be nearly constant and the estimation will have a very 
small variance. Further, any error in inverting the relationship between $\eta$ and $\xi$ 
will not be important since $f(\eta)/g(\eta)$ will be largely independent of the value 
of $\eta$. 

As an example, consider the function $f(x) = [\tan^{-1}(x)]^{1/2}$, which is not 
analytically integrable over the range (0, 1) but is well mimicked by the 
easily-integrated function $g(x) = x^{1/2}(1 - x^2/6)$. The ratio of the two varies 
from 1.00 to 1.06 as x varies from 0 to 1. The integral of g over this range is 
0.619 048, and so it has to be renormalised by the factor 1.615 38. The value of 
the integral of f(x) from 0 to 1 can then be estimated by averaging the value of 

\[ \frac{[\tan^{-1}(\eta)]^{1/2}}{1.615\,38\;\eta^{1/2}\,(1 - \tfrac{1}{6}\eta^2)} \]

for random variables $\eta$ which are such that $G(\eta)$ is uniformly distributed on 
(0, 1). Using batches of as few as 10 random numbers gave a value 0.630 for $\theta$, 
with standard deviation 0.003. The corresponding result for crude Monte Carlo, 
using the same random numbers, was 0.634 ± 0.065. The increase in precision is 
obvious, though the additional labour involved would not be justified for a single 
application. 
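The example can be sketched in code as follows. Since G has no simple closed-form inverse, the sketch inverts it by bisection; that choice, the tolerance, the seed and the sample size are assumptions made here and are not the procedure used to obtain the figures quoted above.

```python
import math
import random

NORM = 1.0 / (2.0 / 3.0 - 1.0 / 21.0)  # = 1.61538..., the renormalising factor


def f(x):
    return math.sqrt(math.atan(x))


def g(x):
    # Renormalised mimic function, so that G(1) = 1.
    return NORM * math.sqrt(x) * (1.0 - x * x / 6.0)


def G(x):
    # Integral of g from 0 to x, in closed form.
    return NORM * (2.0 / 3.0 * x ** 1.5 - x ** 3.5 / 21.0)


def G_inverse(xi, tol=1e-12):
    # G increases monotonically on (0, 1), so bisection finds eta with G(eta) = xi.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if G(mid) < xi:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def importance_estimate(n, rng):
    total = 0.0
    for _ in range(n):
        # 1 - random() lies in (0, 1], which keeps eta away from exactly 0.
        eta = G_inverse(1.0 - rng.random())
        total += f(eta) / g(eta)
    return total / n


print(importance_estimate(2000, random.Random(7)))
```

Because f/g varies by only about 6% over the range, even modest sample sizes give a tightly clustered estimate.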

Control variates 

The control-variate method is similar to, but not the same as, importance 
sampling. Again, an analytically integrable function that mimics f(x) in shape has 
to be found. The function, known as the control variate, is first scaled so as to 
match f as closely as possible in magnitude and then its integral is found in 
closed form. If we denote the scaled control variate by h(x) then the estimate of 
$\theta$ is computed as 

\[ t = \int_0^1 [f(x) - h(x)]\,dx + \int_0^1 h(x)\,dx. \tag{28.47} \]

The first integral in (28.47) is evaluated using (crude) Monte Carlo, whilst the 
second is known analytically. Although the first integral should have been 
rendered small by the choice of h(x), it is its variance that matters. The method 
relies on the result (see equation (26.136)) 

\[ V[t - t'] = V[t] + V[t'] - 2\,\mathrm{Cov}[t, t'] \]

and on the fact that if t estimates $\theta$ whilst t' estimates $\theta'$ using the same random 
numbers then the covariance of t and t' can be larger than the variance of t', and 
indeed will be so if the integrands producing $\theta$ and $\theta'$ are highly correlated. 

To evaluate the same integral as was estimated previously using importance 
sampling, we take as h(x) the function g(x) used there, before it was renormalised. 
Again using batches of 10 random numbers, the estimated value for $\theta$ was found 
to be 0.629 ± 0.004, a result almost identical to that obtained using importance 
sampling, in both value and precision. Since we knew already that f(x) and g(x) 
diverge monotonically by about 6% as x varies over the range (0, 1), we could 
have made a small improvement to our control variate by scaling it by 1.03 before 
using it in equation (28.47). 
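A minimal sketch of (28.47) for this example is given below, with $h(x) = x^{1/2}(1 - x^2/6)$ and its exact integral 0.619 048. The sample size and seed are illustrative choices and will not reproduce the batch figures quoted in the text.

```python
import math
import random

H_INTEGRAL = 2.0 / 3.0 - 1.0 / 21.0  # exact integral of h over (0, 1), = 0.619 048


def f(x):
    return math.sqrt(math.atan(x))


def h(x):
    # Control variate: the mimic function g(x) before renormalisation.
    return math.sqrt(x) * (1.0 - x * x / 6.0)


def control_variate_estimate(n, rng):
    # Crude Monte Carlo for the first integral in (28.47); the second is exact.
    total = 0.0
    for _ in range(n):
        x = rng.random()
        total += f(x) - h(x)
    return total / n + H_INTEGRAL


print(control_variate_estimate(2000, random.Random(3)))
```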

Antithetic variates 

As a final example of a method that improves on crude Monte Carlo, and one that 
is particularly useful when monotonic functions are to be integrated, we mention 
the use of antithetic variates. This method relies on finding two estimates t and 
t' of $\theta$ that are strongly anticorrelated (i.e. Cov[t, t'] is large and negative) and 
using the result 

\[ V[\tfrac{1}{2}(t + t')] = \tfrac{1}{4} V[t] + \tfrac{1}{4} V[t'] + \tfrac{1}{2}\,\mathrm{Cov}[t, t']. \]

For example, the use of $\tfrac{1}{2}[f(\xi) + f(1 - \xi)]$ instead of $f(\xi)$ involves only twice 
as many evaluations of f, and no more random variables, but generally gives 
an improvement in precision significantly greater than this. For the integral of 
$f(x) = [\tan^{-1}(x)]^{1/2}$, using as previously a batch of 10 random variables, an 
estimate of 0.623 ± 0.018 was found. This is to be compared with the crude Monte 
Carlo result, 0.634 ± 0.065, obtained using the same number of random variables. 
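The antithetic pairing of $\xi$ with $1 - \xi$ can be sketched as follows; the sample size and seed are again arbitrary illustrative choices.

```python
import math
import random


def f(x):
    return math.sqrt(math.atan(x))


def antithetic_estimate(n, rng):
    """Average (1/2)[f(xi) + f(1 - xi)] over n random numbers xi."""
    total = 0.0
    for _ in range(n):
        xi = rng.random()
        total += 0.5 * (f(xi) + f(1.0 - xi))
    return total / n


print(antithetic_estimate(2000, random.Random(11)))
```

For a monotonic f, a large value of $f(\xi)$ is always paired with a small value of $f(1 - \xi)$, which is the source of the negative covariance.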

For a fuller discussion of these methods, and of theoretical estimates of their 
efficiencies, the reader is referred to more specialist treatments. For practical 
implementation schemes, a book dedicated to scientific computing should be 
consulted.† 


Hit or miss method 

We now come to the approach that, in spirit, is closest to the activities that gave 
Monte Carlo methods their name. In this approach, one or more straightforward 
yes/no decisions are made on the basis of numbers drawn at random - the end 
result of each trial is either a hit or a miss! In this section we are concerned 
with numerical integration, but the general Monte Carlo approach, in which 
one estimates a physical quantity that is hard or impossible to calculate directly 
by simulating the physical processes that determine it, is widespread in modern 
science. For example, the calculation of the efficiencies of detector arrays in 
experiments to study elementary particle interactions is nearly always carried 
out in this way. Indeed, in a normal experiment, far more simulated interactions 
are generated in computers than ever actually occur when the experiment is 
taking real data. 

As was noted in chapter 2, the process of evaluating a one-dimensional integral 
$\int_a^b f(x)\,dx$ can be regarded as that of finding the area between the curve y = f(x) 
and the x-axis in the range $a \le x \le b$. It may not be possible to do this 
analytically but if, as shown in figure 28.5, we can enclose the curve in a simple 
figure whose area can be found trivially then the ratio of the required area (shown 
shaded) to that of the bounding figure, c(b - a), is the same as the probability 
that a randomly selected point inside the boundary will lie below the line. 

[Figure 28.5 A simple rectangular figure enclosing the area (shown shaded) 
which is equal to $\int_a^b f(x)\,dx$. The curve y = f(x) is drawn between x = a and x = b.] 

† e.g., Numerical Recipes, W. H. Press et al. (Cambridge University Press). 

In order to accommodate cases in which f(x) can be negative in part of the 
x-range, we treat a slightly more general case. Suppose that, for $a \le x \le b$, f(x) 
is bounded and known to lie in the range $A \le f(x) \le B$; then the transformation 

\[ z = \frac{x - a}{b - a} \]

will reduce the integral $\int_a^b f(x)\,dx$ to the form 

\[ A(b - a) + (B - A)(b - a) \int_0^1 h(z)\,dz, \tag{28.48} \]

where 

\[ h(z) = \frac{1}{B - A} \bigl[ f\bigl((b - a)z + a\bigr) - A \bigr]. \]

In this form z lies in the range $0 \le z \le 1$ and h(z) lies in the range $0 \le h(z) \le 1$, 
i.e. both are suitable for simulation using the standard random-number generator. 
It should be noted that for an efficient estimation the bounds A and B should 
be drawn as tightly as possible - preferably, but not necessarily, they should be 
equal to the minimum and maximum values of f in the range. The reason for 
this is that random numbers corresponding to values which f(x) cannot reach 
add nothing to the estimation but do increase its variance. 

It only remains to estimate the final integral on the RHS of equation (28.48). 
This we do by selecting pairs of random numbers $\xi_1$ and $\xi_2$ and testing whether 
$h(\xi_1) > \xi_2$. The fraction of times that this inequality is satisfied estimates the value 
of the integral (without the scaling factors (B - A)(b - a)) since the expectation 
value of this fraction is the ratio of the area below the curve y = h(z) to the area 
of a unit square. 
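The whole hit-or-miss procedure, including the rescaling of (28.48), can be sketched as below. The test integrand $f(x) = x^2 - 1$ on (0, 2), with bounds A = -1 and B = 3, is a hypothetical example chosen here because its exact integral, 2/3, is known and because f changes sign in the range.

```python
import random


def hit_or_miss(f, a, b, A, B, n, rng):
    """Estimate the integral of f over (a, b), given bounds A <= f(x) <= B."""

    def h(z):
        # Rescaled integrand of (28.48): 0 <= h(z) <= 1 for 0 <= z <= 1.
        return (f((b - a) * z + a) - A) / (B - A)

    hits = 0
    for _ in range(n):
        xi1, xi2 = rng.random(), rng.random()
        if h(xi1) > xi2:
            hits += 1
    fraction = hits / n  # estimates the integral of h over (0, 1)
    return A * (b - a) + (B - A) * (b - a) * fraction


rng = random.Random(99)
print(hit_or_miss(lambda x: x * x - 1.0, 0.0, 2.0, -1.0, 3.0, 200_000, rng))
```

Loosening the bounds, say to A = -10 and B = 10, leaves the estimate unbiased but visibly increases its scatter, in line with the remark above about drawing the bounds tightly.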

To illustrate the evaluation of multiple integrals using Monte Carlo techniques, 
consider the relatively elementary problem of finding the volume of an irregular 
solid bounded by planes, say an octahedron. In order to keep the description 
brief, but at the same time illustrate the general principles involved, let us suppose 
that the octahedron has two vertices on each of the three Cartesian axes, one on 
either side of the origin for each axis. Denote those on the x-axis by $x_1$ (< 0) and 
$x_2$ (> 0), and similarly for the y- and z-axes. Then the whole of the octahedron 
can be enclosed by the rectangular parallelepiped 

\[ x_1 \le x \le x_2, \qquad y_1 \le y \le y_2, \qquad z_1 \le z \le z_2. \]

Any point in the octahedron lies inside or on the parallelepiped, but any point 
in the parallelepiped may or may not lie inside the octahedron. 

The equation of the plane containing the three vertex points $(x_i, 0, 0)$, $(0, y_j, 0)$ 
and $(0, 0, z_k)$ is 

\[ \frac{x}{x_i} + \frac{y}{y_j} + \frac{z}{z_k} = 1 \quad \text{for } i, j, k = 1, 2, \tag{28.49} \]

and the condition that any general point (x, y, z) lies on the same side of the 
plane as the origin is that 

\[ \frac{x}{x_i} + \frac{y}{y_j} + \frac{z}{z_k} - 1 \le 0. \tag{28.50} \]

For the point to be inside or on the octahedron, equation (28.50) must therefore 
be satisfied for all eight of the sets of i, j and k given in (28.49). 

Thus an estimate of the volume of the octahedron can be made by generating 
random numbers $\xi$ from the usual uniform distribution and then using them in 
sets of three, according to the following scheme. 

With integer m labelling the mth set of three random numbers, calculate 

\[ x = x_1 + \xi_{3m-2}(x_2 - x_1), \]
\[ y = y_1 + \xi_{3m-1}(y_2 - y_1), \]
\[ z = z_1 + \xi_{3m}(z_2 - z_1). \]

Define a variable $n_m$ as 1 if (28.50) is satisfied for all eight combinations of i, j, k 
values and as 0 otherwise. The volume V can then be estimated using 3M random 
numbers from the formula 

\[ \frac{V}{(x_2 - x_1)(y_2 - y_1)(z_2 - z_1)} = \frac{1}{M} \sum_{m=1}^{M} n_m. \]

It will be seen that, by replacing each $n_m$ in the summation by $f(x, y, z)\,n_m$, this 
procedure could be extended to estimating the integral of the function f over 
the volume of the solid. The method has special value if f is too complicated to 
have analytic integrals with respect to x, y and z or if the limits of any of these 
integrals are determined by anything other than the simplest combinations of the 
other variables. If large values of f are known to be concentrated in particular 
regions of the integration volume then some form of stratified sampling should 
be used. 
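The scheme for the octahedron can be sketched as follows. The vertex positions used in the demonstration are arbitrary. For this particular solid the exact volume is one-sixth of that of the bounding parallelepiped (each octant contributes a tetrahedron of volume $|x_i y_j z_k|/6$), which provides a convenient check on the estimate.

```python
import itertools
import random


def octahedron_volume(x1, x2, y1, y2, z1, z2, M, rng):
    """Hit-or-miss estimate of the volume; requires x1, y1, z1 < 0 < x2, y2, z2."""
    planes = list(itertools.product((x1, x2), (y1, y2), (z1, z2)))
    hits = 0
    for _ in range(M):
        x = x1 + rng.random() * (x2 - x1)
        y = y1 + rng.random() * (y2 - y1)
        z = z1 + rng.random() * (z2 - z1)
        # n_m = 1 only if (28.50) holds for all eight planes.
        if all(x / xi + y / yj + z / zk - 1.0 <= 0.0 for xi, yj, zk in planes):
            hits += 1
    box = (x2 - x1) * (y2 - y1) * (z2 - z1)
    return box * hits / M


rng = random.Random(5)
print(octahedron_volume(-1.0, 2.0, -1.0, 1.0, -3.0, 1.0, 100_000, rng))  # exact value: 4
```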

It will be apparent that this general method can be extended to integrals 
of general functions, bounded but not necessarily continuous, over volumes with 
complicated bounding surfaces and, if appropriate, in more than three dimensions. 

Random number generation 

Earlier in this subsection we showed how to evaluate integrals using sequences of 
numbers that we took to be distributed uniformly on the interval $0 < \xi < 1$. In 
reality the sequence of numbers is not truly random, since each is generated in a 
mechanistic way from its predecessor and eventually the sequence repeats itself. 
However, the cycle is so long that in practice this is unlikely to be a problem, 
and the reproducibility of the sequence can even be turned to advantage when 
checking the accuracy of the rest of a calculational program. Much research has 
gone into the best ways to produce such ‘pseudo-random’ sequences of numbers. 
We do not have space to pursue them here and will limit ourselves to one recipe 
that works well in practice. 

Given any particular starting (integer) value $x_0$, the following algorithm will 
generate a full cycle of m values for $\xi_i$, uniformly distributed on $0 \le \xi_i < 1$, 
before repeats appear: 

\[ x_i = a x_{i-1} + c \pmod{m}; \qquad \xi_i = \frac{x_i}{m}. \]

Here c is an odd integer and a has the form a = 4k + 1 with k an integer. For 
practical reasons, in computers and calculators m is taken as a (fairly high) power 
of 2, typically $2^{32}$. 
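The recipe is short enough to be written out directly. The parameters in the sketch below (m = 16, a = 5, c = 3, $x_0$ = 7) are deliberately tiny illustrative values, chosen here so that the complete cycle can be inspected; they are not suitable for real calculations.

```python
def lcg(x0, a, c, m):
    """Yield xi_i = x_i / m, where x_i = (a * x_{i-1} + c) mod m."""
    x = x0
    while True:
        x = (a * x + c) % m
        yield x / m


# m = 16 is a power of 2, c = 3 is odd and a = 5 has the form 4k + 1.
gen = lcg(x0=7, a=5, c=3, m=16)
cycle = [next(gen) for _ in range(16)]
print(sorted(cycle))           # each multiple of 1/16 from 0 to 15/16 appears once
print(next(gen) == cycle[0])   # the 17th value repeats the first
```

Restarting the generator with the same $x_0$ reproduces the same sequence, which is the reproducibility referred to above.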

The uniform distribution can be used to generate random numbers y distributed 
according to a more general probability distribution f(y) on the range $a \le y \le b$ 
if the inverse of the indefinite integral of f can be found, either analytically or by 
means of a look-up table. In other words, if 

\[ F(y) = \int_a^y f(t)\,dt, \]

for which F(a) = 0 and F(b) = 1, then F(y) is uniformly distributed on (0, 1). This 
approach is not limited to finite a and b; a could be $-\infty$ and b could be $\infty$. 

The procedure is thus to select a random number $\xi$ from a uniform distribution 
on (0, 1) and then take as the random number y the value of $F^{-1}(\xi)$. We now 
illustrate this with a worked example. 


►Find an explicit formula that will generate a random number y distributed on $(-\infty, \infty)$ 
according to the Cauchy distribution 

\[ f(y)\,dy = \frac{a}{\pi}\,\frac{dy}{a^2 + y^2}, \]

given a random number $\xi$ uniformly distributed on (0, 1). 

The first task is to determine the indefinite integral 

\[ F(y) = \int_{-\infty}^{y} \frac{a}{\pi}\,\frac{dt}{a^2 + t^2} = \frac{1}{\pi} \tan^{-1}\frac{y}{a} + \frac{1}{2}. \]

Now, if y is distributed as we wish then F(y) is uniformly distributed on (0, 1). This follows 
from the fact that the derivative of F(y) is f(y). We therefore set F(y) equal to $\xi$ and 
obtain 

\[ \xi = \frac{1}{\pi} \tan^{-1}\frac{y}{a} + \frac{1}{2}, \]

yielding 

\[ y = a \tan[\pi(\xi - \tfrac{1}{2})]. \]

This explicit formula shows how to change a random number $\xi$ drawn from a population 
uniformly distributed on (0, 1) into a random number y distributed according to the 
Cauchy distribution. ◄ 
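The formula just derived can be tried out directly. In the sketch below the scale a = 2, the seed and the sample size are illustrative choices; the check uses the fact that the quartiles of this Cauchy distribution lie at y = -a and y = +a, as follows from setting F(y) = 1/4 and 3/4.

```python
import math
import random


def cauchy_from_uniform(xi, a):
    """Map xi, uniform on (0, 1), to y = a tan[pi (xi - 1/2)]."""
    return a * math.tan(math.pi * (xi - 0.5))


rng = random.Random(1)
a = 2.0  # illustrative scale parameter
sample = sorted(cauchy_from_uniform(rng.random(), a) for _ in range(100_001))
# Lower quartile, median and upper quartile; these should be near -a, 0 and +a.
print(sample[25_000], sample[50_000], sample[75_000])
```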


Look-up tables operate as described below for cumulative distributions F(y) 
that are non-invertible, i.e. $F^{-1}(y)$ cannot be expressed in closed form. They 
are especially useful if many random numbers are needed but great sampling 
accuracy is not essential. The method for an N-entry table can be summarised as 
follows. 

Define $w_m$ by $F(w_m) = m/N$ for $m = 1, 2, \ldots, N$ and store a table of 

\[ y(m) = \tfrac{1}{2}(w_m + w_{m-1}). \]

As each random number y is needed, calculate k as the integral part of $N\xi$, and 
take y as given by y(k). 
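A look-up table of this kind might be built as in the following sketch. The cumulative distribution $F(y) = (y + y^3)/2$ on $0 \le y \le 1$ is a hypothetical example chosen here, and the $w_m$ are found by bisection, which is only one of several possible ways of solving $F(w_m) = m/N$. Since the integral part of $N\xi$ runs from 0 to N - 1, the sketch stores y(m) at list position m - 1; this indexing convention is an assumption of the sketch.

```python
import random


def F(y):
    # Hypothetical cumulative distribution on 0 <= y <= 1, for density (1 + 3y^2)/2.
    return 0.5 * (y + y ** 3)


def solve_F(target, lo=0.0, hi=1.0, tol=1e-12):
    # Bisection for F(w) = target; valid because F increases monotonically.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def build_table(N):
    w = [solve_F(m / N) for m in range(N + 1)]                  # w_0, w_1, ..., w_N
    return [0.5 * (w[m] + w[m - 1]) for m in range(1, N + 1)]   # y(1), ..., y(N)


def draw(table, rng):
    k = int(len(table) * rng.random())  # integral part of N*xi: 0, 1, ..., N - 1
    return table[k]                     # list position k holds y(k + 1)


table = build_table(100)
rng = random.Random(3)
print([round(draw(table, rng), 3) for _ in range(5)])
```

Once the table has been built, each random number costs only one multiplication and one table access, which is why the method suits applications needing many values of modest accuracy.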

Normally, such a look-up table would have to be used for generating random 
numbers with a Gaussian distribution, as the cumulative integral of a Gaussian is 
non-invertible. It would be in essence table 26.3, with the roles of argument and 
value interchanged. In this particular case an alternative, based on the central 
limit theorem, can be considered. 

With the $\xi_i$ generated in the usual way, i.e. uniformly distributed on the interval 
$0 < \xi_i < 1$, the random variable 

\[ y = \sum_{i=1}^{n} \xi_i - \tfrac{1}{2} n \tag{28.51} \]

is normally distributed with mean 0 and variance n/12 when n is large. This 
approach does produce a continuous spectrum of possible values for y, but needs 
many values of $\xi_i$ for each value of y and is a very poor approximation if the 
wings of the Gaussian distribution have to be sampled accurately. For nearly all 
practical purposes a Gaussian look-up table is to be preferred. 
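Equation (28.51) is easily tried out numerically. The sketch below uses n = 12, a choice made here because it makes the variance n/12 equal to unity; the seed and the number of samples are also arbitrary.

```python
import random


def approx_gaussian(n, rng):
    """Return y of equation (28.51): approximately normal, mean 0, variance n/12."""
    return sum(rng.random() for _ in range(n)) - 0.5 * n


rng = random.Random(42)
n = 12
ys = [approx_gaussian(n, rng) for _ in range(20_000)]
mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / (len(ys) - 1)
print(mean, var, max(abs(y) for y in ys))
```

The last printed figure can never exceed n/2 = 6, which shows concretely why the wings of the distribution are poorly represented.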


28.5 Finite differences 


It will have been noticed that earlier sections included several equations linking 
sequential values of $f_i$ and the derivatives of f evaluated at one of the $x_i$. In 
this section, by way of preparation for the numerical treatment of differential 
equations, we establish these relationships in a more systematic way. 

Again we consider a set of values $f_i$ of a function f(x) evaluated at equally 
spaced points $x_i$, their separation being h. As before, the basis for our discussion 
will be a Taylor series expansion, but on this occasion about the point $x_i$: 

\[ f_{i\pm 1} = f_i \pm h f_i^{(1)} + \frac{h^2}{2!} f_i^{(2)} \pm \frac{h^3}{3!} f_i^{(3)} + \cdots. \tag{28.52} \]

In this section, and subsequently, we denote the nth derivative evaluated at $x_i$ 
by $f_i^{(n)}$. 

From (28.52), three different expressions that approximate $f_i^{(1)}$ can be derived. 
The first of these, obtained by subtracting the ± equations, is 

\[ f_i^{(1)} \equiv \left(\frac{df}{dx}\right)_{x_i} = \frac{f_{i+1} - f_{i-1}}{2h} - \frac{h^2}{3!} f_i^{(3)} - \cdots. \tag{28.53} \]

The quantity $(f_{i+1} - f_{i-1})/(2h)$ is known as the central difference approximation 
to $f_i^{(1)}$ and can be seen from (28.53) to be in error by approximately $(h^2/6) f_i^{(3)}$. 
An alternative approximation, obtained from (28.52+) alone, is given by 

\[ f_i^{(1)} \equiv \left(\frac{df}{dx}\right)_{x_i} = \frac{f_{i+1} - f_i}{h} - \frac{h}{2!} f_i^{(2)} - \cdots. \tag{28.54} \]

The forward difference approximation, $(f_{i+1} - f_i)/h$, is clearly a poorer 
approximation, since it is in error by approximately $(h/2) f_i^{(2)}$, as compared with 
$(h^2/6) f_i^{(3)}$. Similarly, the backward difference $(f_i - f_{i-1})/h$ obtained from 
(28.52-) is not as good as the central difference; the sign of the error is reversed 
in this case. 

This type of differencing approximation can be continued to the higher 
derivatives of f in an obvious manner. By adding the two equations (28.52±), a central 
difference approximation to $f_i^{(2)}$ can be obtained: 

\[ f_i^{(2)} \equiv \left(\frac{d^2 f}{dx^2}\right)_{x_i} \approx \frac{f_{i+1} - 2 f_i + f_{i-1}}{h^2}. \tag{28.55} \]

The error in this approximation (also known as the second difference of f) is 
easily shown to be about $(h^2/12) f_i^{(4)}$. 

Of course, if the function f(x) is a sufficiently simple polynomial in x, all 
derivatives beyond a particular one will vanish and there is no error in taking 
the differences to obtain the derivatives. 
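The relative sizes of the errors in these approximations can be checked numerically. The sketch below applies them to f(x) = sin x at x = 1 with h = 0.1; the function, the point and the spacing are all illustrative choices.

```python
import math


def forward_diff(f, x, h):
    return (f(x + h) - f(x)) / h


def backward_diff(f, x, h):
    return (f(x) - f(x - h)) / h


def central_diff(f, x, h):
    return (f(x + h) - f(x - h)) / (2.0 * h)


def second_diff(f, x, h):
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


x, h = 1.0, 0.1
exact = math.cos(x)  # first derivative of sin at x
for name, rule in [("forward", forward_diff),
                   ("backward", backward_diff),
                   ("central", central_diff)]:
    print(name, rule(math.sin, x, h) - exact)
# The second derivative of sin is -sin, so this prints the error in (28.55).
print("second", second_diff(math.sin, x, h) + math.sin(x))
```

The forward and backward errors are of order h and of opposite sign, whilst the central-difference error is of order $h^2$, as expected from (28.53) and (28.54).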


►The following is copied from the tabulation of a second-degree polynomial f(x) at values 
of x from 1 to 12 inclusive, 

2, 2, ?, 8, 14, 22, 32, 46, ?, 74, 92, 112. 

The entries marked ? were illegible and in addition one error was made in transcription. 
Complete and correct the table. Would your procedure have worked if the copying error 
had been in f(6)? 

Write out the entries again in row (a) below, and where possible calculate first differences 
in row (b) and second differences in row (c). Denote the jth entry in row (n) by $(n)_j$. 

(a)  2    2    ?    8    14    22    32    46    ?    74    92    112 
(b)     0    ?    ?    6     8    10    14    ?    ?    18    20 
(c)       ?    ?    ?    2     2     4    ?    ?    ?     2 

Because the polynomial is second-degree the second differences $(c)_j$, which are proportional 
to $d^2 f/dx^2$, should be constant, and clearly the constant should be 2. That is, $(c)_6$ should 
equal 2 and $(b)_7$ should equal 12 (not 14). Since all the $(c)_j = 2$, we can conclude that 
$(b)_2 = 2$, $(b)_3 = 4$, $(b)_8 = 14$ and $(b)_9 = 16$. Working these changes back to row (a) shows 
that $(a)_3 = 4$, $(a)_8 = 44$ (not 46) and $(a)_9 = 58$. 

The entries therefore should read 

(a) 2, 2, 4, 8, 14, 22, 32, 44, 58, 74, 92, 112, 

where the amended entries are $(a)_3$, $(a)_8$ and $(a)_9$. 

It is easily verified that if the error were in f(6) no two computable entries in row (c) 
would be equal, and it would not be clear what the correct common entry should be. 
Nevertheless, trial and error might arrive at a self-consistent scheme. ◄ 
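The difference rows of the worked example can be generated mechanically. In the sketch below an illegible entry is represented by `None`, a convention adopted here for convenience.

```python
def differences(values):
    """Return the first differences of a list, using None where an entry is unknown."""
    return [
        None if a is None or b is None else b - a
        for a, b in zip(values, values[1:])
    ]


copied = [2, 2, None, 8, 14, 22, 32, 46, None, 74, 92, 112]  # None marks '?'
row_b = differences(copied)
row_c = differences(row_b)
print(row_b)
print(row_c)

corrected = [2, 2, 4, 8, 14, 22, 32, 44, 58, 74, 92, 112]
print(differences(differences(corrected)))  # second differences of the corrected table
```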


28.6 Differential equations 

For the remaining sections of this chapter our attention will be on the solution 
of differential equations by numerical methods. Some of the general difficulties 
of applying numerical methods to differential equations will be all too apparent. 
Initially we consider only the simplest kind of equation - one of first order, 
typically represented by 

\[ \frac{dy}{dx} = f(x, y), \tag{28.56} \]

where y is taken as the dependent variable and x the independent one. If this 
equation can be solved analytically then that is the best course to adopt. But 
sometimes it is not possible to do so and a numerical approach becomes the 
only one available. In fact, most of the examples that we will use can be solved 
easily by an explicit integration, but, for the purposes of illustration, this is an 
advantage rather than the reverse since useful comparisons can then be made 
between the numerically derived solution and the exact one. 

   x    h = 0.01    0.1      0.5     1.0     1.5      2       3     y(exact) 
   0      (1)      (1)      (1)     (1)     (1)     (1)     (1)      (1) 
  0.5    0.605    0.590    0.500     0     -0.500    -1      -2     0.607 
  1.0    0.366    0.349    0.250     0      0.250     1       4     0.368 
  1.5    0.221    0.206    0.125     0     -0.125    -1      -8     0.223 
  2.0    0.134    0.122    0.063     0      0.063     1      16     0.135 
  2.5    0.081    0.072    0.032     0     -0.032    -1     -32     0.082 
  3.0    0.049    0.042    0.016     0      0.016     1      64     0.050 

Table 28.8 The solution y of differential equation (28.57) using the Euler 
forward difference method for various values of h. The exact solution is also 
shown. 


28.6.1 Difference equations 


Consider the differential equation 

\[ \frac{dy}{dx} = -y, \qquad y(0) = 1, \tag{28.57} \]

and the possibility of solving it numerically by approximating dy/dx by a finite 
difference along the lines indicated in section 28.5. We start with the forward 
difference 

\[ \left(\frac{dy}{dx}\right)_{x_i} \approx \frac{y_{i+1} - y_i}{h}, \tag{28.58} \]

where we use the notation of section 28.5 but with f replaced by y. In this 
particular case, it leads to the recurrence relation 

\[ y_{i+1} = y_i + h \left(\frac{dy}{dx}\right)_{x_i} = y_i - h y_i = (1 - h) y_i. \tag{28.59} \]

Thus, since $y_0 = y(0) = 1$ is given, $y_1 = y(0 + h) = y(h)$ can be calculated, and so 
on (this is the Euler method). Table 28.8 shows the values of y(x) obtained if this 
is done using various values of h and for selected values of x. The exact solution, 
y(x) = exp(-x), is also shown. 

It is clear that to maintain anything like a reasonable accuracy only very small 
steps h can be used. Indeed, if h is taken to be too large, not only is the accuracy 
bad but, as can be seen, for h > 1 the calculated solution oscillates (when it 
should be monotonic) and for h > 2 it diverges. Equation (28.59) is of the form 
$y_{i+1} = \lambda y_i$, and a necessary condition for non-divergence is $|\lambda| < 1$, i.e. 0 < h < 2, 
though in no way does this ensure accuracy. 
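The recurrence (28.59) is a special case of the general forward-difference step $y_{i+1} = y_i + h f(x_i, y_i)$, which can be coded as below. The function names are our own; the step lengths are a selection of those used in table 28.8.

```python
import math


def euler(f, x0, y0, h, steps):
    """Forward-difference (Euler) integration of dy/dx = f(x, y)."""
    x, y = x0, y0
    for _ in range(steps):
        y = y + h * f(x, y)
        x = x + h
    return y


def f(x, y):
    return -y  # equation (28.57)


for h in (0.01, 0.1, 0.5):
    steps = round(1.0 / h)  # number of steps needed to reach x = 1
    print(h, euler(f, 0.0, 1.0, h, steps), math.exp(-1.0))
```

Calling `euler(f, 0.0, 1.0, 3.0, 6)` shows the divergent behaviour directly: each step multiplies y by 1 - h = -2.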

Part of this difficulty arises from the poor approximation (28.58), which 
underlies the recurrence $y_{i+1} = (1 - h) y_i$ of (28.59); the right-hand side of 
(28.58) is a closer approximation to dy/dx evaluated at $x = x_i + h/2$ than to 
dy/dx at $x = x_i$. This is the result of using a forward difference rather than the 
more accurate, but of course still approximate, central difference. A more accurate 
method based on central differences (Milne's method) gives the recurrence relation 

\[ y_{i+1} = y_{i-1} + 2h \left(\frac{dy}{dx}\right)_{x_i} \tag{28.60} \]

in general and, in this particular case, 

\[ y_{i+1} = y_{i-1} - 2h y_i. \tag{28.61} \]

An additional difficulty now arises, since two initial values of y are needed. 
The second must be estimated by other means (e.g. by using a Taylor series, 
as discussed later) but for illustration purposes we will take the accurate value, 
$y(-h) = \exp h$, as the value of $y_{-1}$. If h is taken as, say, 0.5 and (28.61) applied 
repeatedly then the results shown in table 28.9 are obtained. 

    x     y(estim.)   y(exact) 
  -0.5     (1.648)       - 
   0       (1.000)    (1.000) 
   0.5      0.648      0.607 
   1.0      0.352      0.368 
   1.5      0.296      0.223 
   2.0      0.056      0.135 
   2.5      0.240      0.082 
   3.0     -0.184      0.050 

Table 28.9 The solution of differential equation (28.57) using the Milne 
central difference method with h = 0.5 and accurate starting values. 

Although some improvement in the early values of the calculated y(x) is 
noticeable, as compared with the corresponding (h = 0.5) column of table 28.8, 
this scheme soon runs into difficulties, as is obvious from the last two rows of the 
table. 
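The entries of table 28.9 can be regenerated with a few lines of code. The sketch below starts from $y_{-1} = 1.648$, i.e. exp(0.5) rounded to three decimal places; it is an assumption on our part that the tabulated values were obtained from the rounded rather than the full-precision starting value.

```python
def milne(y_prev, y_curr, h, steps):
    """Apply y_{i+1} = y_{i-1} - 2 h y_i, equation (28.61), `steps` times."""
    values = []
    for _ in range(steps):
        y_prev, y_curr = y_curr, y_prev - 2.0 * h * y_curr
        values.append(y_curr)
    return values


# Starting values: y(-0.5) taken as 1.648 and y(0) = 1.
ys = milne(1.648, 1.0, 0.5, 6)
print([round(y, 3) for y in ys])
```

Continuing the iteration for more steps shows the alternating, growing error that is discussed next.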

Some part of this poor performance is not really attributable to the 
approximations made in estimating dy/dx but to the form of the equation itself and 
hence of its solution. Any rounding error occurring in the evaluation effectively 
introduces into y some contamination by the solution of 

\[ \frac{dy}{dx} = +y. \]

This equation has the solution y(x) = exp x and so grows without limit; ultimately 
it will dominate the sought-for solution and thus render the calculations totally 
inaccurate. 

We have only illustrated, rather than analysed, some of the difficulties associated 
with simple finite-difference iteration schemes for first-order differential equations, 
but they may be summarised as (i) insufficiently precise approximations to the 
derivatives and (ii) inherent instability due to rounding errors. 


28.6.2 Taylor series solutions 


Since a Taylor series expansion is exact if all its terms are included, and the limits 
of convergence are not exceeded, we may seek to use one to evaluate $y_1$, $y_2$ etc. 
for an equation 

\[ \frac{dy}{dx} = f(x, y), \tag{28.62} \]

when the initial value $y(x_0) = y_0$ is given. 

The Taylor series is 

\[ y(x + h) = y(x) + h y'(x) + \frac{h^2}{2!} y''(x) + \frac{h^3}{3!} y^{(3)}(x) + \cdots. \tag{28.63} \]

In the present notation, at the point $x = x_i$ this is written 

\[ y_{i+1} = y_i + h y_i^{(1)} + \frac{h^2}{2!} y_i^{(2)} + \frac{h^3}{3!} y_i^{(3)} + \cdots. \tag{28.64} \]

But, for the required solution y(x), we know that 

\[ y_i^{(1)} = \left(\frac{dy}{dx}\right)_{x_i} = f(x_i, y_i), \tag{28.65} \]

and the value of the second derivative at $x = x_i$, $y = y_i$ can be obtained from it: 

\[ y_i^{(2)} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\frac{dy}{dx} = \frac{\partial f}{\partial x} + f \frac{\partial f}{\partial y}. \tag{28.66} \]

This process can be continued for the third and higher derivatives, all of which 
are to be evaluated at $(x_i, y_i)$. 

Having obtained expressions for the derivatives $y_i^{(n)}$ in (28.63), two alternative 
ways of proceeding are open: 

(i) equation (28.64) is used to evaluate $y_{i+1}$ and then the whole process is 
repeated to obtain $y_{i+2}$ and so on; 

(ii) equation (28.64) is applied several times but using a different value of h 
each time, and so the corresponding values of y(x + h) are obtained. 

It is clear that, on the one hand, approach (i) does not require so many terms of 
(28.63) to be kept but, on the other hand, the $y_i^{(n)}$ have to be recalculated at each 
step. With approach (ii), fairly accurate results for y may be obtained for values 
of x close to the given starting value, but for large values of h a large number 
of terms of (28.63) must be kept. As an example of approach (ii) we solve the 
following problem. 

    x     y(estim.)   y(exact) 
   0       1.0000      1.0000 
   0.1     1.2346      1.2346 
   0.2     1.5619      1.5625 
   0.3     2.0331      2.0408 
   0.4     2.7254      2.7778 
   0.5     3.7500      4.0000 

Table 28.10 The solution of differential equation (28.67) using a Taylor series. 


►Find the numerical solution of the equation 

\[ \frac{dy}{dx} = 2y^{3/2}, \qquad y(0) = 1, \tag{28.67} \]

for x = 0.1 to 0.5 in steps of 0.1. Compare it with the exact solution obtained analytically. 

Since the right-hand side of the equation does not contain x explicitly, (28.66) is greatly 
simplified and the calculation becomes a repeated application of 

\[ y^{(n+1)} = \frac{\partial y^{(n)}}{\partial y}\frac{dy}{dx} = 2y^{3/2}\,\frac{\partial y^{(n)}}{\partial y}. \]

The necessary derivatives and their values at x = 0, where y = 1, are given below. 

  $y(0) = 1$                                                   1 
  $y' = 2y^{3/2}$                                              2 
  $y'' = (3/2)(2y^{1/2})(2y^{3/2}) = 6y^2$                     6 
  $y^{(3)} = (12y)\,2y^{3/2} = 24y^{5/2}$                      24 
  $y^{(4)} = (60y^{3/2})\,2y^{3/2} = 120y^3$                   120 
  $y^{(5)} = (360y^2)\,2y^{3/2} = 720y^{7/2}$                  720 

Thus the Taylor expansion of the solution about the origin (in fact a Maclaurin series) is 

\[ y(x) = 1 + 2x + \frac{6}{2!}x^2 + \frac{24}{3!}x^3 + \frac{120}{4!}x^4 + \frac{720}{5!}x^5 + \cdots. \]

Hence, $y(\text{estim.}) = 1 + 2x + 3x^2 + 4x^3 + 5x^4 + 6x^5$. Values calculated from this are given 
in table 28.10. Comparison with the exact values shows that using the first six terms gives 
a value that is correct to one part in 100, up to x = 0.3. ◄ 
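Table 28.10 can be reproduced with the short sketch below. The closed form $y = (1 - x)^{-2}$ used for the exact column is not quoted in the text; it is stated here on the basis that it satisfies (28.67), since then $dy/dx = 2(1 - x)^{-3} = 2y^{3/2}$ and y(0) = 1, and that it agrees with the tabulated exact values.

```python
def y_estimate(x):
    # First six terms of the Maclaurin series: 1 + 2x + 3x^2 + 4x^3 + 5x^4 + 6x^5.
    return sum((n + 1) * x ** n for n in range(6))


def y_exact(x):
    # Satisfies dy/dx = 2 y^(3/2) with y(0) = 1 (valid for x < 1).
    return 1.0 / (1.0 - x) ** 2


for i in range(6):
    x = 0.1 * i
    print(f"{x:.1f}  {y_estimate(x):.4f}  {y_exact(x):.4f}")
```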


28.6.3 Prediction and correction 

An improvement in the accuracy obtainable using difference methods is possible 
if steps are taken, sometimes retrospectively, to allow for inaccuracies in 
approximating derivatives by differences. We will describe only the simplest schemes of 
this kind and begin with a prediction method, usually called the Adams method. 

The forward difference estimate of $y_{i+1}$, namely 

\[ y_{i+1} = y_i + h \left(\frac{dy}{dx}\right)_{x_i} = y_i + h f(x_i, y_i), \tag{28.68} \]

would give exact results if y were a linear function of x in the range $x_i \le x \le x_i + h$. 
The idea behind the Adams method is to allow some relaxation of this and 
suppose that y can be adequately approximated by a parabola over the interval 
$x_{i-1} \le x \le x_{i+1}$. In the same interval dy/dx can then be approximated by a linear 
function: 

\[ f(x, y) = \frac{dy}{dx} \approx a + b(x - x_i) \quad \text{for } x_i - h \le x \le x_i + h. \]

The values of a and b are fixed by the calculated values of f at $x_{i-1}$ and $x_i$, which 
we may denote by $f_{i-1}$ and $f_i$: 

\[ a = f_i, \qquad b = \frac{f_i - f_{i-1}}{h}. \]

Thus 

\[ y_{i+1} - y_i \approx \int_{x_i}^{x_i + h} \left[ f_i + \frac{f_i - f_{i-1}}{h}(x - x_i) \right] dx, \]

which yields 

\[ y_{i+1} = y_i + h f_i + \tfrac{1}{2} h (f_i - f_{i-1}). \tag{28.69} \]

The last term of this expression is seen to be a correction to result (28.68). That 
it is, in some sense, the second-order correction 

\[ \frac{h^2}{2!}\, y^{(2)}_{i-1/2} \]

to a first-order formula is apparent. 

Such a procedure requires, in addition to a value for $y_0$, a value for either $y_1$ or 
$y_{-1}$, so that $f_1$ or $f_{-1}$ can be used to initiate the iteration. This has to be obtained 
by other methods, e.g. a Taylor series expansion. 
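A minimal sketch of the Adams recurrence (28.69) is given below, applied to the test equation (28.57). For simplicity the additional starting value $y_{-1}$ is taken from the exact solution; in a genuine application it would have to come from a Taylor series or a similar source, as just described.

```python
import math


def adams(f, x0, y0, y_minus1, h, steps):
    """Integrate dy/dx = f(x, y) with (28.69), given y_0 and y_{-1}."""
    x, y = x0, y0
    f_prev = f(x0 - h, y_minus1)
    for _ in range(steps):
        f_curr = f(x, y)
        y = y + h * f_curr + 0.5 * h * (f_curr - f_prev)
        x = x + h
        f_prev = f_curr
    return y


def f(x, y):
    return -y  # equation (28.57), used here as a test case


h = 0.1
y_at_1 = adams(f, 0.0, 1.0, math.exp(h), h, 10)
print(y_at_1, math.exp(-1.0))
```

With h = 0.1 the result at x = 1 is noticeably closer to exp(-1) than the value $0.9^{10}$ given by the uncorrected forward difference formula.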

Improvements to simple difference formulae can also be obtained by using 
correction methods. Here a rough prediction of the value $y_{i+1}$ is first made and 
then this is used in a better formula, not originally usable since it in turn requires 
a value of $y_{i+1}$ for its evaluation. The value of $y_{i+1}$ is then recalculated using this 
better formula. 

Such a scheme based on the forward difference formula might be as follows: 

(i) predict $y_{i+1}$ using $y_{i+1} = y_i + h f_i$; 

(ii) calculate $f_{i+1}$ using this value; 

(iii) recalculate $y_{i+1}$ using $y_{i+1} = y_i + h(f_i + f_{i+1})/2$. Here $(f_i + f_{i+1})/2$ has 
replaced the $f_i$ used in (i), since it better represents the average value of 
dy/dx in the interval $x_i \le x \le x_i + h$. 

Steps (ii) and (iii) can be iterated to improve further the approximation to the 
average value of dy/dx, but this will not compensate for the omission of higher- 
order derivatives in the forward difference formula. 
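Steps (i) to (iii) can be sketched as a single function. The `iterations` parameter, which controls how many times steps (ii) and (iii) are repeated, is a convenience introduced here and is not part of the scheme as stated.

```python
def predict_correct_step(f, x, y, h, iterations=1):
    """Advance y by one step h using the scheme (i)-(iii)."""
    f_i = f(x, y)
    y_next = y + h * f_i                       # (i) forward-difference prediction
    for _ in range(iterations):
        f_next = f(x + h, y_next)              # (ii) slope at the predicted point
        y_next = y + 0.5 * h * (f_i + f_next)  # (iii) corrected value
    return y_next


def f(x, y):
    return -y  # equation (28.57), used here as a test case


print(predict_correct_step(f, 0.0, 1.0, 0.1))
print(predict_correct_step(f, 0.0, 1.0, 0.1, iterations=20))
```

For this test equation repeated iteration converges to $y_i(1 - h/2)/(1 + h/2)$, which still differs from the exact factor exp(-h); this illustrates the remark that iteration cannot compensate for the neglected higher-order derivatives.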

Many more complex schemes of prediction and correction, in most cases 
combining the two in the same process, have been devised, but the reader is 
referred to more specialist texts for discussions of them. However, because it 
offers some clear advantages, one group of methods will be set out explicitly in 
the next subsection. This is the general class of schemes known as Runge-Kutta 
methods. 


28.6.4 Runge-Kutta methods 

The Runge-Kutta method of integrating 

\[ \frac{dy}{dx} = f(x, y) \tag{28.70} \]

is a step-by-step process of obtaining an approximation for $y_{i+1}$ by starting from 
the value of $y_i$. Among its advantages are that no functions other than f are used, 
no subsidiary differentiation is needed and no additional starting values need be 
calculated. 

To be set against these advantages is the fact that / is evaluated using somewhat 
complicated arguments and that this has to be done several times for each increase 
in the value of i. However, once a procedure has been established, for example 
on a computer, the method usually gives good results. 

The basis of the method is to simulate the (accurate) Taylor series for $y(x_i + h)$, 
not by calculating all the higher derivatives of y at the point $x_i$ but by taking 
a particular combination of the values of the first derivative of y evaluated at 
a number of carefully chosen points. Equation (28.70) is used to evaluate these 
derivatives. The accuracy can be made to be up to whatever power of h is desired 
but, naturally, the greater the accuracy the more complex the calculation and, in 
any case, rounding errors cannot ultimately be avoided. 

The setting up of the calculational scheme may be illustrated by considering 
the particular case in which second-order accuracy in h is required. To second 
order, the Taylor expansion is 

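$$y_{i+1} = y_i + h f_i + \frac{h^2}{2}\left(\frac{df}{dx}\right)_{x_i}, \qquad (28.71)$$

where

$$\left(\frac{df}{dx}\right)_{x_i} = \left(\frac{\partial f}{\partial x} + f\,\frac{\partial f}{\partial y}\right)_{x_i} \equiv \frac{\partial f_i}{\partial x} + f_i\,\frac{\partial f_i}{\partial y},$$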

the last step being merely the definition of an abbreviated notation. 
We assume that this can be simulated by a form

$$y_{i+1} = y_i + \alpha_1 h f_i + \alpha_2 h f(x_i + \beta_1 h,\ y_i + \beta_2 h f_i), \qquad (28.72)$$

which in effect uses a weighted mean of the value of $dy/dx$ at $x_i$ and its value at
some point yet to be determined. The object is to choose values of $\alpha_1$, $\alpha_2$, $\beta_1$ and
$\beta_2$ such that (28.72) coincides with (28.71) up to the coefficient of $h^2$.

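Expanding the function $f$ in the last term of (28.72) in a Taylor series of its
own we obtain

$$f(x_i + \beta_1 h,\ y_i + \beta_2 h f_i) = f(x_i, y_i) + \beta_1 h\,\frac{\partial f_i}{\partial x} + \beta_2 h f_i\,\frac{\partial f_i}{\partial y} + O(h^2).$$

Putting this result into (28.72) and rearranging in powers of $h$ we obtain

$$y_{i+1} = y_i + (\alpha_1 + \alpha_2) h f_i + \alpha_2 h^2\left(\beta_1\,\frac{\partial f_i}{\partial x} + \beta_2 f_i\,\frac{\partial f_i}{\partial y}\right) + O(h^3). \qquad (28.73)$$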


Comparing this with (28.71) shows that there is in fact some freedom remaining
in the choice of the $\alpha$'s and $\beta$'s. In terms of an arbitrary $\alpha_1$ ($\neq 1$),

$$\alpha_2 = 1 - \alpha_1, \qquad \beta_1 = \beta_2 = \frac{1}{2(1 - \alpha_1)}.$$
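The comparison of coefficients behind this statement can be written out explicitly. The following block is a short worked step, obtained by matching the terms of (28.73) with those of (28.71); it adds nothing beyond that matching.

```latex
% Matching (28.73) with (28.71), power by power in h:
\begin{align*}
  h f_i:                                        &\quad \alpha_1 + \alpha_2 = 1, \\
  h^2\,\frac{\partial f_i}{\partial x}:         &\quad \alpha_2 \beta_1 = \tfrac{1}{2}, \\
  h^2 f_i\,\frac{\partial f_i}{\partial y}:     &\quad \alpha_2 \beta_2 = \tfrac{1}{2}.
\end{align*}
% Three conditions on four unknowns leave one free parameter, here \alpha_1;
% \alpha_1 = 1 is excluded because it would force \alpha_2 = 0.
```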


One possible choice is $\alpha_1 = 0.5$, giving $\alpha_2 = 0.5$, $\beta_1 = \beta_2 = 1$. In this case the
procedure (equation (28.72)) can be summarised by

$$y_{i+1} = y_i + \tfrac{1}{2}(a_1 + a_2), \qquad (28.74)$$

where

$$a_1 = h f(x_i, y_i),$$
$$a_2 = h f(x_i + h,\ y_i + a_1).$$


Similar schemes giving higher-order accuracy in $h$ can be devised. Two such
schemes, given without derivation, are

(i) to order $h^3$,

$$y_{i+1} = y_i + \tfrac{1}{6}(b_1 + 4b_2 + b_3), \qquad (28.75)$$

where

$$b_1 = h f(x_i, y_i),$$
$$b_2 = h f(x_i + \tfrac{1}{2}h,\ y_i + \tfrac{1}{2}b_1),$$
$$b_3 = h f(x_i + h,\ y_i + 2b_2 - b_1);$$

(ii) to order $h^4$,

$$y_{i+1} = y_i + \tfrac{1}{6}(c_1 + 2c_2 + 2c_3 + c_4), \qquad (28.76)$$

where

$$c_1 = h f(x_i, y_i),$$
$$c_2 = h f(x_i + \tfrac{1}{2}h,\ y_i + \tfrac{1}{2}c_1),$$
$$c_3 = h f(x_i + \tfrac{1}{2}h,\ y_i + \tfrac{1}{2}c_2),$$
$$c_4 = h f(x_i + h,\ y_i + c_3).$$
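As an illustration, the schemes (28.74), (28.75) and (28.76) can be coded directly from their definitions. The sketch below is a minimal one; the function names and the choice of test equation $dy/dx = -2xy$ with $y(0) = 1$ (exact solution $\exp(-x^2)$) are assumptions made for the example.

```python
import math

def rk2_step(f, x, y, h):
    """Second-order scheme (28.74)."""
    a1 = h * f(x, y)
    a2 = h * f(x + h, y + a1)
    return y + (a1 + a2) / 2

def rk3_step(f, x, y, h):
    """Third-order scheme (28.75)."""
    b1 = h * f(x, y)
    b2 = h * f(x + h / 2, y + b1 / 2)
    b3 = h * f(x + h, y + 2 * b2 - b1)
    return y + (b1 + 4 * b2 + b3) / 6

def rk4_step(f, x, y, h):
    """Fourth-order scheme (28.76)."""
    c1 = h * f(x, y)
    c2 = h * f(x + h / 2, y + c1 / 2)
    c3 = h * f(x + h / 2, y + c2 / 2)
    c4 = h * f(x + h, y + c3)
    return y + (c1 + 2 * c2 + 2 * c3 + c4) / 6

def integrate(step, f, x0, y0, h, n_steps):
    """Advance from (x0, y0) through n_steps steps of size h."""
    x, y = x0, y0
    for i in range(n_steps):
        y = step(f, x, y, h)
        x = x0 + (i + 1) * h
    return y

def slope(x, y):
    return -2.0 * x * y   # hypothetical test equation dy/dx = -2xy

for name, step in (("order 2", rk2_step), ("order 3", rk3_step), ("order 4", rk4_step)):
    y_end = integrate(step, slope, 0.0, 1.0, 0.1, 10)
    print(name, y_end, abs(y_end - math.exp(-1.0)))
```

Each step function evaluates $f$ two, three or four times respectively, which is the price referred to in the text for the higher accuracy.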


28.6.5 Isoclines 

The final method to be described for first-order differential equations is not so 
much numerical as graphical, but since it is sometimes useful it is included here. 
The method, known as that of isoclines, involves sketching for a number of 
values of a parameter c those curves (the isoclines) in the xy-plane along which 
$f(x, y) = c$, i.e. those curves along which $dy/dx$ is a constant of known value. It
should be noted that they are not generally straight lines. Since a straight line
of slope $dy/dx$ at and through any particular point is a tangent to the curve
$y = y(x)$ at that point, small elements of straight lines, with slopes appropriate
to the isoclines they cut, effectively form the curve $y = y(x)$.

Figure 28.6 illustrates in outline the method as applied to the solution of

$$\frac{dy}{dx} = -2xy. \qquad (28.77)$$

The thinner curves (rectangular hyperbolae) are a selection of the isoclines along
which $-2xy$ is constant and equal to the corresponding value of $c$. The small
cross lines on each curve show the slopes ($= c$) that solutions of (28.77) must
have if they cross the curve. The thick line is the solution for which $y = 1$ at
$x = 0$; it takes the slope dictated by the value of $c$ on each isocline it crosses. The
analytic solution with these properties is $y(x) = \exp(-x^2)$.


28.7 Higher-order equations 

So far the discussion of numerical solutions of differential equations has been 
in terms of one dependent and one independent variable related by a first-order 
equation. It is straightforward to carry out an extension to the case of several 
dependent variables $y_{[r]}$ governed by $R$ first-order equations

$$\frac{dy_{[r]}}{dx} = f_{[r]}(x, y_{[1]}, y_{[2]}, \ldots, y_{[R]}), \qquad r = 1, 2, \ldots, R.$$

We have enclosed the label $r$ in brackets so that there is no confusion between,
say, the second dependent variable $y_{[2]}$ and the value $y_2$ of a variable $y$ at the
second calculational point $x_2$. The integration of these equations by the methods
discussed in the previous section presents no particular difficulty, provided that
all the equations are advanced through each particular step before any of them
is taken through the following step.

Figure 28.6 The isocline method. The cross lines on each isocline show the
slopes that solutions of $dy/dx = -2xy$ must have at the points where they
cross the isoclines. The heavy line is the solution with $y(0) = 1$, namely
$\exp(-x^2)$.

Higher-order equations in one dependent and one independent variable can be 
reduced to a set of simultaneous equations, provided that they can be written in 
the form 

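$$\frac{d^R y}{dx^R} = f(x, y, y', \ldots, y^{(R-1)}), \qquad (28.78)$$

where $R$ is the order of the equation. To do this, a new set of variables $p_{[r]}$ is
defined by

$$p_{[r]} = \frac{d^r y}{dx^r}, \qquad r = 1, 2, \ldots, R - 1. \qquad (28.79)$$

Equation (28.78) is then equivalent to the set of simultaneous first-order equations

$$\frac{dy}{dx} = p_{[1]},$$
$$\frac{dp_{[r]}}{dx} = p_{[r+1]}, \qquad r = 1, 2, \ldots, R - 2, \qquad (28.80)$$
$$\frac{dp_{[R-1]}}{dx} = f(x, y, p_{[1]}, \ldots, p_{[R-1]}).$$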


These can then be treated in the way indicated in the previous paragraph. The 
extension to more than one dependent variable is straightforward. 
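To make the reduction concrete, the sketch below treats the second-order equation $y'' = -y$ as the pair $dy/dx = p_{[1]}$, $dp_{[1]}/dx = -y$ and advances both variables together through each step with the fourth-order scheme (28.76). The equation, the starting values $y(0) = 0$, $y'(0) = 1$ and all function names are assumptions chosen for illustration.

```python
import math

def rk4_system_step(f, x, ys, h):
    """One step of (28.76) applied to a list ys of dependent variables.

    f(x, ys) must return the list of derivatives; every variable is
    advanced through the step before any is taken further.
    """
    c1 = [h * d for d in f(x, ys)]
    c2 = [h * d for d in f(x + h / 2, [y + c / 2 for y, c in zip(ys, c1)])]
    c3 = [h * d for d in f(x + h / 2, [y + c / 2 for y, c in zip(ys, c2)])]
    c4 = [h * d for d in f(x + h, [y + c for y, c in zip(ys, c3)])]
    return [y + (a + 2 * b + 2 * c + d) / 6
            for y, a, b, c, d in zip(ys, c1, c2, c3, c4)]

def oscillator(x, ys):
    """y'' = -y written as dy/dx = p1, dp1/dx = -y."""
    y, p1 = ys
    return [p1, -y]

def solve(f, x0, ys0, h, n_steps):
    ys = list(ys0)
    for i in range(n_steps):
        ys = rk4_system_step(f, x0 + i * h, ys, h)
    return ys

y_end, p_end = solve(oscillator, 0.0, [0.0, 1.0], 0.1, 10)
print(y_end, math.sin(1.0))   # numerical and exact y(1)
print(p_end, math.cos(1.0))   # numerical and exact y'(1)
```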
In practical problems it often happens that boundary conditions applicable
to a higher-order equation consist not of the values of the function and all its 
derivatives at one particular point but of, say, the values of the function at two 
separate end-points. In these cases a solution cannot be found using an explicit 
step-by-step ‘marching’ scheme, in which the solutions at successive values of the 
independent variable are calculated using solution values previously found. Other 
methods have to be tried. 

One obvious method is to treat the problem as a ‘marching one’, but to use a 
number of (intelligently guessed) initial values for the derivatives at the starting 
point. The aim is then to find, by interpolation or some other form of iteration, 
those starting values for the derivatives that will produce the given value of the 
function at the finishing point. 
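A minimal sketch of this idea is given below for the hypothetical problem $y'' = -y$ with $y(0) = 0$ and $y(\pi/2) = 1$, for which the required starting slope is $y'(0) = 1$. The marching is done with the fourth-order scheme (28.76) and the starting slope is refined by linear interpolation between the two most recent guesses (a secant iteration); the problem, the function names and the tolerances are all assumptions made for the example.

```python
import math

def march(slope0, n_steps=100):
    """Integrate y'' = -y from x = 0 to pi/2 with y(0) = 0, y'(0) = slope0.

    Returns the value of y reached at the finishing point.
    """
    h = (math.pi / 2) / n_steps
    y, p = 0.0, slope0
    for _ in range(n_steps):
        # Fourth-order Runge-Kutta for the pair dy/dx = p, dp/dx = -y.
        k1y, k1p = h * p, -h * y
        k2y, k2p = h * (p + k1p / 2), -h * (y + k1y / 2)
        k3y, k3p = h * (p + k2p / 2), -h * (y + k2y / 2)
        k4y, k4p = h * (p + k3p), -h * (y + k3y)
        y += (k1y + 2 * k2y + 2 * k3y + k4y) / 6
        p += (k1p + 2 * k2p + 2 * k3p + k4p) / 6
    return y

def shoot(target, s0, s1, tol=1e-10, max_iter=20):
    """Adjust the starting slope until the end value matches target."""
    e0, e1 = march(s0) - target, march(s1) - target
    for _ in range(max_iter):
        if abs(e1) < tol:
            break
        # Linear interpolation between the last two guesses.
        s0, s1, e0 = s1, s1 - e1 * (s1 - s0) / (e1 - e0), e1
        e1 = march(s1) - target
    return s1

slope = shoot(1.0, 0.0, 2.0)
print(slope)   # should be close to the exact starting slope, 1
```

Because this test equation is linear, the end value depends linearly on the starting slope and a single interpolation suffices; for a non-linear equation several iterations would normally be needed.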

In some cases the problem can be reduced by a differencing scheme to a matrix 
equation. Such a case is that of a second-order equation for y(x) with constant 
coefficients and given values of y at the two end-points. Consider the second-order 
equation

$$y'' + 2ky' + \mu y = f(x), \qquad (28.81)$$

with the boundary conditions

$$y(0) = A, \qquad y(1) = B.$$

If (28.81) is replaced by a central difference equation,

$$\frac{y_{i+1} - 2y_i + y_{i-1}}{h^2} + 2k\,\frac{y_{i+1} - y_{i-1}}{2h} + \mu y_i = f(x_i),$$

we obtain from it the recurrence relation

$$(1 + kh)\,y_{i+1} + (\mu h^2 - 2)\,y_i + (1 - kh)\,y_{i-1} = h^2 f(x_i).$$

For $h = 1/(N - 1)$ this is in exactly the form of the $N \times N$ tridiagonal matrix
equation (28.30), with

$$b_1 = b_N = 1, \qquad c_1 = a_N = 0,$$
$$a_i = 1 - kh, \qquad b_i = \mu h^2 - 2, \qquad c_i = 1 + kh, \qquad i = 2, 3, \ldots, N - 1,$$

and $y_1$ replaced by $A$, $y_N$ by $B$ and $y_i$ by $h^2 f(x_i)$ for $i = 2, 3, \ldots, N - 1$. The
solutions can be obtained as in (28.31) and (28.32).
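The construction can be sketched as follows. The tridiagonal system is solved here by a standard forward-elimination and back-substitution routine, which is assumed to be equivalent in effect to, but is not copied from, (28.31) and (28.32); the function names and the test problems are likewise illustrative assumptions. Indices in the code run from 0 rather than 1.

```python
import math

def solve_tridiagonal(a, b, c, rhs):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = rhs[i] (a[0], c[-1] unused)."""
    n = len(b)
    cp, rp = [0.0] * n, [0.0] * n
    cp[0], rp[0] = c[0] / b[0], rhs[0] / b[0]
    for i in range(1, n):                      # forward elimination
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        rp[i] = (rhs[i] - a[i] * rp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = rp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = rp[i] - cp[i] * x[i + 1]
    return x

def solve_bvp(k, mu, f, A, B, N):
    """Central-difference solution of y'' + 2k y' + mu y = f(x), y(0)=A, y(1)=B."""
    h = 1.0 / (N - 1)
    a = [0.0] + [1 - k * h] * (N - 2) + [0.0]
    b = [1.0] + [mu * h * h - 2] * (N - 2) + [1.0]
    c = [0.0] + [1 + k * h] * (N - 2) + [0.0]
    rhs = [A] + [h * h * f(i * h) for i in range(1, N - 1)] + [B]
    return solve_tridiagonal(a, b, c, rhs)

# Hypothetical check: y'' = 2 with y(0) = 0, y(1) = 1 has the solution y = x^2,
# which the central difference scheme reproduces up to rounding.
N = 11
ys = solve_bvp(0.0, 0.0, lambda x: 2.0, 0.0, 1.0, N)
print(max(abs(y - (i / (N - 1)) ** 2) for i, y in enumerate(ys)))
```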


28.8 Partial differential equations 

The extension of previous methods to partial differential equations, thus involving 
two or more independent variables, proceeds in a more or less obvious way. Rather 
than an interval divided into equal steps by the points at which solutions to the 
equations are to be found, a mesh of points in two or more dimensions has to be
set up and all the variables given an increased number of subscripts. 

Considerations of the stability, accuracy and feasibility of particular
calculational schemes in principle are the same as for the one-dimensional case, but in
practice are too complicated to be discussed here. 

Rather than note generalities that we are unable to pursue in any quantitative 
way, we will conclude this chapter by indicating in outline how two familiar 
partial differential equations of physical science can be set up for numerical 
solution. The first of these is Laplace’s equation in two dimensions, 


$$\frac{\partial^2 \phi}{\partial x^2} + \frac{\partial^2 \phi}{\partial y^2} = 0, \qquad (28.82)$$

the value of $\phi$ being given on the perimeter of a closed domain.

A grid with spacings $\Delta x$ and $\Delta y$ in the two directions is first chosen, so that,
for example, $x_i$ stands for the point $x_0 + i\Delta x$ and $\phi_{i,j}$ for the value $\phi(x_i, y_j)$. Next,
using a second central difference formula, (28.82) is turned into

$$\frac{\phi_{i+1,j} - 2\phi_{i,j} + \phi_{i-1,j}}{(\Delta x)^2} + \frac{\phi_{i,j+1} - 2\phi_{i,j} + \phi_{i,j-1}}{(\Delta y)^2} = 0, \qquad (28.83)$$

for $i = 0, 1, \ldots, N$ and $j = 0, 1, \ldots, M$. If $(\Delta x)^2 = \lambda(\Delta y)^2$ then this becomes the
recurrence relationship

$$\phi_{i+1,j} + \phi_{i-1,j} + \lambda(\phi_{i,j+1} + \phi_{i,j-1}) = 2(1 + \lambda)\phi_{i,j}. \qquad (28.84)$$

The boundary conditions in their simplest form (i.e. for a rectangular domain) 
mean that 


$$\phi_{0,j}, \quad \phi_{N,j}, \quad \phi_{i,0}, \quad \phi_{i,M} \qquad (28.85)$$

have predetermined values. Non-rectangular boundaries can be accommodated, 
either by more complex boundary-value prescriptions or by using non-Cartesian 
coordinates. 

To find a set of values satisfying (28.84), an initial guess at a complete set of
values for the $\phi_{i,j}$ is made, subject to the requirement that the quantities listed
in (28.85) have the given fixed values; those values that are not on the boundary
are then adjusted iteratively in order to try to bring about condition (28.84)
everywhere. Clearly one scheme is to set $\lambda = 1$ and recalculate each $\phi_{i,j}$ as the
mean of the four current values at neighbouring grid-points, using (28.84) directly,
and then to iterate this recalculation until no value of $\phi$ changes significantly
after a complete cycle through all values of $i$ and $j$. This procedure is the simplest
of such ‘relaxation’ methods; for a slightly more sophisticated scheme see exercise
28.22 at the end of the chapter. The reader is referred to specialist books for
fuller accounts of how this approach can be made faster and more accurate.
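The simplest relaxation scheme, with $\lambda = 1$, can be sketched as below. The routine overwrites each interior value as soon as its new value is computed, so that later points in the same cycle see the updated neighbours; this is one reasonable reading of "current values" and is an assumption of the sketch, as are the function name, the tolerance and the test boundary values.

```python
def relax(phi, tol=1e-6, max_sweeps=10000):
    """Relax the interior of the grid phi (a list of rows) in place.

    The outermost rows and columns hold the fixed boundary values (28.85).
    Each interior value is replaced by the mean of its four neighbours,
    i.e. (28.84) with lambda = 1. Returns the number of sweeps performed.
    """
    n_rows, n_cols = len(phi), len(phi[0])
    for sweep in range(1, max_sweeps + 1):
        biggest_change = 0.0
        for i in range(1, n_rows - 1):
            for j in range(1, n_cols - 1):
                new = (phi[i + 1][j] + phi[i - 1][j]
                       + phi[i][j + 1] + phi[i][j - 1]) / 4
                biggest_change = max(biggest_change, abs(new - phi[i][j]))
                phi[i][j] = new
        if biggest_change < tol:
            return sweep
    return max_sweeps

# Hypothetical test: boundary values equal to the row index i, for which the
# solution of (28.84) is phi[i][j] = i everywhere; the interior guess is zero.
n = 6
grid = [[float(i) if i in (0, n - 1) or j in (0, n - 1) else 0.0
         for j in range(n)] for i in range(n)]
sweeps = relax(grid, tol=1e-8)
print(sweeps, grid[2][3])
```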
Our final example is based upon the one-dimensional diffusion equation for
the temperature $\phi$ of a system,

$$\frac{\partial \phi}{\partial t} = \kappa\,\frac{\partial^2 \phi}{\partial x^2}. \qquad (28.86)$$

If $\phi_{i,j}$ stands for $\phi(x_0 + i\Delta x,\ t_0 + j\Delta t)$ then a forward difference representation
of the time derivative and a central difference representation for the spatial
derivative lead to the following relationship:

$$\frac{\phi_{i,j+1} - \phi_{i,j}}{\Delta t} = \kappa\,\frac{\phi_{i+1,j} - 2\phi_{i,j} + \phi_{i-1,j}}{(\Delta x)^2}. \qquad (28.87)$$


This allows the construction of an explicit scheme for generating the temperature
distribution at later times, given that it is known at some earlier time:

$$\phi_{i,j+1} = \alpha(\phi_{i+1,j} + \phi_{i-1,j}) + (1 - 2\alpha)\phi_{i,j}, \qquad (28.88)$$

where $\alpha = \kappa\Delta t/(\Delta x)^2$.

Although this scheme is explicit it is not a good one, because of the asymmetric
way in which the differences are formed. However, the effect of this can be
minimised if we study and correct for the errors introduced, in the following way.
Taylor’s series for the time variable gives

$$\phi_{i,j+1} = \phi_{i,j} + \Delta t\,\frac{\partial \phi_{i,j}}{\partial t} + \frac{(\Delta t)^2}{2!}\,\frac{\partial^2 \phi_{i,j}}{\partial t^2} + \cdots, \qquad (28.89)$$

using the same notation as previously. Thus the first correction term to the
left-hand side of (28.87) is

$$-\frac{\Delta t}{2}\,\frac{\partial^2 \phi_{i,j}}{\partial t^2}. \qquad (28.90)$$


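The first term omitted on the right-hand side of the same equation is, by a similar
argument,

$$-\kappa\,\frac{2(\Delta x)^2}{4!}\,\frac{\partial^4 \phi_{i,j}}{\partial x^4}. \qquad (28.91)$$

But, using the fact that $\phi$ satisfies (28.86) we obtain

$$\frac{\partial^2 \phi}{\partial t^2} = \frac{\partial}{\partial t}\left(\kappa\,\frac{\partial^2 \phi}{\partial x^2}\right) = \kappa\,\frac{\partial^2}{\partial x^2}\left(\frac{\partial \phi}{\partial t}\right) = \kappa^2\,\frac{\partial^4 \phi}{\partial x^4}, \qquad (28.92)$$

and so, to this accuracy, the two errors (28.90) and (28.91) can be made to cancel
if $\alpha$ is chosen such that

$$\frac{\kappa^2 \Delta t}{2} = \frac{2\kappa(\Delta x)^2}{4!}, \qquad \text{i.e. } \alpha = \frac{1}{6}.$$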
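The explicit scheme (28.88) and the effect of the choice $\alpha = 1/6$ can be examined numerically. The sketch below assumes, purely for illustration, a rod $0 \le x \le 1$ with both ends held at zero, $\kappa = 1$ and the initial profile $\sin \pi x$, for which the exact solution is $\exp(-\pi^2 t)\sin \pi x$; the function names and grid sizes are also assumptions.

```python
import math

def diffuse(phi, alpha, n_steps):
    """Apply (28.88) n_steps times; the two end values are held fixed."""
    phi = list(phi)
    for _ in range(n_steps):
        new = phi[:]
        for i in range(1, len(phi) - 1):
            new[i] = alpha * (phi[i + 1] + phi[i - 1]) + (1 - 2 * alpha) * phi[i]
        phi = new
    return phi

def max_error(alpha, n_steps, n_points=21, kappa=1.0):
    """Largest deviation from the exact decaying sine profile."""
    dx = 1.0 / (n_points - 1)
    dt = alpha * dx * dx / kappa          # from alpha = kappa*dt/dx^2
    start = [math.sin(math.pi * i * dx) for i in range(n_points)]
    end = diffuse(start, alpha, n_steps)
    decay = math.exp(-kappa * math.pi ** 2 * n_steps * dt)
    return max(abs(e - decay * s) for e, s in zip(end, start))

# Both runs reach the same final time t = 0.05 on the same spatial grid.
err_quarter = max_error(0.25, 80)
err_sixth = max_error(1.0 / 6.0, 120)
print(err_quarter, err_sixth)
```

The run with $\alpha = 1/6$ takes more, smaller time steps, but its error should be smaller by far more than the ratio of step sizes alone would suggest, in line with the cancellation of (28.90) and (28.91).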
28.9 Exercises

28.1 Use an iteration procedure to find to four significant figures the root of the 
equation 40x = exp x. 

28.2 Using the Newton-Raphson procedure find, correct to three decimal places, the
root nearest to 7 of the equation $4x^3 + 2x^2 - 200x - 50 = 0$.

28.3 (a) Show that if a polynomial equation $g(x) \equiv x^m - f(x) = 0$, where $f(x)$ is a
polynomial of degree less than $m$ and for which $f(0) \neq 0$, is solved using
a rearrangement iteration scheme $x_{n+1} = [f(x_n)]^{1/m}$, then, in general, the
scheme will have only first-order convergence.

(b) By considering the cubic equation

$$x^3 - ax^2 + 2abx - (b^3 + ab^2) = 0$$

for arbitrary non-zero values of $a$ and $b$, demonstrate that, in special cases,
a rearrangement scheme can give second- (or higher-) order convergence.

28.4 The square root of a number $N$ is to be determined by means of the iteration
scheme

$$x_{n+1} = x_n\left[1 - (N - x_n^2)\,f(N)\right].$$

Determine how to choose $f(N)$ so that the process has second-order convergence.

Given that $\sqrt{7} \approx 2.65$, calculate $\sqrt{7}$ as accurately as a single application of the
formula will allow.

28.5 Solve the following set of simultaneous equations using Gaussian elimination 
(including interchange where it is formally desirable), 

$$x_1 + 3x_2 + 4x_3 + 2x_4 = 0,$$
$$2x_1 + 10x_2 - 5x_3 + x_4 = 6,$$
$$4x_2 + 3x_3 + 3x_4 = 20,$$
$$-3x_1 + 6x_2 + 12x_3 - 4x_4 = 16.$$

28.6 The following table of values of a polynomial p(x) of low degree contains an 
error. Identify and correct the erroneous value and extend the table up to x = 1.2. 


  x     p(x)       x     p(x)
 0.0   0.000      0.5   0.165
 0.1   0.011      0.6   0.216
 0.2   0.040      0.7   0.245
 0.3   0.081      0.8   0.256
 0.4   0.128      0.9   0.243


28.7 Simultaneous linear equations that result in tridiagonal matrices can sometimes
be treated as three-term recurrence relations and their solution found in a similar
manner to that described in chapter 15. Consider the tridiagonal simultaneous
equations

$$x_{i-1} + 4x_i + x_{i+1} = 3(\delta_{i+1,0} - \delta_{i-1,0}), \qquad i = 0, \pm 1, \pm 2, \ldots.$$

Prove that for $i > 0$ the equations have a general solution of the form $x_i =
\alpha p^i + \beta q^i$, where $p$ and $q$ are the roots of a certain quadratic equation. Show that
a similar result holds for $i < 0$. In each case express $x_0$ in terms of the arbitrary
constants $\alpha$, $\beta$, $\alpha'$ and $\beta'$.

Now impose the condition that $x_i$ is bounded as $i \to \pm\infty$ and obtain a unique
solution.
28.8 A possible rule for obtaining an approximation to an integral is the mid-point
rule, given by

$$\int_{x_0}^{x_0 + \Delta x} f(x)\,dx = \Delta x\,f(x_0 + \tfrac{1}{2}\Delta x) + O(\Delta x^3).$$

Writing $h$ for $\Delta x$, and evaluating all derivatives at the mid-point of the interval
$(x, x + \Delta x)$, use a Taylor series expansion to find, up to $O(h^5)$, the coefficients of
the higher-order errors in both the trapezium and mid-point rules. Hence find a
linear combination of these two rules that gives $O(h^5)$ accuracy for each step $\Delta x$.

28.9 Given a random number $\eta$ uniformly distributed on (0, 1), determine the function
$\xi = \xi(\eta)$ that would generate a random number $\xi$ distributed as

(a) $2\xi$ on $0 \le \xi < 1$;

(b) $\tfrac{3}{2}\sqrt{\xi}$ on $0 \le \xi < 1$;

(c) $\dfrac{\pi}{4a}\cos\dfrac{\pi\xi}{2a}$ on $-a \le \xi < a$;

(d) $\tfrac{1}{2}\exp(-|\xi|)$ on $-\infty < \xi < \infty$.

28.10 A, B and C are three circles of unit radius with centres in the xy-plane at
(1, 2), (2.5, 1.5) and (2, 3) respectively. Devise a hit or miss Monte Carlo calculation
to determine the size of the area that lies outside C but inside A and B, as well
as inside the square centred on (2, 2.5) that has sides of length 2 parallel to the
coordinate axes. You should choose your sampling region so as to make the
estimation as efficient as possible. Take the random number distribution to be
uniform on (0, 1) and determine the inequalities that have to be tested using the
random numbers chosen.

28.11 Use a Taylor series to solve the equation

$$\frac{dy}{dx} + xy = 0, \qquad y(0) = 1,$$

evaluating $y(x)$ for $x = 0.0$ to $0.5$ in steps of 0.1.

28.12 Consider the application of the predictor-corrector method described near the
end of subsection 28.6.3 to the equation

$$\frac{dy}{dx} = x + y, \qquad y(0) = 0.$$

Show, by comparison with a Taylor series expansion, that the expression obtained
for $y_{i+1}$ in terms of $x_i$ and $y_i$ by applying the three steps indicated (without any
repeat of the last two) is correct to $O(h^2)$. Using steps of $h = 0.1$ compute the
value of $y(0.3)$ and compare it with the value obtained by solving the equation
analytically.

28.13 A more refined form of the Adams predictor-corrector method for solving the
first-order differential equation

$$\frac{dy}{dx} = f(x, y)$$

is known as the Adams-Moulton-Bashforth scheme. At any stage (say the $n$th)
in an $N$th-order scheme the values of $x$ and $y$ at the previous $N$ solution points
are first used to predict the value of $y_{n+1}$. This approximate value of $y$ at the
next solution point $x_{n+1}$, denoted by $\bar{y}_{n+1}$, is then used together with those at the
previous $N - 1$ solution points to make a more refined (corrected) estimation of
$y(x_{n+1})$. The calculational procedure for a third-order scheme is summarised by
the two equations

$$\bar{y}_{n+1} = y_n + h(a_1 f_n + a_2 f_{n-1} + a_3 f_{n-2}) \qquad \text{(predictor)},$$
$$y_{n+1} = y_n + h(b_1 f(x_{n+1}, \bar{y}_{n+1}) + b_2 f_n + b_3 f_{n-1}) \qquad \text{(corrector)}.$$
(a) Find Taylor series expansions for $f_{n-1}$ and $f_{n-2}$ in terms of the function
$f_n = f(x_n, y_n)$ and its derivatives at $x_n$.

(b) Substitute them into the predictor equation and, by making that expression
for $\bar{y}_{n+1}$ coincide with the true Taylor series for $y_{n+1}$ up to order $h^3$, establish
simultaneous equations that determine the values of $a_1$, $a_2$ and $a_3$.

(c) Find the Taylor series for $f_{n+1}$ and substitute it and that for $f_{n-1}$ into the
corrector equation. Make the corrected prediction for $y_{n+1}$ coincide with the
true Taylor series by choosing the weights $b_1$, $b_2$ and $b_3$ appropriately.

(d) The values of the numerical solution of the differential equation

$$\frac{dy}{dx} = \frac{2(1 + x)y + x^{3/2}}{2x(1 + x)}$$

at three values of $x$ are given in the following table:

  x      0.1        0.2        0.3
 y(x)  0.030628   0.084107   0.150328

Use the above predictor-corrector scheme to find the value of $y(0.4)$ and
compare your answer with the accurate value, 0.225577.

28.14 If $dy/dx = f(x, y)$ then show that

$$\frac{d^2 f}{dx^2} = \frac{\partial^2 f}{\partial x^2} + 2f\,\frac{\partial^2 f}{\partial x\,\partial y} + f^2\,\frac{\partial^2 f}{\partial y^2} + \frac{\partial f}{\partial x}\frac{\partial f}{\partial y} + f\left(\frac{\partial f}{\partial y}\right)^2.$$

Hence verify, by substitution and the subsequent expansion of arguments in
Taylor series of their own, that the scheme given in (28.75) coincides with the
Taylor expansion (28.64), i.e.

$$y_{i+1} = y_i + h y_i^{(1)} + \frac{h^2}{2!}\,y_i^{(2)} + \frac{h^3}{3!}\,y_i^{(3)} + \cdots,$$

up to terms in $h^3$.

28.15 To solve the ordinary differential equation

$$\frac{du}{dt} = f(u, t)$$

for $f = f(t)$, the explicit two-step finite difference scheme

$$u_{n+1} = \alpha u_n + \beta u_{n-1} + h(\mu f_n + \nu f_{n-1})$$

may be used. Here, in the usual notation, $h$ is the time step, $t_n = nh$, $u_n = u(t_n)$
and $f_n = f(u_n, t_n)$; $\alpha$, $\beta$, $\mu$ and $\nu$ are constants.

(a) A particular scheme has $\alpha = 1$, $\beta = 0$, $\mu = 3/2$ and $\nu = -1/2$. By considering
Taylor expansions about $t = t_n$ for both $u_{n+j}$ and $f_{n+j}$, show that this scheme
gives errors of order $h^3$.

(b) Find the values of $\alpha$, $\beta$, $\mu$ and $\nu$ that will give the greatest accuracy.

28.16 Set up a finite difference scheme to solve the ordinary differential equation

$$x\,\frac{d^2 \phi}{dx^2} + \frac{d\phi}{dx} = 0$$

in the range $1 \le x \le 4$ and subject to the boundary conditions $\phi(1) = 2$ and
$d\phi/dx = 2$ at $x = 4$. Using $N$ equal increments, $\Delta x$, in $x$, obtain the general
difference equation and state how the boundary conditions are incorporated
into the scheme. Setting $\Delta x$ equal to the (crude) value 1, obtain the relevant
simultaneous equations and so obtain rough estimates for $\phi(2)$, $\phi(3)$ and $\phi(4)$.
Finally, solve the original equation analytically and compare your numerical
estimates with the accurate values.
28.17 Write a computer program that would solve, for a range of values of $\lambda$, the
differential equation

$$\frac{dy}{dx} = \frac{1}{\sqrt{x^2 + \lambda y^2}}, \qquad y(0) = 1,$$

using a third-order Runge-Kutta scheme. Consider the difficulties that might
arise when $\lambda < 0$.

28.18 Use the isocline approach to sketch the family of curves that satisfies the
non-linear first-order differential equation

$$\frac{dy}{dx} = \frac{a}{\sqrt{x^2 + y^2}}.$$

28.19 For some problems, numerical or algebraic experimentation may suggest the
form of the complete solution. Consider the problem of numerically integrating
the first-order wave equation

$$\frac{\partial u}{\partial t} + A\,\frac{\partial u}{\partial x} = 0,$$

in which $A$ is a positive constant. A finite difference scheme for this partial
differential equation is

$$\frac{u(p, n+1) - u(p, n)}{\Delta t} + A\,\frac{u(p, n) - u(p-1, n)}{\Delta x} = 0,$$

where $x = p\Delta x$ and $t = n\Delta t$, with $p$ any integer and $n$ a non-negative integer.
The initial values are $u(0, 0) = 1$ and $u(p, 0) = 0$ for $p \neq 0$.

(a) Carry the difference equation forward in time for two or three steps and
attempt to identify the pattern of solution. Establish the criterion for the
method to be numerically stable.

(b) Suggest a general form for $u(p, n)$, expressing it in generator function form,
i.e. as ‘$u(p, n)$ is the coefficient of $s^p$ in the expansion of $G(n, s)$’.

(c) Using your form of solution (or that given in the answers!), obtain an
explicit general expression for $u(p, n)$ and verify it by direct substitution into
the difference equation.

(d) An analytic solution of the original PDE indicates that an initial disturbance
propagates undistorted. Under what circumstances would the difference
scheme reproduce that behaviour?

28.20 In the previous question the difference scheme for solving

$$\frac{\partial u}{\partial t} + \frac{\partial u}{\partial x} = 0,$$

in which $A$ has been set equal to unity, was one-sided in both space ($x$) and
time ($t$). A more accurate procedure (known as the Lax-Wendroff scheme) is

$$\frac{u(p, n+1) - u(p, n)}{\Delta t} + \frac{u(p+1, n) - u(p-1, n)}{2\Delta x}
 = \frac{\Delta t}{2}\left[\frac{u(p+1, n) - 2u(p, n) + u(p-1, n)}{(\Delta x)^2}\right].$$

(a) Establish the orders of accuracy of the two finite difference approximations
on the LHS of the equation.

(b) Establish the accuracy with which the expression in the brackets approximates
$\partial^2 u/\partial x^2$.

(c) Show that the RHS of the equation is such as to make the whole difference
scheme accurate to second order in both space and time.
28.21 Laplace’s equation

$$\frac{\partial^2 V}{\partial x^2} + \frac{\partial^2 V}{\partial y^2} = 0$$

is to be solved for the region and boundary conditions shown in figure 28.7.

[Figure 28.7: the grid layout is not recoverable from the text. The surviving labels are
the boundary values V = 80 and V = 0, a row of seven values of 40 and a row of three
values of 20.]

Figure 28.7 Region, boundary values and initial guessed solution for
exercise 28.21.


Starting from the given initial guess for the potential values $V$ and using the
simplest possible form of relaxation, obtain a better approximation to the actual
solution. Do not aim to be more accurate than $\pm 0.5$ units and so terminate the
process when subsequent changes would be no greater than this.

28.22 Consider the solution $\phi(x, y)$ of Laplace’s equation in two dimensions using a
relaxation method on a square grid with common spacing $h$. As in the main text,
denote $\phi(x_0 + ih,\ y_0 + jh)$ by $\phi_{i,j}$. Further, define $\phi^{m,n}_{i,j}$ by

$$\phi^{m,n}_{i,j} \equiv \frac{\partial^{m+n} \phi}{\partial x^m\,\partial y^n}$$

evaluated at $(x_0 + ih,\ y_0 + jh)$.

(a) Show that

$$\phi^{4,0}_{i,j} + 2\phi^{2,2}_{i,j} + \phi^{0,4}_{i,j} = 0.$$

(b) Working up to terms of order $h^5$, find Taylor series expansions, expressed in
terms of the $\phi^{m,n}_{i,j}$, for

$$S_{\pm,0} = \phi_{i+1,j} + \phi_{i-1,j},$$
$$S_{0,\pm} = \phi_{i,j+1} + \phi_{i,j-1}.$$

(c) Find a corresponding expansion, to the same order of accuracy, for $\phi_{i\pm1,j+1} +
\phi_{i\pm1,j-1}$ and hence show that

$$S_{\pm,\pm} = \phi_{i+1,j+1} + \phi_{i+1,j-1} + \phi_{i-1,j+1} + \phi_{i-1,j-1}$$

has the form

$$4\phi^{0,0}_{i,j} + 2h^2\left(\phi^{2,0}_{i,j} + \phi^{0,2}_{i,j}\right) + \frac{h^4}{6}\left(\phi^{4,0}_{i,j} + 6\phi^{2,2}_{i,j} + \phi^{0,4}_{i,j}\right).$$

(d) Evaluate the expression $4(S_{\pm,0} + S_{0,\pm}) + S_{\pm,\pm}$ and hence deduce that a possible
relaxation scheme, good to the fifth order in $h$, is to recalculate each $\phi_{i,j}$ as
the weighted mean of the current values of its four nearest neighbours (each
with weight 1) and its four next-nearest neighbours (each with weight $\tfrac{1}{4}$).

28.23 The Schrödinger equation for a quantum mechanical particle of mass $m$ moving
in a one-dimensional harmonic oscillator potential $V(x) = kx^2/2$ is

$$-\frac{\hbar^2}{2m}\,\frac{d^2 \psi}{dx^2} + \frac{kx^2 \psi}{2} = E\psi.$$

For physically acceptable solutions the wavefunction $\psi(x)$ must be finite at $x = 0$,
tend to zero as $x \to \pm\infty$ and be normalised so that $\int |\psi|^2\,dx = 1$. In practice
these constraints mean that only certain (quantised) values of $E$, the energy of
the particle, are allowed. The allowed values fall into two groups, those for which
the corresponding $\psi(0) = 0$ and those for which the corresponding $\psi(0) \neq 0$.

Show that if the unit of length is taken as $[\hbar^2/(mk)]^{1/4}$ and the unit of energy
as $\hbar(k/m)^{1/2}$ then the Schrödinger equation takes the form

$$\frac{d^2 \psi}{dy^2} + (2E' - y^2)\psi = 0.$$

Devise an outline computerised scheme, using Runge-Kutta integration, that will
enable you to:

(a) determine the three lowest allowed values of $E$;

(b) tabulate the normalised wavefunction corresponding to the lowest allowed
energy.

You should consider explicitly:

(i) the variables to use in the numerical integration;

(ii) how starting values near $y = 0$ are to be chosen;

(iii) how the condition on $\psi$ as $y \to \pm\infty$ is to be implemented;

(iv) how the required values of $E$ are to be extracted from the results of the
integration;

(v) how the normalisation is to be carried out.


28.10 Hints and answers 


28.1 5.370. 

28.2 6.951 after two iterations. 

28.3 (a) $\xi \neq 0$ and $f'(\xi) \neq 0$ in general; (b) $\xi = b$, but $f'(b) = 0$ whilst $f(b) \neq 0$.

28.4 $f(N) = (N - 3x^2)^{-1} = -(2N)^{-1}$; 2.645 741 1, accurate value 2.645 751 3.

28.5 Interchange is formally needed for the first two steps, though in this case no error
will result if it is not carried out; $x_1 = -12$, $x_2 = 2$, $x_3 = -1$, $x_4 = 5$.

28.6 $p(0.5) = 0.175$, $p(1.0) = 0.200$, $p(1.1) = 0.121$, $p(1.2) = 0.000$.

28.7 The quadratic equation is $z^2 + 4z + 1 = 0$; $\alpha + \beta - 3 = x_0 = \alpha' + \beta' + 3$.

With $p = -2 + \sqrt{3}$ and $q = -2 - \sqrt{3}$, $\beta$ must be zero for $i > 0$ and $\alpha'$ must be
zero for $i < 0$; $x_i = 3(-2 + \sqrt{3})^i$ for $i > 0$, $x_i = 0$ for $i = 0$, $x_i = -3(-2 - \sqrt{3})^i$
for $i < 0$.

28.8 $I_{\rm exact} = hf + h^3 f''/24$; $I_{\rm T} = hf + h^3 f''/8$; $I_{\rm M} = hf$. Thus $I_{\rm exact} = \tfrac{1}{3}I_{\rm T} + \tfrac{2}{3}I_{\rm M} + O(h^5)$.

28.9 Listed below are the relevant indefinite integrals $F(y)$ of the distributions together
with the functions $\xi = \xi(\eta)$:

(a) $y^2$; $\xi = \sqrt{\eta}$.

(b) $y^{3/2}$; $\xi = \eta^{2/3}$.

(c) $\tfrac{1}{2}[\sin(\pi y/2a) + 1]$; $\xi = (2a/\pi)\sin^{-1}(2\eta - 1)$.

(d) $\tfrac{1}{2}\exp(y)$ for $y \le 0$; $\tfrac{1}{2}[2 - \exp(-y)]$ for $y > 0$. $\xi = \ln(2\eta)$ for $0 < \eta \le \tfrac{1}{2}$;
$\xi = -\ln[2(1 - \eta)]$ for $\tfrac{1}{2} < \eta < 1$.

Figure 28.8 Typical solutions $y = y(x)$, shown by solid lines, of $dy/dx =
a(x^2 + y^2)^{-1/2}$. The short arrows give the direction that the tangent to any
solution must have at that point.

28.10 Show that the corners of the required area (with three curved sides and one
straight one) are at (1.5, 1.5), (1.866, 1.5), (2, 2) and (1.669, 2.056). For a pair of
random numbers $(\xi_1, \xi_2)$, take $x = 1.5 + \alpha\xi_1$ and $y = 1.5 + \beta\xi_2$. The best values
for $\alpha$ and $\beta$ are 0.5 and 0.556 respectively. Test the inequalities

$$(\alpha\xi_1 + 0.5)^2 + (\beta\xi_2 - 0.5)^2 < 1,$$
$$(\alpha\xi_1 - 1)^2 + (\beta\xi_2)^2 < 1,$$
$$(\alpha\xi_1 - 0.5)^2 + (\beta\xi_2 - 1.5)^2 > 1.$$

If all three conditions are satisfied for $n$ out of $N$ pairs, the area can be estimated
as $n\alpha\beta/N$.

28.11 $1 - x^2/2 + x^4/8 - x^6/48$; 1.0000, 0.9950, 0.9802, 0.9560, 0.9231, 0.8825; exact
solution $y = \exp(-x^2/2)$.

28.12 $y_{i+1} = y_i + h(x_i + y_i) + \tfrac{1}{2}h^2(1 + x_i + y_i)$. The numerical estimate is 0.049 23; the
analytic solution $y(x) = e^x - 1 - x$ gives 0.049 86.

28.13 (b) $a_1 = 23/12$, $a_2 = -4/3$, $a_3 = 5/12$.

(c) $b_1 = 5/12$, $b_2 = 2/3$, $b_3 = -1/12$.

(d) $\bar{y}(0.4) = 0.224\,582$, $y(0.4) = 0.225\,527$ after correction.

28.15 (a) The error is $\tfrac{5}{12}h^3 u_n^{(3)} + O(h^4)$.

(b) $\alpha = -4$, $\beta = 5$, $\mu = 4$, and $\nu = 2$.
[Figure 28.9: the grid layout is not recoverable from the text. The surviving labels are
the boundary values V = 80 and V = 0, a row of values 40, 41.5, 46.5, 48, 46.5, 41.5, 40
and a row of values 16.5, 20, 16.5.]

Figure 28.9 The solution to exercise 28.21.

28.16 With $x_j = 1 + j\Delta x$, $N\Delta x = 3$ and $\phi_j = \phi(x_j)$,

$$[2 + (2j + 1)\Delta x]\,\phi_{j+1} - (4 + 4j\Delta x)\,\phi_j + [2 + (2j - 1)\Delta x]\,\phi_{j-1} = 0$$

for $j = 1, 2, \ldots, N - 1$. In addition $\phi_0 = 2$ and $\phi_N = \phi_{N-1} + 2\Delta x$. Analytically
$\phi(x) = 2 + 8\ln x$. Estimated (accurate) values are 6.67 (7.55), 9.47 (10.79), 11.47
(13.09).

28.18 See figure 28.8.

28.19 (a) Setting $A\Delta t = c\Delta x$ gives, for example, $u(0, 2) = (1 - c)^2$, $u(1, 2) = 2c(1 - c)$,
$u(2, 2) = c^2$. For stability $0 < c < 1$.

(b) $G(n, s) = [(1 - c) + cs]^n$ for $0 \le p \le n$.

(c) $[n!\,(1 - c)^{n-p} c^p]/[p!\,(n - p)!]$.

(d) When $c = 1$ and the difference equation becomes $u(p, n + 1) = u(p - 1, n)$.

28.20 (a) First order in time and second order in space; (b) $O(\Delta x)^2$; (c) show that
$\partial^2 u/\partial t^2 = \partial^2 u/\partial x^2$ and write the second-order correction to $\partial u/\partial t$ in the form
$\tfrac{1}{2}(\Delta t)^2\,\partial^2 u/\partial x^2$.

28.21 See figure 28.9.

28.22 (a) Write Laplace’s equation as $\phi^{2,0}_{i,j} + \phi^{0,2}_{i,j} = 0$ and differentiate twice with respect
to $x$ and $y$ separately. Then add the resulting equations.

(b) $S_{\pm,0} = 2\phi^{0,0}_{i,j} + h^2 \phi^{2,0}_{i,j} + \tfrac{1}{12}h^4 \phi^{4,0}_{i,j}$, and corresponding results for $S_{0,\pm}$.

(c) Note that the $y$-derivatives of $\phi$ at the points $(i \pm 1, j)$, and $\phi_{i\pm1,j}$ itself, have
themselves to be expanded as Taylor series in $x$, the number of terms to be retained
being determined by the power of $\Delta y = h$ that multiplies them. The roles of $x$ and $y$
could be reversed.

(d) Use Laplace’s equation and result (a) to show that the given expression has
the value $20\phi_{i,j}$.
Appendix


Gamma, beta and error functions 


In several places in this book we have made mention of the gamma, beta and error 
functions. These convenient functions appear in a number of contexts and here 
we gather together some of their properties. This appendix should be regarded 
merely as a reference containing some useful relations with a minimum of formal 
proofs. 


A1.1 The gamma function

The gamma function $\Gamma(n)$ is defined by

$$\Gamma(n) = \int_0^\infty x^{n-1} e^{-x}\,dx, \qquad (A1)$$

which converges for $n > 0$. Replacing $n$ by $n + 1$ in (A1) and integrating the RHS
by parts, we find

$$\Gamma(n + 1) = \int_0^\infty x^n e^{-x}\,dx
 = \left[-x^n e^{-x}\right]_0^\infty + \int_0^\infty n x^{n-1} e^{-x}\,dx
 = n\int_0^\infty x^{n-1} e^{-x}\,dx,$$

from which we obtain the important result

$$\Gamma(n + 1) = n\Gamma(n). \qquad (A2)$$

From (A1), we see that $\Gamma(1) = 1$, and so, if $n$ is a positive integer,

$$\Gamma(n + 1) = n!. \qquad (A3)$$
Figure A1.1 The gamma function $\Gamma(n)$.

In fact, equation (A3) serves as a definition of the factorial function even for
non-integer $n$. For negative $n$ the factorial function is defined by

$$n! = \frac{(n + m)!}{(n + m)(n + m - 1)\cdots(n + 1)}, \qquad (A4)$$

where $m$ is any positive integer that makes $n + m > 0$. Different choices of $m$
($> -n$) do not lead to different values for $n!$. A plot of the gamma function is
given in figure A1.1, where it can be seen that the function is infinite for negative
integer values of $n$, in accordance with (A4).

By letting $x = y^2$ in (A1), we immediately obtain another useful representation
of the gamma function given by

$$\Gamma(n) = 2\int_0^\infty y^{2n-1} e^{-y^2}\,dy. \qquad (A5)$$

Setting $n = \tfrac{1}{2}$ we find the result

$$\Gamma(\tfrac{1}{2}) = 2\int_0^\infty e^{-y^2}\,dy = \int_{-\infty}^\infty e^{-y^2}\,dy = \sqrt{\pi},$$

where we have used the standard integral discussed in section 6.4.2. From this result,
$\Gamma(n)$ for half-integral $n$ can be found using (A2). Some immediately derivable
factorial values of half integers are

$$(-\tfrac{3}{2})! = -2\sqrt{\pi}, \qquad (-\tfrac{1}{2})! = \sqrt{\pi}, \qquad (\tfrac{1}{2})! = \tfrac{1}{2}\sqrt{\pi}, \qquad (\tfrac{3}{2})! = \tfrac{3}{4}\sqrt{\pi}.$$

A1.2 THE BETA FUNCTION 


It can also be shown that the gamma function is given by

$$\Gamma(n+1) = \sqrt{2\pi n}\, n^n e^{-n}\left(1 + \frac{1}{12n} + \frac{1}{288n^2} - \frac{139}{51840n^3} + \cdots\right) = n!, \qquad \text{(A6)}$$

which is known as Stirling’s asymptotic series. For large $n$ the first term dominates
and so

$$n! \approx \sqrt{2\pi n}\, n^n e^{-n}; \qquad \text{(A7)}$$

this is known as Stirling’s approximation. This approximation is particularly useful
in statistical thermodynamics, when arrangements of a large number of particles
are to be considered.
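
To give a feel for how quickly Stirling's approximation becomes accurate, the following sketch compares $n!$ with the leading term alone and with the leading term multiplied by the first four terms of the bracket in the asymptotic series. The function names are our own, and the series is truncated after the $1/n^3$ term purely for illustration.

```python
import math


def stirling_leading(n: float) -> float:
    """Leading term of Stirling's series: sqrt(2 pi n) n^n e^(-n)."""
    return math.sqrt(2.0 * math.pi * n) * n**n * math.exp(-n)


def stirling_series(n: float) -> float:
    """Leading term times 1 + 1/(12n) + 1/(288n^2) - 139/(51840n^3)."""
    correction = (1.0 + 1.0 / (12.0 * n) + 1.0 / (288.0 * n**2)
                  - 139.0 / (51840.0 * n**3))
    return stirling_leading(n) * correction


for n in (1, 5, 10, 50):
    exact = math.factorial(n)
    err_leading = abs(stirling_leading(n) - exact) / exact
    err_series = abs(stirling_series(n) - exact) / exact
    print(f"n={n:3d}  leading: {err_leading:.2e}  series: {err_series:.2e}")
```

The relative error of the leading term behaves roughly like $1/(12n)$, which is why the approximation is already useful for quite modest $n$.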


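► Prove Stirling’s approximation $n! \approx \sqrt{2\pi n}\, n^n e^{-n}$ for large $n$.


From (A1), the extended definition of the factorial function (which is valid for $n > -1$) is
given by

$$n! = \int_0^\infty x^n e^{-x}\,dx = \int_0^\infty e^{n\ln x - x}\,dx. \qquad \text{(A8)}$$

If we let $x = n + y$, then

$$\ln x = \ln n + \ln\left(1 + \frac{y}{n}\right) = \ln n + \frac{y}{n} - \frac{y^2}{2n^2} + \frac{y^3}{3n^3} - \cdots.$$

Substituting this result into (A8), we obtain

$$n! = \int_{-n}^\infty \exp\left[n\left(\ln n + \frac{y}{n} - \frac{y^2}{2n^2} + \cdots\right) - n - y\right] dy.$$

Thus, when $n$ is sufficiently large, we may approximate $n!$ by

$$n! \approx e^{n\ln n - n}\int_{-\infty}^\infty e^{-y^2/(2n)}\,dy = e^{n\ln n - n}\sqrt{2\pi n} = \sqrt{2\pi n}\, n^n e^{-n},$$

which is Stirling’s approximation (A7). ◄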


A1.2 The beta function 


The beta function is defined by

$$B(m,n) = \int_0^1 x^{m-1}(1-x)^{n-1}\,dx, \qquad \text{(A9)}$$


which converges for $m > 0$, $n > 0$. By letting $x = 1 - y$ in (A9) it is easy to show
that $B(m,n) = B(n,m)$. Other useful representations of the beta function may be
obtained by suitable changes of variable. For example, putting $x = (1+y)^{-1}$ in
(A9), we find that

$$B(m,n) = \int_0^\infty \frac{y^{n-1}\,dy}{(1+y)^{m+n}}.$$

Alternatively, if we let $x = \sin^2\theta$ in (A9), we obtain immediately

$$B(m,n) = 2\int_0^{\pi/2} \sin^{2m-1}\theta\,\cos^{2n-1}\theta\,d\theta. \qquad \text{(A10)}$$


The beta function may also be written in terms of the gamma function as

$$B(m,n) = \frac{\Gamma(m)\Gamma(n)}{\Gamma(m+n)}. \qquad \text{(A11)}$$


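► Prove the result (A11).


Using (A5), we have

$$\begin{aligned}
\Gamma(n)\Gamma(m) &= 4\int_0^\infty x^{2n-1} e^{-x^2}\,dx \int_0^\infty y^{2m-1} e^{-y^2}\,dy \\
&= 4\int_0^\infty\!\!\int_0^\infty x^{2n-1} y^{2m-1} e^{-(x^2+y^2)}\,dx\,dy.
\end{aligned}$$

Changing variables to plane polar coordinates $(\rho, \phi)$ given by $x = \rho\cos\phi$, $y = \rho\sin\phi$, we
obtain

$$\begin{aligned}
\Gamma(n)\Gamma(m) &= 4\int_0^{\pi/2}\!\!\int_0^\infty \rho^{2(m+n-1)} e^{-\rho^2} \sin^{2m-1}\phi\,\cos^{2n-1}\phi\;\rho\,d\rho\,d\phi \\
&= 4\int_0^{\pi/2} \sin^{2m-1}\phi\,\cos^{2n-1}\phi\,d\phi \int_0^\infty \rho^{2(m+n)-1} e^{-\rho^2}\,d\rho \\
&= B(m,n)\,\Gamma(m+n),
\end{aligned}$$

where in the last line we have used the results (A5) and (A10). ◄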
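
The three representations of the beta function (the defining integral, the trigonometric form and the gamma-function form) can be compared numerically. The sketch below uses a simple composite midpoint rule written for this example; the function names and the choice $m = 2.5$, $n = 1.5$ are our own assumptions, and the quadrature is intended only as a rough check, not as a recommended way of evaluating $B(m,n)$.

```python
import math


def midpoint(f, a: float, b: float, steps: int = 100_000) -> float:
    """Composite midpoint rule; it never evaluates f at the end-points."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def beta_integral(m: float, n: float) -> float:
    """Defining integral of x^(m-1) (1-x)^(n-1) over [0, 1]."""
    return midpoint(lambda x: x**(m - 1.0) * (1.0 - x)**(n - 1.0), 0.0, 1.0)


def beta_trig(m: float, n: float) -> float:
    """Trigonometric form: 2 * integral of sin^(2m-1) cos^(2n-1) on [0, pi/2]."""
    def integrand(t: float) -> float:
        return math.sin(t)**(2.0 * m - 1.0) * math.cos(t)**(2.0 * n - 1.0)
    return 2.0 * midpoint(integrand, 0.0, 0.5 * math.pi)


def beta_gamma(m: float, n: float) -> float:
    """Gamma-function form: Gamma(m) Gamma(n) / Gamma(m + n)."""
    return math.gamma(m) * math.gamma(n) / math.gamma(m + n)


m, n = 2.5, 1.5
print(beta_integral(m, n), beta_trig(m, n), beta_gamma(m, n))
print(beta_integral(n, m))  # symmetry B(m, n) = B(n, m)
```

For this particular choice the gamma-function form gives $\Gamma(\tfrac52)\Gamma(\tfrac32)/\Gamma(4) = \pi/16$, which provides an independent check on all three numbers.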


A1.3 The error function 

Finally we mention the error function, which is encountered in probability theory 
and in the solutions of some partial differential equations, and which is defined 
by 


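$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-u^2}\,du = 1 - \frac{2}{\sqrt{\pi}}\int_x^\infty e^{-u^2}\,du. \qquad \text{(A12)}$$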


From this definition we can easily see that 

$$\operatorname{erf}(0) = 0, \qquad \operatorname{erf}(\infty) = 1, \qquad \operatorname{erf}(-x) = -\operatorname{erf}(x).$$
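
These properties, and the defining integral itself, are easy to check numerically. The sketch below evaluates the integral $\frac{2}{\sqrt{\pi}}\int_0^x e^{-u^2}\,du$ with a composite midpoint rule and compares it with `math.erf` from Python's standard library; the function name `erf_by_quadrature` and the number of steps are our own choices for illustration.

```python
import math


def erf_by_quadrature(x: float, steps: int = 20_000) -> float:
    """Evaluate (2/sqrt(pi)) * integral of exp(-u^2) from 0 to x (midpoint rule)."""
    h = x / steps
    total = sum(math.exp(-((i + 0.5) * h) ** 2) for i in range(steps))
    return 2.0 / math.sqrt(math.pi) * h * total


for x in (0.0, 0.5, 1.0, 2.0):
    print(x, erf_by_quadrature(x), math.erf(x))

print(math.erf(-0.7), -math.erf(0.7))  # erf is an odd function
print(math.erf(6.0))                   # very close to the limit erf(inf) = 1
```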

By making the substitution $y = \sqrt{2}\,u$ in (A12), we find

$$\operatorname{erf}(x) = \sqrt{\frac{2}{\pi}}\int_0^{\sqrt{2}x} e^{-y^2/2}\,dy.$$


The cumulative probability function $\Phi(x)$ for the standard Gaussian distribution
(discussed in section 26.9.1) may be written in terms of the error function as
follows:

$$\begin{aligned}
\Phi(x) &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^x e^{-y^2/2}\,dy \\
&= \frac12 + \frac{1}{\sqrt{2\pi}}\int_0^x e^{-y^2/2}\,dy \\
&= \frac12 + \frac12\operatorname{erf}\left(\frac{x}{\sqrt2}\right).
\end{aligned}$$
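
In practice the relation $\Phi(x) = \tfrac12 + \tfrac12\operatorname{erf}(x/\sqrt2)$ gives a convenient way of computing Gaussian cumulative probabilities. A minimal sketch, assuming Python 3.8 or later so that `statistics.NormalDist` is available for comparison (the function name `gaussian_cdf` is our own):

```python
import math
from statistics import NormalDist


def gaussian_cdf(x: float) -> float:
    """Phi(x) = 1/2 + (1/2) erf(x / sqrt(2)) for the standard Gaussian."""
    return 0.5 + 0.5 * math.erf(x / math.sqrt(2.0))


standard = NormalDist(mu=0.0, sigma=1.0)
for x in (-1.96, 0.0, 1.0, 1.96):
    print(x, gaussian_cdf(x), standard.cdf(x))
```

The values at $x = \pm 1.96$ are close to 0.025 and 0.975, the familiar limits of a central 95% interval.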

It is also sometimes useful to define the complementary error function 


$$\operatorname{erfc}(x) = 1 - \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_x^\infty e^{-u^2}\,du. \qquad \text{(A13)}$$


F-distribution (Fisher), 1132-1138
critical points table, 1137 
logarithmic form, 1138 
t-test, see Student's t-test

correlation, chi-squared test , 1143 
Cramer-Rao (Fisher’s) inequality , 1075, 1076 
Fisher's inequality , 1075, 1076 
maximum-likelihood, method of 
extended, 1112 

A, B, one-dimensional irreps, 932, 944, 951 
Abelian groups, 886 
absolute convergence of series, 127, 717 
absolute derivative, 824-826 
acceleration vector, 341 
Adams method, 1184 

Adams-Moulton-Bashforth, predictor-corrector 
scheme, 1194 

addition rule for probabilities, 967, 972 
adjoint, see Hermitian conjugate 
adjoint operators, 587-591 
adjustment of parameters, 854-855 
algebra of 

complex numbers, 88-89 

functions in a vector space, 583 

matrices, 256-257 

power series, 137 

series, 134 

tensors, 787-790 

vectors, 217-218 

in a vector space, 247 
in component form, 222 
algebraic equations, numerical methods for, see 
numerical methods for algebraic equations 
alternating group, 958 
alternating series test, 133 
ammonia molecule, symmetries of, 884 
Ampere’s rule (law), 387, 414 
amplitude modulation of radio waves, 450 
analytic (regular) functions, 712 


angle between two vectors, 225 
angular frequency, 626n 
in Fourier series, 425 
angular momentum, 782, 798 
and irreps, 935 
of particle system, 799-801 
of particles, 344 
of solid body, 402, 800 
vector representation, 241 
angular velocity, vector representation, 227, 241, 
359 

anti-Hermitian matrices, 276
eigenvalues, 281-283 

imaginary nature, 282-283 
eigenvectors, 281-283 
orthogonality, 282-283 

anticommutativity of vector/cross product, 226 
antisymmetric functions, 422 
and Fourier series, 425-426
and Fourier transforms, 451 
antisymmetric matrices, 275 

general properties, see anti-Hermitian matrices

antisymmetric tensors, 787, 790 
antithetic variates, in Monte Carlo methods, 
1174 

aperture function, 443 
approximately equal $\approx$, definition, 135
arbitrary parameters for ODE, 475 
arc length of 

plane curves, 74-75 
space curves, 347 

arccosech, arccosh, arccoth, arcsech, arcsinh, 
arctanh, see hyperbolic functions, inverses 
Archimedean upthrust, 402, 416 
area element in 

Cartesian coordinates, 191 
plane polars, 205 
area of 
circle, 72 
ellipse, 72, 210
parallelogram, 227

region, using multiple integrals, 194-196 
surfaces, 352 
as vector, 399-401, 414 
area, maximal enclosure, 838 
arg, argument of a complex number, 90 
Argand diagram, 87, 711 
argument, principle of the, 755 
arithmetic series, 120 
arithmetico-geometric series, 121 
arrays, see matrices 

associated Legendre equation, 594-595, 666, 670, 
703 

associated Legendre functions $P_\ell^m(x)$, 666
generating function, 594 
orthogonality, 594 
Rodrigues’ formula, 594 
associated Legendre functions $P_\ell^m(x)$, 703
associative law for 
addition 

in a vector space of finite dimensionality, 
247 

in a vector space of infinite dimensionality, 
583 

of complex numbers, 89 
of matrices, 256 
of vectors, 217 
convolution, 453, 464 
group operations, 885 
linear operators, 254 
multiplication 

of a matrix by a scalar, 256 
of a vector by a scalar, 218 
of complex numbers, 91 
of matrices, 258 
multiplication by a scalar 

in a vector space of finite dimensionality, 
247 

in a vector space of infinite dimensionality, 
583 

atomic orbitals, 957 
d-states, 948, 950, 956 
p-states, 948 
s-states, 986 

auto-correlation functions, 456 
automorphism, 903 
auxiliary equation, 499 
repeated roots, 499 
average value, see mean value 
axial vectors, 798 

backward differences, 1179 
basis functions 

for linear least squares estimation, 1115 
in a vector space of infinite dimensionality, 
583-584 

of a representation, 920 
change in, 926-927, 929, 934 
basis vectors, 221-222, 248-249, 778, 920 
derivatives, 814-816 


Christoffel symbol $\Gamma^k_{ij}$, 814
for particular irrep, 948-950, 958 
linear dependence and independence, 221 
non-orthogonal, 250 
orthonormal, 249-250 
required properties, 221 
Bayes’ theorem, 974-975 
Bernoulli equation, 483 
Bessel correction to variance estimate, 1090 
Bessel equation, 541, 564-568, 595, 674 
Bessel functions $J_\nu(z)$
zeroes of, 662, 673
Bessel functions $J_\nu(z)$, 662, 672
generating function, 573, 595 
integral relationships, 572 
integral representation, 574-575 
orthogonality, 570-573 
recurrence relations, 569-570 
second kind, $Y_\nu(z)$, 568
series, 565, 566, 595
$\nu = 0$, 567
$\nu = \pm 1/2$, 566
spherical, 675 
Bessel inequality, 251, 586 
best unbiased estimator, 1075 
beta function, 1203 
bias, of estimator, 1074 
bilinear transformation, general, 113 
binary chopping, 1154 
binomial coefficient $^nC_k$, 27-30
elementary properties, 26 
identities, 27 
negative n, 29 
non-integral n, 29 
binomial coefficient $^nC_k$, 977-979
in Leibniz’ theorem, 50 
binomial distribution Bin($n$, $p$), 1010-1013
and Gaussian distribution, 1027 
and Poisson distribution, 1016, 1019 
mean and variance, 1013 
MGF, 1012 

recurrence formula, 1011 
binomial expansion, 143 
binormal to space curves, 348 
birthdays, different, 976 
bivariate distributions, 1038-1049 
conditional, 1040 
continuous, 1039 
correlation, 1042-1049 
and independence, 1042 
matrix, 1045-1049 
positive/negative, 1042 
covariance, 1042-1049
matrix, 1045 

expectation (mean), 1041 
independent (uncorrelated), 1039 
marginal, 1040 
variance, 1042 
Boltzmann distribution 

from constraints on total energy, 174-176
bonding in molecules, 945, 947-950
Born approximation, 152, 606 
Bose-Einstein statistics, 980 
boundary conditions 
and characteristics, 633 
and Laplace equation, 699, 701 
for Green’s functions, 518, 520-522 
inhomogeneous, 521 
for ODE, 474, 476, 507 
for PDE, 614, 618-620 
for Sturm-Liouville equations, 592 
homogeneous and inhomogeneous, 618, 656, 
686, 688 

superposition solutions, 651-657 
types, 635-638 

brachistochrone problem, 843 
Bragg formula, 241 
branch cut, 722 
branch points, 721 
Bromwich integral, 765 
bulk modulus, 829 

calculus of residues, see zeroes of a function of 
a complex variable and contour integration 
calculus of variations 

constrained variation, 844-846 
estimation of ODE eigenvalues, 849 
Euler-Lagrange equation, 835-836 
Fermat’s principle, 846 
Hamilton’s principle, 847 
higher-order derivatives, 841 
several dependent variables, 841 
several independent variables, 841 
soap films, 839-840 
variable end-points, 841-844 
calculus, elementary, 42-77 
cancellation law in a group, 888 
canonical form, for second-order ODE, 522 
card drawing, see probability 
carrier frequency of radio waves, 451 
Cartesian coordinates, 221-222 
Cartesian tensors, 779-804 
algebra, 787-790 
contraction, 788 
definition, 784 
first-order, 781-784 
from scalar, 783 
general order, 784-803
integral theorems, 803-804 
isotropic, 793-795 

physical applications, 783, 788-790, 799-803 
second-order, 784-803, 817 
symmetry and antisymmetry, 787 
tensor fields, 803 
zero-order, 781-784 
from vector, 784 
Cartesian tensors, particular 
conductivity, 801 
inertia, 800 
strain, 802 


stress, 802 
susceptibility, 801 
catenary, 840, 846 
Cauchy 

boundary conditions, 635 
distribution, 994 
inequality, 747 
integrals, 745-747 
product, 134 
root test, 132, 717 
theorem, 742 

Cauchy-Riemann relations, 713-716, 727, 729, 
743 

in terms of $z$ and $z^*$, 715
central differences, 1179 
central limit theorem, 1036-1038, 1178 
central moments, see moments, central 
centre of a group, 911 
centre of mass, 198 
of hemisphere, 198 
of semicircular lamina, 200 
centroid, 198 

of plane area, 198 
of plane curve, 200 
of triangle, 220-221 
CF, see complementary function 
chain rule for functions of 
one real variable, 47-48 
several real variables, 160-161 
change of basis, see similarity transformations 
change of variables 

and coordinate systems, 161-163 
in multiple integrals, 202-210 

evaluation of Gaussian integral, 205-207 
general properties, 209-210 
in RVD, 992-999 
character tables, 935 
$4mm$ or $C_{4v}$, 944, 948, 950
$S_4$ or $432$ or $O$, 956, 957
$\bar{4}3m$ or $T_d$, 957

$3m$ or $32$ or $C_{3v}$ or $S_3$, 935, 939, 950, 952, 958, 959

construction of, 942-944 
characteristic equation, 285 
normal mode form, 325 
of recurrence relation, 505 
characteristic function, see moment generating 
functions (MGF) 
characteristics 

and boundary curves, 633 

multiple intersections, 633, 638 
and the existence of solutions, 632-638 
first-order equations, 632-633 
second-order equations, 636 
and equation type, 636 
characters, 934-938, 942-944 
and conjugacy classes, 934, 937 
character tables 

$4mm$ or $C_{4v}$ or $D_4$, 955
$A_4$, 958
$D_5$, 958

quaternion, 955 
counting irreps, 937-938 
definition, 934 

of product representation, 946-947 
orthogonality properties, 936, 944 
summation rules, 939 

charge (point), Dirac $\delta$-function representation, 447

charged particle in electromagnetic fields, 376 
Chebyshev equation, 541, 597 
polynomial solutions, 578 
Chebyshev polynomials $T_n(x)$
generating function, 597 
orthogonality, 597 
Rodrigues’ formula, 597 
chi-squared ($\chi^2$) distribution
and likelihood-ratio test, 1125
chi-squared ($\chi^2$) distribution, 1034
and goodness of fit, 1139 
and likelihood-ratio test, 1134 
and multiple estimators, 1086 
test for correlation, 1143
Cholesky separation, 318 
Christoffel symbol $\Gamma^k_{ij}$, 814-816
from metric tensor, 815, 822 
circle of convergence, 717-718 
circle, area of, 72 
circuits, electrical 
transients, 491 
Clairaut equation, 489 
classes and equivalence relations, 906 
closure of a group, 885 
closure property of eigenfunctions of an 
Hermitian operator, 601 
cofactor of a matrix element, 264 
column matrix, 255 
column vector, 255 
combinations (probability), 975-981 
common ratio in geometric series, 120 
commutation law for group elements, 886 
commutative law for 
addition 

in a vector space of finite dimensionality, 
247 

in a vector space of infinite dimensionality, 
583 

of complex numbers, 89 
of matrices, 256 
of vectors, 217 

complex scalar/dot product, 226 
convolution, 453, 464 
inner product, 249 
multiplication 

of a vector by a scalar, 218 
of complex numbers, 91 
scalar/dot product, 224 
commutator, of two matrices, 314 
comparison test, 128 
complement, 963 


probability for, 967 
complementary equation, 496 
complementary error function, 1205 
complementary function (CF), 497 
for ODE, 498 
partially known, 512 
repeated roots of auxiliary equation, 499 
completeness of 
basis vectors, 248 

eigenfunctions of an Hermitian operator, 588, 
601 

eigenvectors of a normal matrix, 280 
spherical harmonics $Y_\ell^m(\theta, \phi)$, 670
completing the square 

as a means of integration, 67-68 
for quadratic equations, 35 
to evaluate Gaussian integral, 442, 684 
complex conjugate 

z*, of complex number, 92-94, 715 
of a matrix, 261-263 
of scalar/dot product, 226 
properties of, 93 

complex exponential function, 95, 719 
complex Fourier series, 430-431 
complex integrals, 738-742, see also zeroes of a 
function of a complex variable and contour 
integration 
Cauchy’s, 745-747 
Cauchy’s theorem, 742 
definition, 739 
Jordan’s lemma, 761-762 
Morera’s theorem, 744 
of $z^{-1}$, 740
principal value, 760 
residue theorem, 752-754 
complex logarithms, 102-103, 720 
principal value of, 103, 720 
complex numbers, 86-117 

addition and subtraction of, 88-89 
applications to differentiation and integration, 
104 

argument of, 90 
associativity of 

addition and subtraction, 89 
multiplication, 91 
commutativity of 

addition and subtraction, 89 
multiplication, 91 

complex conjugate of, see complex conjugate 
components of, 87 

de Moivre’s theorem, see de Moivre’s theorem 
division of, 94-95, 97-98 
properties, 95 

from roots of polynomial equations, 86-87 
imaginary part of, 86-87 
modulus of, 90 

multiplication of, 91-92, 97-98 

as rotation in the Argand diagram, 91-92 
notation, 87 

polar representation of, 95-98
real part of, 86-87
trigonometric representation of, 96 
complex potentials, 725-730 
and fluid flow, 727-728 
equipotentials and field lines, 726 
for circular and elliptic cylinders, 728, 730 
for parallel cylinders, 770 
for plates, 736-738, 770 
for strip, 770 
for wedges, 737 

under conformal transformations, 730-738 
complex power series, 136 
complex powers, 102-103 
complex variables, see functions of a complex 
variable and power series in a complex 
variable and complex integrals 
components 

of a complex number, 87 
of a vector, 221-222 

in a non-orthogonal basis, 238 
uniqueness, 248 

conditional (constrained) variation, 844-846 
conditional convergence, 127 
conditional distributions, 1040 
conditional probability, see probability, 
conditional 

cone 

surface area of, 75-76 
volume of, 76 
confidence interval, 1079 
confidence region, 1084 

conformal transformations (mappings), 730-738 
applications, 735-738 
examples, 732-735 
properties, 730-732 

Schwarz-Christoffel transformation, 733-735 
congruence, 907 
conic sections, 15 
eccentricity, 16 
parametric forms, 17 
standard forms, 16 
conjugacy classes, 910-912 
in a class by itself, 910 

conjugate roots of polynomial equations, 102 
connectivity of regions, 389 
conservative fields, 393-395 

necessary and sufficient conditions, 393-395 
potential (function), 395 
consistency, of estimator, 1073 
constant coefficients in ODE, 498-509 
auxiliary equation, 499 
constants of integration, 63, 474 
constrained variation, 844-846 
constraints, stationary values under, see 
Lagrange undetermined multipliers
continuity correction for discrete RV, 1028 
continuity equation, 410 
contour integration, 758-768 
infinite integrals, 759-764 
inverse Laplace transforms, 765-768 


residue theorem, 752-754, 758-768 
sinusoidal functions, 758-759 
summing series, 764-765 
contraction of tensors, 788 
contradiction, proof by, 32-34 
contravariant 
basis vectors, 810 
derivative, 814 

components of tensor, 805-806 
definition, 810 

control variates, in Monte Carlo methods, 1173 
convergence of infinite series, 717-718 
absolute, 127, 717 
complex power series, 136 
conditional, 127 
necessary condition, 128 
power series, 135 

under various manipulations, see power 
series, manipulation 
ratio test, 718 

rearrangement of terms, 127 
tests for convergence, 128-134 
alternating series test, 133 
comparison test, 128 
grouping terms, 132 
integral test, 131 
quotient test, 130 
ratio comparison test, 130 
ratio test (D’Alembert), 129, 135 
root test (Cauchy), 132, 717 
convergence of numerical iteration schemes, 
1156-1158 
convolution 

Fourier transforms, see Fourier transforms,
convolution

Laplace transforms, see Laplace transforms,
convolution
convolution theorem 
Fourier transforms, 454 
Laplace transforms, 463 
Coordinate geometry, 15-18 
straight line, 15 

coordinate systems, see Cartesian, curvilinear, 
cylindrical polar, plane polar and spherical 
polar coordinates 
coordinate transformations 

and integrals, see change of variables 
and matrices, see similarity transformations 
general, 809-814 
relative tensors, 812 
tensor transformations, 811 
weight, 813 
orthogonal, 781 
coplanar vectors, 229 
correlation functions, 455-457 
auto-correlation, 456 
cross-correlation, 455 
energy spectrum, 456 
Parseval’s theorem, 457 
Wiener-Kinchin theorem, 456
correlation matrix, of sample, 1072
correlation of bivariate distributions, 1042-1049 
correlation of sample data, 1072 
correspondence principle in quantum mechanics, 
1056 

cosets and congruence, 907 
cosh, hyperbolic cosine, 105, 720, see also 
hyperbolic functions 
cosine 

in terms of exponential functions, 105 
Maclaurin series for, 143 
orthogonality relations, 423 
counting irreps, see characters, counting irreps 
coupled pendulums, 335, 337 
covariance matrix, of linear least squares 
estimators, 1116 

covariance matrix, of sample, 1072 
covariance of bivariate distributions, 1042-1049 
covariance of sample data, 1072 
covariant 

basis vector, 810 
derivative, 814 

components of tensor, 805-806 
definition, 810 
derivative, 817 
of scalar, 820 
semi-colon notation, 818 
differentiation, 817-820 
CPF, see probability functions, cumulative 
Cramer determinant, 304-305 
Cramer’s rule, 304-305 
cross product, see vector product 
cross-correlation functions, 455 
crystal lattice, 151 
crystal point groups, 924 
cube roots of unity, 101 
cube, rotational symmetries of, 956 
curl of a vector field, 359 
as a determinant, 359 
as integral, 404, 406 
curl curl, 362 

in curvilinear coordinates, 374 
in cylindrical polars, 366 
in spherical polars, 368 
Stokes’ theorem, 412-415
tensor form, 823 

current-carrying wire, magnetic potential, 662 
Curvature, 53-56 
circle of, 54 
of a function, 53 
radius of, 54 

curvature of space curves, 348 
curves, see plane curves and space curves 
curvilinear coordinates, 370-375 
basis vectors, 370 
length and volume elements, 371 
scale factors, 370 
surfaces and curves, 370 
tensors, 804-826 
vector operators, 373-375 


cut plane, 762 

cycle notation for permutations, 899 
cyclic groups, 903, 940 
cyclic relation for partial derivatives, 160 
cycloid, 376, 844 

cylinders, conducting, see complex potentials, 
for circular and elliptic cylinders 
cylindrical polar coordinates, 363-367 
area element, 366 
basis vectors, 364 
Laplace equation, 661-664 
length element, 366 
vector operators, 363-367 
volume element, 366 

$\delta$-function (Dirac), see Dirac $\delta$-function

$\delta_{ij}$ ($\delta_i^j$), Kronecker delta, tensor, see Kronecker
delta, $\delta_{ij}$ ($\delta_i^j$), tensor
D’Alembert’s ratio test, 129, 718 
in convergence of power series, 135 
D’Alembert’s solution to wave equation, 627 
damped harmonic oscillators, 243 
and Parseval’s theorem, 457 
data modelling, maximum-likelihood, 1097 
de Broglie relation, 442, 642, 703 
de Moivre’s theorem, 98, 758 
applications, 98-102 

finding the nth roots of unity, 100-101 
solving polynomial equations, 101-102 
trigonometric identities, 98-100 
deconvolution, see Fourier transforms, 
deconvolution 
defective matrices, 283, 316 
degeneracy 

breaking of, 953-955 
of normal modes, 952 
degenerate eigenvalues, 280, 287 
degenerate kernel, see kernel of integral 
equations, separable 
degree 

of polynomial equation, 2 
degree of ODE, 474 
del $\nabla$, see gradient operator (grad)
del squared $\nabla^2$ (Laplacian), 358, 609
as integral, 406 

in curvilinear coordinates, 374 
in cylindrical polar coordinates, 366 
in polar coordinates, 658 
in spherical polar coordinates, 368, 675 
tensor form, 822 

delta function (Dirac), see Dirac $\delta$-function
dependent random variables, 1038-1047 
derivative, see also differentiation 
absolute, 824-826 
covariant, 817 
Fourier transform of, 450 
Laplace transform of, 461 
normal, 356 

of a function of a complex variable, 711
of a vector, 340

of basis vectors, 342-343 

of composite vector expressions, 343-344 

of function of a function, 47-48 

of hyperbolic functions, 109-112 

of products, 45-47, 49-51

of quotients, 48 

of simple functions, 45 

ordinary, first, second and nth, 43-44

partial, see partial differentiation 

total, 157 

derivative method for second series solution of 
ODE, 551-554 
determinant form 
and $\epsilon_{ijk}$, 791
for curl, 359 
determinants, 264-268 

adding rows or columns, 267 
and singular matrices, 268 
as product of eigenvalues, 292 
evaluation 
using $\epsilon_{ijk}$, 791

using Laplace expansion, 264-265 
identical rows or columns, 267 
in terms of cofactors, 264-265 
interchanging two rows or two columns, 267 
Jacobian representation, 204, 208, 209 
notation, 264 

of Hermitian conjugate matrices, 267 
of order three, in components, 265 
of transpose matrices, 266 
product rule, 267 
properties, 266-268, 827 
relationship with rank, 272-273 
removing factors, 267 
secular, 285 
diagonal matrices, 273 
diagonalisation of matrices, 290-293 
normal matrices, 291-292 
properties of eigenvalues, 292-293 
simultaneous, 337-338 
diamond, unit cell, 239 
die throwing, see probability 
difference method for summation of series, 
122-123 

difference schemes for differential equations, 
1180-1183, 1190-1192 
difference, finite, see finite differences 
differentiable 

function of a complex variable, 711-713 
function of a real variable, 43 
differential 
definition, 44 

exact and inexact, 158-159 
of vectors, 344, 350 
total, 157 

differential equations, see ordinary differential 
equations and partial differential equations 
differential equations, particular 
Bernoulli, 483 


Bessel, 541, 564-568 
Chebyshev, 541 
Clairaut, 489 

diffusion, 611, 628-631, 649, 656-657, 1192 
Euler, 510 

Euler-Lagrange, 835-836 
Helmholtz, 671-676 
Hermite, 541 
Lagrange, 848 
Laguerre, 541 

Laplace, 612, 623, 650, 651, 1191
Legendre, 540, 541, 555-564 
Legendre linear, 509-511 
Poisson, 612, 678-681 
Schrödinger, 675, 703
Schrödinger, 612, 854
simple harmonic oscillator, 541 
Sturm-Liouville, 849 

wave, 609-610, 622, 626-628, 647, 671, 849 
differential operators, see linear differential 
operator 

differentiation, see also derivative 
as gradient, 43 
as rate of change, 42 
chain rule, 47-48
covariant, 817-820
from first principles, 42-45
implicit, 48-49 
logarithmic, 49 
notation, 44 
of Fourier series, 430 
of integrals, 181-182 
of power series, 138 
partial, see partial differentiation 
product rule, 45-47, 49-51 
quotient rule, 48 
theorems, 56-58 
using complex numbers, 104 
diffraction, see Fraunhofer diffraction 
diffusion equation, 611, 621, 628-631 
combination of variables, 629-631 
integral transforms, 681-683 
numerical methods, 1192 
separation of variables, 649 
simple solution, 629 
superposition, 656-657 
diffusion of solute, 611, 629, 681-683 
dihedral group, 955, 958 
dimension of irrep, 930 
dimensionality of vector space, 248 
dipole matrix elements, 211, 950, 957 
dipole moments of molecules, 919-920 
Dirac $\delta$-function, 361, 411, 445-449
and convolution, 453 
and Green’s functions, 517, 518 
as limit of various distributions, 449 
as sum of harmonic waves, 448 
definition, 445 
Fourier transform of, 449 
impulses, 447
point charges, 447
properties, 445 
reality of, 449 

relation to Fourier transforms, 448-449 
relation to Heaviside (unit step) function, 447 
three-dimensional, 447, 458 
direct product, of groups, 914 
direct sum $\oplus$, 928
direction cosines, 225 
Dirichlet boundary conditions, 635, 745n 
Green’s functions, 688, 690-699 
method of images, 693-699 
Dirichlet conditions, for Fourier series, 421-422
disc, moment of inertia, 211 
discontinuous functions and Fourier series, 
426-428

discrete Fourier transforms, 468 
disjoint events, see mutually exclusive events 
displacement kernel, see kernel of integral 
equations, displacement 
distance from a 

line to a line, 235-236 
line to a plane, 236-237 
point to a line, 233-234 
point to a plane, 234-235 
distributive law for 

addition of matrix products, 259 
convolution, 453, 464 
inner product, 249 
linear operators, 254 
multiplication 

of a matrix by a scalar, 256 
of a vector by a complex scalar, 226 
of a vector by a scalar, 218 
multiplication by a scalar 

in a vector space of finite dimensionality, 
247 

in a vector space of infinite dimensionality, 
583 

scalar/dot product, 224 
vector/cross product, 226 
div, see divergence of vector fields 
divergence of vector fields, 358 
as integral, 404-405 
in curvilinear coordinates, 373 
in cylindrical polars, 366 
in spherical polars, 368 
tensor form, 821-822 
divergence theorem 
for tensors, 803 
for vectors, 407-408
physical applications, 410-411 
related theorems, 409 
division axiom in a group, 888 
division of complex numbers, 94-95 
dot product, see scalar product 
double integrals, see multiple integrals 
drumskin, see membrane 
dual tensors, 798-799 
dummy variable, 62 


$\epsilon_{ijk}$, Levi-Civita symbol, tensor, 790-795
and determinant, 791 
identities, 792-793 
isotropic, 794 
vector products, 791 
weight, 813 

$e^x$, see exponential function
E, two-dimensional irrep, 932, 944, 950 
eccentricity, of conic sections, 16 
efficiency, of estimator, 1074 
eigenequation for differential operators, 581 
more general form, 582-583, 601-602 
eigenfrequencies, 325 

estimation using Rayleigh-Ritz method, 
333-335 
eigenfunctions 

completeness for an Hermitian operator, 588 
construction of a real set for an Hermitian 
operator, 590-591 
definition, 582 
of integral equations, 876 
of Legendre equation, 582 
of simple harmonic oscillators, 582 
orthogonality for Hermitian operators, 
589-590 

eigenvalues, 277-287 

characteristic equation, 285 
definition, 277 
degenerate, 287 
determination, 285-287 
estimation for ODE, 849 
estimation using Rayleigh-Ritz method, 
333-335 
notation, 278 

of a general square matrix, 283 
of a representative matrix, 942 
of a unitary matrix, 283 
of an Hermitian operator 
reality, 588-589 

of anti-Hermitian matrices, see 
anti-Hermitian matrices 
of Fredholm equations, 867 
of Hermitian matrices, see Hermitian matrices 
of integral equations, 867, 875 
of linear differential operators 

adjustment of parameters, 854-855 
definition, 582 
error in estimate of, 852 
estimation, 849-855 
higher eigenvalues, 852, 859 
Legendre equation, 582 
simple harmonic oscillator, 582 
of linear operators, 277 
of normal matrices, 278-281 
under similarity transformation, 292-293 
eigenvectors, 277-287 

characteristic equation, 285 
definition, 277 
determination, 285-287 
normalisation condition, 278
notation, 278

of a general square matrix, 283 
of a unitary matrix, 283 
of anti-Hermitian matrices, see 
anti-Hermitian matrices 
of commuting matrices, 283 
of Hermitian matrices, see Hermitian matrices 
of linear operators, 277 
of normal matrices, 278-281 
stationary properties for quadratic/Hermitian 
forms, 295-296 

Einstein relation, 442, 642, 703 
elastic deformations, 802-803 
electromagnetic fields 
flux, 401 

Maxwell's equations, 379, 414, 828 
electrostatic fields and potentials 
charged split sphere, 668 
conducting cylinder in uniform field, 730 
conducting sphere in uniform field, 667 
from charge density, 680, 692 
from complex potential, 727 
infinite charged plate, 694, 736 
infinite wedge with line charge, 738 
infinite charged wedge, 736
of line charges, 696, 726 
semi-infinite charged plate, 736 
sphere with point charge, 698 
ellipse 

area of, 72, 210, 391 
as section of quadratic surface, 297 
ellipsoid, volume of, 210 
elliptic PDE, 620, 623 
empty event $\emptyset$, 963
end-points for variations 
contributions from, 841 
fixed, 836 
variable, 841-844 
energy levels of 

particle in a box, 703 
simple harmonic oscillator, 604 
energy spectrum and Fourier transforms, 456, 
457 

envelopes, 176-178 
equations of, 177 
to a family of curves, 176 
epimorphism, 903 

equilateral triangle, symmetries of, 889, 894, 
923-924, 952 

equivalence relations, 906-908, 910 
and classes, 906 
congruence, 907-909 
examples, 912 

equivalence transformations, see similarity 
transformations 

equivalent representations, 926-928, 941 
error function, erf, 630, 682, 1204 
error terms 

in Fourier series, 436-437
in Taylor series, 142-143 


errors, first and second kind, 1122
essential singularity, 724, 750 
estimation of eigenvalues 

linear differential operator, 851-854 
Rayleigh-Ritz method, 333-335 
estimator, maximum-likelihood, 1098 
estimators (statistics), 1072 
best unbiased, 1075 
bias, 1074 

central confidence interval, 1080
confidence interval, 1079 
confidence limits, 1079 
confidence region, 1084 
consistency, 1073 
efficiency, 1074 
minimum-variance, 1075 
standard error, 1077 
Euler equation 
differential, 510, 528 
trigonometric, 96 
Euler method, numerical, 1181 
Euler-Lagrange equation, 835-836 
special cases, 836-840 
even functions, see symmetric functions 
events, 962 

complement of, 963 
empty $\emptyset$, 963
intersection of $\cap$, 962
mutually exclusive, 971
statistically independent, 971
union of $\cup$, 963
exact differentials, 158-159 
exact equations, 478, 511-512 
condition for, 478 
non-linear, 525 

expectation values, see probability distributions, 
mean 

exponential distribution, 1032-1033 
from Poisson, 1032 
MGF, 1033 
exponential function 

Maclaurin series for, 143 
of a complex variable, 95, 719 
relation with hyperbolic functions, 105 

Fabry-Perot interferometer, 149 
factorial function, general, 1202 
factorisation, of a polynomial equation, 7 
faithful representation, 925, 940 
Fermat’s principle, 846, 857 
Fermi-Dirac statistics, 980 
Fibonacci series, 531 
field lines and complex potentials, 726 
fields 

conservative, 393-395 
scalar, 353 
tensor, 803 
vector, 353 

fields, electrostatic, see electrostatic fields and 
potentials
fields, gravitational, see gravitational fields and
potentials 

finite differences, 1179-1180 
central, 1179 

for differential equations, 1180-1183 
forward and backward, 1179 
from Taylor series, 1179, 1186 
schemes for differential equations, 1190-1192 
finite groups, 885 
first law of thermodynamics, 179 
first-order differential equations, see ordinary 
differential equations 

Fisher distribution, see F-distribution (Fisher) 
fluids 

Archimedean upthrust, 402, 416 
complex velocity potential, 727 
continuity equation, 410 
cylinder in uniform flow, 728 
flow, 727-728 
flux, 401, 729 
irrotational flow, 359 
sources and sinks, 410-411, 727 
stagnation points, 727 
velocity potential, 415, 612 
vortex flow, 414, 728 
forward differences, 1179 
Fourier cosine transforms, 452 
Fourier series, 421-438
and separation of variables, 652-655, 657
coefficients, 423-425, 431
complex, 430-431
differentiation, 430
Dirichlet conditions, 421-422
discontinuous functions, 426-428
error term, 436-437
examples 

square-wave, 424-425 
x, 430, 431 
x^2, 428-429
x^3, 430

integration, 430 

non-periodic functions, 428-430 
orthogonality of terms, 423 
complex case, 431 
Parseval’s theorem, 432-433 
raison d'être, 421
standard form, 423 
summation of series, 433 
symmetry considerations, 425-426 
Fourier sine transforms, 451 
Fourier transforms, 439-459

as generalisation of Fourier series, 439-441
convolution, 452-455
and the Dirac δ-function, 453
associativity, commutativity, distributivity, 
453 

definition, 453 
resolution function, 452 
convolution theorem, 454 
correlation functions, 455-457


cosine transforms, 452 
deconvolution, 455 
definition, 441 
discrete, 468 

evaluation using convolution theorem, 454 
for integral equations, 868-871 
for PDE, 683-686 

Fourier-related (conjugate) variables, 442 
in higher dimensions, 457-459
inverse, definition, 441 
odd and even functions, 451 
Parseval's theorem, 456-457
properties: differentiation, exponential 
multiplication, integration, scaling, 
translation, 450 

relation to Dirac δ-function, 448-449
sine transforms, 451 
Fourier transforms, examples 
convolution, 454 
damped harmonic oscillator, 457 
Dirac δ-function, 449
exponential decay function, 441 
Gaussian (normal) distribution, 441 
rectangular distribution, 448 
spherically symmetric functions, 458 
two narrow slits, 454 
two wide slits, 444, 454 
Fourier’s inversion theorem, 441 
Fraunhofer diffraction, 443-445
diffraction grating, 467 
two narrow slits, 454 
two wide slits, 444, 454 
Fredholm integral equations, 864 
eigenvalues, 867 
operator form, 865 
with separable kernel, 866-867 
Fredholm theory, 874-875 
Frenet-Serret formulae, 349 
Frobenius series, 545 
Fuchs' theorem, 544
function of a matrix, 260 
functional, 835 

functions of a complex variable, 711-725, 
747-752 
analyticity, 712 
behaviour at infinity, 725 
branch points, 721 
Cauchy’s integrals, 745-747 
Cauchy-Riemann relations, 713-716 
conformal transformations, 730-738 
derivative, 711 
differentiation, 711-716 
identity theorem, 748 
Laplace equation, 715, 725 
Laurent expansion, 749-752 
multivalued and branch cuts, 721-723, 766 
particular functions, 718-721 
poles, 724 

power series, 716-718 

real and imaginary parts, 711, 716
singularities, 712, 723-725
Taylor expansion, 747-748 
zeroes, 725, 754-758 
functions of one real variable 

decomposition into even and odd functions, 
422 

differentiation of, 42-51 
Fourier series, see Fourier series 
integration of, 60-73 
limits, see limits 
maxima and minima of, 51-53 
stationary values of, 51-53 
Taylor series, see Taylor series 
functions of several real variables 
chain rule, 160-161 
differentiation of, 154-182 
integration of, see multiple integrals, 
evaluation 

maxima and minima, 165-170 
points of inflection, 165-170 
rates of change, 156-158 
saddle points, 165-170 
stationary values, 165-170 
Taylor series, 163-165 
fundamental solution, 691-693 
fundamental theorem of 
algebra, 86, 88, 770 
calculus, 62-63 

complex numbers, see de Moivre’s theorem 
gamma function 

as general factorial function, 1202 
definition and properties, 1201 
Gauss's theorem, 700 
Gauss-Seidel iteration, 1160-1162 
Gaussian (normal) distribution N(μ, σ^2),
1021-1031 

and Binomial distribution, 1027 
and central limit theorem, 1037 
and Poisson distribution, 1029-1030 
continuity correction, 1028 
CPF, 1023 
tabulation, 1024 
Fourier transform, 441 
integration with infinite limits, 205 
integration with infinite limits, 207 
mean and variance, 1022-1026 
MGF, 1027, 1030 
multiple, 1030-1031 
multivariate, 1051 
sigma limits, 1025 
standard variable, 1022 
Gaussian (normal) distribution N(μ, σ^2),
cumulative probability function, 1178 
random number generation, 1178 
Gaussian elimination with interchange, 
1159-1160 

Gaussian integration, 1168-1170 
general tensors 
algebra, 787-790 


contraction, 788 
contravariant, 810 
covariant, 810 
dual, 798-799 
metric, 806-809 

physical applications, 806-809, 825-826 
pseudotensors, 813 
tensor densities, 813 
generalised likelihood ratio, 1124 
generating functions 

associated Legendre polynomials, 594 
Bessel functions, 573, 595 
Chebyshev polynomials, 597 
Hermite polynomials, 578, 596
Laguerre polynomials, 597 
Legendre polynomials, 562-564, 594 
generating functions, probability, 999-1009, see 
also moment generating functions and 
probability generating functions 
geodesics, 825-826, 831, 856 
geometric distribution, 1001 
geometric series, 120 
Gibbs’ free energy, 181 
Gibbs’ phenomenon, 427
gradient of a function of 
one variable, 43 
several real variables, 156-158 
gradient of scalar, 354-358 
tensor form, 821 
gradient of vector, 785, 818 
gradient operator (grad), 354 
as integral, 404 
in curvilinear coordinates, 373 
in cylindrical polars, 366 
in spherical polars, 368 
tensor form, 821 

Gram-Schmidt orthogonalisation of 
eigenfunctions of Hermitian operators, 
589-590 
eigenvectors of 

Hermitian matrices, 282
normal matrices, 280 
functions in a Hilbert space, 584-586 
gravitational fields and potentials 
Laplace equation, 612 
Newton’s law, 345 
Poisson equation, 612, 678 
uniform disc, 706 
uniform ring, 676 

Green’s functions, 597-601, 686-702 
and boundary conditions, 518, 520 
and Dirac δ-function, 517
and partial differential operators, 687 
and Wronskian, 533 
diffusion equation, 684 
Dirichlet problems, 690-699 
for ODE, 188, 517-522 
Neumann problems, 700-702 
particular integrals from, 520 
Poisson’s equation, 689
Green’s theorems

applications, 639, 689, 743 
in a plane, 390-393, 413 
in three dimensions, 408 
ground-state energy 
harmonic oscillator, 855 
hydrogen atom, 860 
group multiplication tables, 892 
order five, 904 
order four, 892, 894, 903 
order six, 897, 903 
order three, 904 

grouping terms as a test for convergence, 132 
groups 

Abelian, 886 
associative law, 885 
cancellation law, 888 
centre, 911 
closure, 885 
cyclic, 903 
definition, 885-888 
direct product, 914 
division axiom, 888 
elements, 885 
order, 889 
finite, 885 

identity element, 885-888 
inverse, 885, 888 
isomorphic, 893 
mappings between, 901-903 
homomorphic, 901-903 
image, 901 
isomorphic, 901 
nomenclature, 944-945 
non-Abelian, 894-898 
order, 885, 923, 924, 936, 939, 942 
permutation law, 889 
subgroups, see subgroups 
groups, examples 

±1 under multiplication, 885
alternating, 958 
complex numbers e^{iθ}, 890
functions, 897 
general linear, 914 
integers under addition, 885 
integers under multiplication (mod N ), 
891-893 
matrices, 896 
permutations, 898-900 
quaternion, 915 
rotation matrices, 890 
symmetries of an equilateral triangle, 889 

H_n(x), see Hermite polynomials
Hamilton's principle, 847 
Hamiltonian, 855 
Hankel transforms, 465 
harmonic oscillators 
damped, 243, 457 
ground-state energy, 855 


Schrödinger equation, 855
simple, see simple harmonic oscillator 
heat flow 

diffusion equation, 611, 629, 656 
in bar, 656-657, 683, 705 
in thin sheet, 631 
Heaviside function, 447 

relation to Dirac δ-function, 447
Heisenberg’s uncertainty principle, 441-443
Helmholtz equation, 671-676 
cylindrical polars, 673 
plane polars, 672-673 
spherical polars, 674-676 
Helmholtz potential, 180 
hemisphere, centre of mass and centroid, 198
Hermite equation, 541, 593, 596 
Hermite polynomials H_n(x)
generating function, 578, 596 
orthogonality, 596 
Rodrigues’ formula, 596 
Hermitian conjugate, 261-263 
and inner product, 263 
product rule, 262 
Hermitian forms, 293-297 

positive definite and semi-definite, 295 
stationary properties of eigenvectors, 295-296 
Hermitian kernel, see kernel of integral 
equations, Hermitian 
Hermitian matrices, 276 
eigenvalues, 281-283 
reality, 281-282 
eigenvectors, 281-283 
orthogonality, 282 
Hermitian operators, 587-591 
boundary condition for simple harmonic 
oscillators, 587-588 
eigenfunctions 
completeness, 588 
orthogonality, 589-590 
eigenvalues 
reality, 588-589 
Green’s functions, 597-601 
importance of, 583, 588 
in Sturm-Liouville equations, 591-592 
properties, 588-591 
superposition methods, 597-601 
higher-order differential equations, see ordinary 
differential equations 
Hilbert spaces, 584-586 
hit or miss, in Monte Carlo methods, 1174 
homogeneous 

boundary conditions, see boundary 
conditions, homogeneous and 
inhomogeneous 
differential equations, 496 
dimensionally, 481, 527-528 
simultaneous linear equations, 298 
homomorphism, 901-903 
kernel of, 902 
representation as, 925
Hooke's law, 802
hydrogen atom, 604 
s-states, 986 

electron wavefunction, 211 
ground-state energy, 860 
hydrogen molecule, symmetries of, 883 
hyperbola, as section of quadratic surface, 297 
hyperbolic functions, 105-112, 720 
calculus of, 109-112 
definitions, 105, 720 
graphs, 105 
identities, 107 
in equations, 108 
inverses, 108-109 
graphs, 109 

trigonometric analogies, 105-107 
hyperbolic PDE, 620, 623 
hypergeometric distribution, 1015-1016 
mean and variance, 1015 
hypergeometric equation, 603 
hypothesis testing, 1119-1140 

errors, first and second kind, 1122

generalised likelihood ratio, 1124 

generalised likelihood ratio test, 1123 

goodness of fit, 1138 

Neyman-Pearson test, 1122 

null, 1120 

power, 1122 

rejection region, 1121 

simple or composite, 1120

statistical tests, 1120 

test statistic, 1120 

i, j, k (unit vectors), 223 
i, square root of -1, 87
identity element of a group, 885-888 
uniqueness, 885, 887 
identity matrices, 259, 260 
identity operator, 254 
images, method of, see method of images 
imaginary part /term of a complex number,
86-87 

importance sampling, in Monte Carlo methods, 
1172 
improper 
integrals, 71 
rotations, 795-797 

impulses, δ-function representation, 447
independent random variables, 998, 1042 
index of a subgroup, 908 
indices, of regular singular points, 546 
indicial equation, 545 

distinct roots with non-integral difference, 
546-547 

repeated roots, 547, 551, 553 
roots differ by integer, 548-549, 552 
induction, proof by, 31-32 
inequalities 

amongst integrals, 73 
Bessel, 251, 586 


Schwarz, 251, 586 
triangle, 251, 586 

inertia, see also moments of inertia 
moments and products, 800 
tensor, 800 

inexact differentials, 158-159 
inexact equation, 479 
infinite integrals, 71 

contour integration, 759-764 
infinite series, see series 
inflection 

general points of, 53 
stationary points of, 51-53 
inhomogeneous 

boundary conditions, see boundary 
conditions, homogeneous and 
inhomogeneous 
differential equations, 496 
simultaneous linear equations, 298 
inner product in a vector space, see also scalar 
product 

of finite dimensionality, 249-250 
and Hermitian conjugate, 263 
commutativity, 249 
distributivity over addition, 249 
of infinite dimensionality, 584 
integral equations 
eigenfunctions, 876 
eigenvalues, 867, 875 
Fredholm, 864 

from differential equations, 862-863 
homogeneous, 864 
linear, 863 
first kind, 864 
second kind, 864 
nomenclature, 863 
singular, 864 
Volterra, 864 

integral equations, methods for 
differentiation, 871 
Fredholm theory, 874-875 
integral transforms, 868-871 
Fourier, 868-871 
Laplace, 869 

Neumann series, 872-874 
Schmidt-Hilbert theory, 875-878 
separable (degenerate) kernels, 866-867 
integral test for convergence of series, 131 
integral transforms, see also Fourier transforms 
and Laplace transforms 
general form, 465 
Hankel transforms, 465 
Mellin transforms, 465 
integrals, see also integration 
complex, see complex integrals 
definite, 60 

double, see multiple integrals 
Fourier transform of, 450 
improper, 71 
indefinite, 63
inequalities, 73, 586
infinite, 71 

Laplace transform of, 462 
limits 

containing variables, 191 
fixed, 60 
variable, 62 
line, see line integrals 
multiple, see multiple integrals 
non-zero, 946-947 
properties, 61 

triple, see multiple integrals 
undefined, 60 

integrals of vectors, see vectors, calculus of, 
integration 
integrand, 60 

integrating factor (IF), 512 
first-order ODE, 479-481 
integration, see also integrals 
applications, 73-77 
finding the length of a curve, 74-75 
mean value of a function, 73-74 
surfaces of revolution, 75-76 
volumes of revolution, 76-77 
as area under a curve, 60 
as the inverse of differentiation, 62-63 
formal definition, 60 
from first principles, 60-61 
in plane polar coordinates, 71-72 
logarithmic, 65 

multiple, see multiple integrals 
multivalued functions, 762-764, 766 
of Fourier series, 430 
of functions of several real variables, see 
multiple integrals 
of hyperbolic functions, 109-112 
of power series, 138 
of simple functions, 63-64 
of singular functions, 71 
of sinusoidal functions, 64-65 
integration constant, 63 
integration, methods for 
by inspection, 63-64 
by parts, 68-70 
by substitution, 66-68 
t substitution, 66-67 
completing the square, 68 
ing the square, 67 

change of variables, see change of variables 
contour, see contour integration 
Gaussian, 1168-1170 
numerical, 1164-1170 
partial fractions, 65-66 
reduction formulae, 70 
trigonometrical expansions, 64-65 
using complex numbers, 104 
intersection ∩, probability for, see probability,
for intersection 

intrinsic derivative, see absolute derivative 
invariant tensors, see isotropic tensors 


inverse hyperbolic functions, 108-109 
inverse integral transforms 
Fourier, 441 
Laplace, 460, 765-768 
uniqueness, 460 
inverse matrices, 268-271 
elements, 269 

in solution of simultaneous linear equations, 
300-301 

product rule, 271 
properties, 270-271 
inverse of a linear operator, 254
inverse of a product in a group, 888 
inverse of element in a group 
uniqueness, 885, 888 
inversion theorem, Fourier's, 441 
inversions as 

improper rotations, 795 
symmetry operations, 883-884 
irregular singular points, 540 
irreps, 929 

n-dimensional, 931, 944 
counting, 937-938 
dimension ni, 939 
direct sum ⊕, 928
identity A_1, 942, 946
n-dimensional, 930 
number in a representation, 929, 937 
one-dimensional, 930, 931, 935, 941, 944 
orthogonality theorem, 932-934 
projection operators for, 949 
reduction to, 938 
summation rules for, 939-941 
irrotational vectors, 359 
isobaric ODE, 482 
non-linear, 527-528 
isoclines, method of, 1188, 1196 
isomorphic groups, 893-898, 900, 901
isomorphism (mapping), 902 
isotope decay, 490, 531 
isotropic (invariant) tensors, 793-795, 802 
iteration schemes 

convergence of, 1156-1158 
for algebraic equations, 1150-1158 
for differential equations, 1185 
for integral equations, 872-875 
Gauss-Seidel, 1160-1162 
order of convergence, 1157 

J_ν(z), see Bessel functions

j, square root of -1, 87

j_ℓ(z), see spherical Bessel functions

Jacobians 

analogy with derivatives, 210 
and change of variables, 209-210 
definition in 

three dimensions, 208 
two dimensions, 204 
general properties, 209-210 
in terms of a determinant, 204, 208, 209
joint distributions, see bivariate distributions
and multivariate distributions 
Jordan’s lemma, 761-762 

kernel of a homomorphism, 902, 905 
kernel of an integral transform, 465 
kernel of integral equations 
displacement, 868 
Hermitian, 875 
of form exp(-ixz), 869-871
of linear integral equations, 863 
resolvent, 873, 874 
separable (degenerate), 866-867 
kinetic energy of oscillating system, 322 
Klein-Gordon equation, 643 
Kronecker delta and orthogonality, 249 
Kronecker delta, δ_ij (δ^i_j), tensor, 777, 790-795,
805, 811

identities, 792-793 
isotropic, 794 
vector products, 791 

L’Hôpital’s rule, 145-147
Lagrange equations, 848 

and energy conservation, 856 
Lagrange undetermined multipliers, 170-176 
and ODE eigenvalue estimation, 851 
application to stationary properties of the 
eigenvectors of quadratic/Hermitian forms, 
295 

for functions of more than two variables, 
172-176 

in deriving the Boltzmann distribution, 
174-176 

integral constraints, 844 
with several constraints, 172-176 
Lagrange’s identity, 230 
Lagrange’s theorem, 907-908 
and the order of a subgroup, 904 
and the order of an element, 904 
Lagrangian, 848, 856 
Laguerre equation, 541, 596-597 
Laguerre polynomials L_n(x)
generating function, 597 
orthogonality, 596 
Rodrigues’ formula, 577, 596 
Lamé constants, 802

lamina: mass, centre of mass and centroid, 
196-198 

Laplace equation, 612 

expansion methods, 676-678 
in three dimensions 

cylindrical polars, 661-664 
spherical polars, 664-671 
in two dimensions, 621, 623, 650, 651 
and analytic functions, 715 
and conformal transformations, 735-738 
numerical method for, 1191, 1197 
plane polars, 658-660 
separated variables, 650 


uniqueness of solution, 676 
with specified boundary values, 699, 701 
Laplace expansion, 264-265 
Laplace transforms, 459-465, 765 
convolution 

associativity, commutativity, distributivity,
464 

definition, 463 
convolution theorem, 463 
definition, 459 

for ODE with constant coefficients, 507-509 
for PDE, 681-683 
inverse, 460, 765-768 
uniqueness, 460 

properties: translation, exponential 
multiplication, etc., 462 
table for common functions, 461 
Laplace transforms, examples 
constant, 459 
derivatives, 461 
exponential function, 459 
integrals, 462 
polynomial, 459 

Laplacian, see del squared ∇^2 (Laplacian)
Laurent expansion, 749-752 
analytic and principal parts, 749 
region of convergence, 749 
least squares, method of, 1113-1119 
basis functions, 1115 
linear, 1114 
non-linear, 1118 
response matrix, 1115 
Legendre equation, 540, 541, 555-564, 582, 
593-594 

as an example of a Sturm-Liouville equation, 
591 

associated, see associated Legendre equation 
general series solution, 556 
Legendre functions, 556 
of second kind, 557 
Legendre functions P_ℓ(x)
associated Legendre functions, 703
Legendre linear equation, 509
Legendre polynomials P_ℓ(x)
orthogonality, 668
Legendre polynomials P_ℓ(x), 557
associated Legendre functions, 666 
generating function, 562-564, 594 
graph of, 557 

in Gaussian integration, 1168 
normalisation, 557, 559 
orthogonality, 560, 594 
recurrence relation, 562 
Rodrigues’ formula, 559, 594 
Leibnitz’ rule for differentiation of integrals, 181 
Leibniz’ theorem, 49-51 
length of 

a vector, 222-223 
plane curves, 74-75, 347 
space curves, 347
tensor form, 831

Levi-Civita symbol, see ε_ijk, Levi-Civita symbol,
tensor 

likelihood function, 1097 
limits, 144-147 
definition, 144 
L’Hôpital’s rule, 145-147
of functions containing exponents, 145 
of integrals, 60 
containing variables, 191 
of products, 144 
of quotients, 144-147 
of sums, 144 

line charge, electrostatic potential, 726, 738 
line integrals 

and Cauchy integrals, 745-747 
and Stokes' theorem, 412-415 
of scalars, 383-393 
of vectors, 383-395 
physical examples, 387 
round closed loop, 392 
line, vector equation of, 230-231 
linear dependence and independence 
definition in a vector space, 247 
of basis vectors, 221 
relationship with rank, 272 
linear differential operator ℒ, 517, 551, 581
adjoint ℒ†, 587

eigenfunctions, see eigenfunctions 
eigenvalues, see eigenvalues, of linear 
differential operators 
for Sturm-Liouville equation, 591-593 
Hermitian (self-adjoint), 583, 587-591 
linear equations, differential 
first-order ODE, 480 
general ODE, 496-523 
ODE with constant coefficients, 498-509 
ODE with variable coefficients, 509-523 
linear equations, simultaneous, see simultaneous 
linear equations 

linear independence of functions, 497 
Wronskian test, 497, 538 
linear integral operator 𝒦, 864

and Schmidt-Hilbert theory, 875-877 
Hermitian conjugate, 864 
inverse, 865 

linear interpolation for algebraic equations, 
1152-1153 

linear least squares, method of, 1114 
linear molecules 

normal modes of, 326-328 
symmetries of, 919 
linear operators, 252-254 
associativity, 254 
distributivity over addition, 254 
eigenvalues and eigenvectors, 277 
in a particular basis, 253 
inverse, 254 

non-commutativity, 254 


particular: identity, null/zero, 
singular/non-singular, 254 
properties, 254 

linear vector spaces, see vector spaces 
Liouville’s theorem, 747 
Ln of a complex number, 102-103, 720 
ln

Maclaurin series for, 143 
of a complex number, 102-103, 720 
log-likelihood function, 1100 
longitudinal vibrations in a rod, 610 
lottery (UK), and hypergeometric distribution, 
1016 

lower triangular matrices, 274 

Maclaurin series, 141 

standard expressions, 143 
Madelung constant, 151 
magnetic dipole, 224 
magnitude of a vector, 222-223 
in terms of scalar/dot product, 225 
mappings between groups, see groups, mappings 
between 

marginal distributions, 1040 
mass of non-uniform bodies, 196 
matrices, 246-312 
as a vector space, 257 
as arrays of numbers, 254 
as representation of a linear operator, 254 
column, 255 
elements, 254 

minors and cofactors, 264 
identity/unit, 259 
row, 255 
zero/null, 259 
matrices, algebra of, 255 
Cholesky separation, 318
addition, 256-257 

and normal modes, see normal modes 
change of basis, 288-290 
diagonalisation, see diagonalisation of 
matrices 

multiplication, 257-259 

and common eigenvalues, 283 
commutator, 314 
non-commutativity, 259 
multiplication by a scalar, 256-257 
numerical methods, see numerical methods 
for simultaneous linear equations 
similarity transformations, see similarity 
transformations 

simultaneous linear equations, see 
simultaneous linear equations 
subtraction, 256 
matrices, derived 
adjoint, 261-263 
complex conjugate, 261-263 
Hermitian conjugate, 261-263 
inverse, see inverse matrices 
transpose, 255
matrices, properties of

anti-Hermitian, see anti-Hermitian matrices 
antisymmetric / skew-symmetric, 275 
determinant, see determinants 
diagonal, 273 

eigenvalues, see eigenvalues 
eigenvectors, see eigenvectors 
Hermitian, see Hermitian matrices 
normal, see normal matrices 
nullity, 298 
order, 254 

orthogonal, 275-276 
rank, 272 
square, 254 
symmetric, 275 
trace/spur, 263 
triangular, 274 
tridiagonal, 1162-1164, 1190 
unitary, see unitary matrices 
matrix elements in quantum mechanics 
as integrals, 945 
dipole, 950-951, 957

maxima and minima (local) of a function of 
constrained variables, see Lagrange 
undetermined multipliers 
one real variable, 51-53 
sufficient conditions, 52 
several real variables, 165-170 
sufficient conditions, 167, 170 
maximum modulus theorem, 756 
maximum-likelihood, method of, 1097-1113 
bias, 1102 

data modelling, 1097 
estimator, 1098 
log-likelihood function, 1100 
parameter estimation, 1097 
transformation invariance, 1102 
Maxwell's 

electromagnetic equations, 379, 414, 828 
thermodynamic relations, 179-181 
Maxwell-Boltzmann statistics, 980 
mean μ

from MGF, 1005 
from PGF, 1000 
of RVD, 986-987 
of sample, 1066 

of sample: geometric, harmonic, root mean 
square, 1066 

mean value of a function of 
one variable, 73-74 
several variables, 202 
mean value theorem, 57-58 
median of RVD, 987 
membrane 

deformed rim, 658-660 
normal modes, 673, 954 
transverse vibrations, 610, 673, 702, 954
method of images, 639, 693-699, 738 
disc (section of cylinder), 699, 701 
infinite plate, 694 


intersecting plates in two dimensions, 696 
sphere, 697-698, 706 
metric tensor, 806-809, 812 
and Christoffel symbols, 815 
and scale factors, 806, 821 
covariant derivative of, 831 
determinant, 806, 813 
derivative of, 822 
length element, 806 
raising/lowering index, 808, 812 
scalar product, 807 
volume element, 806, 830 
MGF, see moment generating functions 
Milne’s method, 1182 
minimum-variance estimator, 1075
minor of a matrix element, 264 
mixed, components of tensor, 806, 811, 818 
ML estimator, 1098
ML estimators, 1098 
bias, 1102 

confidence limits, 1104 
efficiency, 1103 

transformation invariance, 1102 
mod N, multiplication, 891 
mode of RVD, 987 
modulo, see mod N, multiplication 
modulus 

of a complex number, 90 
of a vector, see magnitude of a vector 
molecules 

bonding in, 945, 947-950 
dipole moments of, 919-920 
symmetries of, 919 

moment generating functions (MGF), 1004-1009 
and central limit theorem, 1037-1038 
and PGF, 1005 
mean and variance, 1005 
particular distributions 
binomial, 1012 
exponential, 1033 
Gaussian, 1005, 1027 
Poisson, 1019 
properties, 1005 
moments 
central, 990 
of RVD, 989 
moments of inertia 
and inertia tensor, 800 
definition, 201 
of disc, 211 

of rectangular lamina, 201 
of right circular cylinder, 212 
of sphere, 208 

perpendicular axes theorem, 212 
moments, vector representation of, 227 
momentum as first-order tensor, 782 
monomorphism, 903 
Monte Carlo methods 
antithetic variates, 1174 
control variates, 1173
crude, 1172

hit or miss, 1174 

importance sampling, 1172 

multiple integrals, 1176 

random number generation, 1177 

stratified sampling, 1172 

Monte Carlo methods, of integration, 1170-1177 
Morera's theorem, 744 
multinomial distribution, 1050-1051 
and multiple Poisson distribution, 1060 
multiple angles, trigonometric formulae, 10 
multiple integrals 
application in finding 

area and volume, 194-196 
mass, centre of mass and centroid, 196-198 
mean value of a function of several 
variables, 202 
moments of inertia, 201 
change of variables 

double integrals, 203-207 
general properties, 209-210 
triple integrals, 207-209 
definitions of 

double integrals, 190-191 
triple integrals, 193 
evaluation, 191-193 
notation, 191, 192, 194 
order of integration, 191-192, 194 
caveats, 193 

multiplication tables for groups, see group 
multiplication tables 

multiplication theorem, see Parseval’s theorem
multivalued functions, 721-723 
integration of, 762-764 
multivariate distributions, 1038, 1049-1053 
change of variables, 1048-1049 
Gaussian, 1051 
multinomial, 1050-1051 
mutually exclusive events, 962, 971 

n_ℓ(z), see spherical Bessel functions
nabla ∇, see gradient operator (grad)
natural logarithm, see ln and Ln
natural numbers, in series, 31, 124-125 
natural representations, 923, 952 
necessary and sufficient conditions, 34-35
negative 

function, 583 
vector, 247 

Neumann boundary conditions, 635 
Green’s functions, 688, 700-702 
method of images, 700-702 
self-consistency, 700 
Neumann series, 872-874 
Newton-Raphson (NR) method, 1154-1156 
order of convergence, 1157 
Neyman-Pearson test, 1122 
nodes of oscillation, 626 
non-Abelian groups, 894-898 
functions, 897 


matrices, 896 
permutations, 898-900 
rotations-reflections, 894 
non-Cartesian coordinates, see curvilinear, 

cylindrical polar, plane polar and spherical 
polar coordinates 

non-linear differential equations, see ordinary 
differential equations, non-linear 
non-linear least squares, method of, 1118 
norm of 
function, 584 
vector, 249 
normal 

to a plane, 232 
to coordinate surface, 372 
to surface, 352, 356, 396 
normal derivative, 356 
normal distribution, see Gaussian (normal) 
distribution 
normal matrices, 277 
eigenvectors 

completeness, 280 
orthogonality, 280-281 
eigenvectors and eigenvalues, 278-281 
normal modes, 322-335 
characteristic equation, 325 
coupled pendulums, 335, 337 
definition, 326 
degeneracy, 952-955 
frequencies of, 325 
linear molecular system, 326-328 
membrane, 673, 954 
normal coordinates, 326 
normal equations, 326 
rod-string system, 323-326 
symmetries of, 328 
normalisation of 
eigenfunctions, 589 
eigenvectors, 278 
functions, 585 
vectors, 223 
null (zero) 
matrix, 259, 260 
operator, 254 
space, of a matrix, 298 
vector, 218, 247, 583 

null operation, as identity element of group, 886 
nullity, of a matrix, 298 
numerical methods for algebraic equations, 
1149-1156 

binary chopping, 1154 

convergence of iteration schemes, 1156-1158 
linear interpolation, 1152-1153 
Newton-Raphson, 1154-1156 
rearrangement methods, 1151-1152 
numerical methods for integration, 1164-1170 
Monte Carlo, 1170 
Gaussian integration, 1168-1170 
midpoint rule, 1194 
nomenclature, 1165
Simpson’s rule, 1167
trapezium rule, 1166-1167 
numerical methods for ordinary differential 
equations, 1180-1190 
accuracy and convergence, 1181 
Adams method, 1184 
difference schemes, 1181-1183 
Euler method, 1181 
first-order equations, 1181-1188 
higher-order equations, 1188-1190 
isoclines, 1188 
Milne’s method, 1182 
prediction and correction, 1184-1186 
reduction to matrix form, 1190 
Runge-Kutta methods, 1186-1188 
Taylor series methods, 1183-1184 
numerical methods for partial differential 
equations, 1190-1192 
diffusion equation, 1192 
Laplace’s equation, 1191 
minimising error, 1192 
numerical methods for simultaneous linear 
equations, 1158-1164 
Gauss-Seidel iteration, 1160-1162 
Gaussian elimination with interchange, 
1159-1160 

matrix form, 1158-1164 
tridiagonal matrices, 1162-1164 

O(x), order of, 135

observables in quantum mechanics, 282, 588 
odd functions, see antisymmetric functions 
ODE, see ordinary differential equations 
operators 

Hermitian, see Hermitian operators
linear, see linear operators and linear 
differential operator and linear integral 
operator 
order of 

approximation in Taylor series, 140n 

convergence of iteration schemes, 1157 

group, 885 

group element, 889 

ODE, 474 

permutation, 900 

subgroup, 903 

and Lagrange’s theorem, 907 
tensor, 779 

ordinary differential equations (ODE), see also 
differential equations, particular 
boundary conditions, 474, 476, 507 
complementary function, 497 
degree, 474 

dimensionally homogeneous, 481 
exact, 478, 511-512 
first-order, 474-490
first-order higher-degree, 486-490
soluble for p, 486 
soluble for x, 487 
soluble for y, 488 


general form of solution, 474-476 
higher-order, 496-529 
homogeneous, 496 
inexact, 479 
isobaric, 482, 527-528 
linear, 480, 496-523 
non-linear, 524-529 
x absent, 524 
y absent, 524 
exact, 525 

isobaric (homogeneous), 527-528 
order, 474 

ordinary point, see ordinary points of ODE 
particular integral (solution), 475, 498, 
500-501 

singular point, see singular points of ODE 
singular solution, 475, 487, 488, 490 
ordinary differential equations, methods for 
canonical form for second-order equations, 
522 

eigenfunctions, 581-602 
equations containing linear forms, 484-486 
equations with constant coefficients, 498-509 
Green’s functions, 517-522 
integrating factors, 479-481
Laplace transforms, 507-509 
numerical, 1180-1190 
partially known CF, 512 
separable variables, 477 
series solutions, 537-558, 564-568 
undetermined coefficients, 500 
variation of parameters, 514-516 
ordinary points of ODE, 539, 541-544 
indicial equation, 549 
orthogonal lines, condition for, 12 
orthogonal matrices, 275-276, 778, 779 
general properties, see unitary matrices 
orthogonal systems of coordinates, 370 
orthogonal transformations, 781 
orthogonalisation (Gram-Schmidt) of
eigenfunctions of an Hermitian operator, 589-590

eigenvectors of a normal matrix, 280 
functions in a Hilbert space, 584-586 
orthogonality of 

eigenfunctions of an Hermitian operator, 589-590

eigenvectors of a normal matrix, 280-281 
eigenvectors of an Hermitian matrix, 282 
functions, 584 

terms in Fourier series, 423, 431 
vectors, 223, 249 

orthogonality properties of characters, 936, 944 
orthogonality theorem for irreps, 932-934 
orthonormal 
basis functions, 584 
basis vectors, 249-250 

under unitary transformation, 290 
oscillations, see normal modes 
outcome, of trial, 961 
outer product of two vectors, 785

P_ℓ(x), see Legendre polynomials

P_ℓ^m(x), see associated Legendre functions

Pappus' theorems, 198-200 

parabolic PDE, 620, 623 

parallel axis theorem, 242 

parallel vectors, 227 

parallelepiped, volume of, 229-230 

parallelogram equality, 252 

parallelogram, area of, 227, 228 

parameter estimation (statistics), 1072-1097, 1140

Bessel correction, 1090 
error in mean, 1140 
mean, 1086 
variance, 1087-1090 

parameter estimation, maximum-likelihood, 1097 
parameters, variation of, 514-516 
parametric equations 
of cycloid, 376, 844 
of space curves, 346 
of surfaces, 351 
parity inversion, 944 
Parseval's theorem 

conservation of energy, 457 
for Fourier series, 432-433 
for Fourier transforms, 456-457 
partial derivative, see partial differentiation 
partial differential equations (PDE), 608-640, 
646-702, see also differential equations, 
particular 

arbitrary functions, 613-618 
boundary conditions, 614, 632-640, 656 
characteristics, 632-638 
and equation type, 636 
equation types, 620, 643 
first-order, 614-620 
general solution, 614-625 
homogeneous, 618 

inhomogeneous equation and problem, 
618-620, 678-681, 686-702 
particular solutions (integrals), 618-625 
second-order, 620-631 

partial differential equations (PDE), methods for 
change of variables, 624-625, 629-631 
constant coefficients, 620 
general solution, 622 
integral transform methods, 681-686 
method of images, see method of images 
numerical, 1190-1192 
separation of variables, see separation of 
variables 

superposition methods, 650-657 
with no undifferentiated term, 617-618 
partial differentiation, 154-182 

as gradient of a function of several real 
variables, 154-155 
chain rule, 160-161 
change of variables, 161-163 


definitions, 154-156 
properties, 160 
cyclic relation, 160 
reciprocity relation, 160 
partial fractions, 18-25

and degree of numerator, 21 
complex roots, 22 
repeated roots, 23 
partial fractions 

as a means of integration, 65-66 
in inverse Laplace transforms, 460, 508 
partial sum, 118 

particular integrals (PI), 475, see also ordinary 
differential equation, methods for and 
partial differential equations, methods for 
partition of a 
group, 906 
set, 907 

parts, integration by, 68-70 
path integrals, see line integrals 

PDE, see partial differential equations 

PDF, see probability functions, density functions 
penalty shoot-out, 1056 

pendulums, coupled, 335, 337 
periodic function representation, see Fourier 
series 

permutation groups S_n, 898-900
cycle notation, 899 
permutation law in a group, 889
permutations, 975-981 
degree, 898 
distinguishable, 977 
order of, 900 
symbol ^nP_k, 975

perpendicular axes theorem, 212 
perpendicular vectors, 223, 249 
PF, see probability functions 
PGF, see probability generating functions 
PI, see particular integrals 
plane curves, length of, 74-75 
in Cartesian coordinates, 74 
in plane polar coordinates, 75 
plane polar coordinates, 71, 342 
arc length, 75, 367 
area element, 205, 367 
basis vectors, 342 
velocity and acceleration, 343 
plane waves, 628, 649 
planes 

and simultaneous linear equations, 305-306 
vector equation of, 231-232 
plates, conducting, see also complex potentials, 
for plates 

line charge near, 696 
point charge near, 694 

point charges, δ-function representation, 447
point groups, 924 

points of inflection of a function of 
one real variable, 51-53 
several real variables, 165-170 
Poisson distribution Po(λ), 1016-1021
and Gaussian distribution, 1029-1030 
as limit of Binomial distribution, 1016, 1019 
mean and variance, 1018 
MGF, 1019 
multiple, 1020-1021 
recurrence formula, 1018 
Poisson equation, 606, 612, 678-681 
fundamental solution, 691-693 
Green's functions, 688-702 
uniqueness, 638-640 
Poisson summation formula, 467 
Poisson’s ratio, 802 

polar coordinates, see plane polar and cylindrical 
polar and spherical polar coordinates 
polar representation of complex numbers, 95-98 
polar vectors, 798 

pole, of a function of a complex variable 
contours containing, 758-768 
order, 724, 750 
residue, 750-752 
polynomial equations, 1-10 
conjugate roots, 102 
factorisation, 7 
multiplicities of roots, 4 
number of roots, 86, 88, 770 
properties of roots, 9 
real roots, 1 

solution of using de Moivre’s theorem, 
101-102 

polynomial solutions of ODE, 544, 554-555 
populations, sampling of, 1065 
positive definite and semi-definite 
quadratic/Hermitian forms, 295 
positive semi-definite norm, 249 
potential energy of 

ion in a crystal lattice, 151 
magnetic dipoles 

vector representation, 224 
oscillating system, 323 
potential function 

and conservative fields, 395 
complex, 725-730 

electrostatic, see electrostatic fields and 
potentials 

gravitational, see gravitational fields and 
potentials 
vector, 395 
power series 

and differential equations, see series solutions 
of differential equations 
interval of convergence, 135 
Maclaurin, see Maclaurin series 
manipulation: difference, differentiation, 
integration, product, substitution, sum, 
137-138 

Taylor, see Taylor series 

power series in a complex variable, 136, 716-718 
analyticity, 718 

circle and radius of convergence, 136, 717-718 


convergence tests, 717, 718 
form, 716 

power, in hypothesis testing, 1122 
powers, complex, 102-103, 719 
prediction and correction methods, 1184-1186, 
1194 

prime, non-existence of largest, 34 
principal axes of 

Cartesian tensors, 800-802 
conductivity tensors, 801 
inertia tensors, 800 
quadratic surfaces, 297 
rotation symmetry, 944 
principal normals of space curves, 348 
principal value of 

complex integrals, 760 
complex logarithms, 103, 720 
principle of the argument, 755 
probability, 966-1053 
axioms, 967 
conditional, 970-975 
Bayes’ theorem, 974-975 
combining, 972 
definition, 967 
for intersection ∩, 962
for union ∪, 963, 967-970
probability distributions, 981, see also individual 
distributions 

bivariate, see bivariate distributions 
change of variables, 992-999 
generating functions, see moment generating 
functions and probability generating 
functions 
mean μ, 986-987
mean of functions, 987 
mode, median and quartiles, 987 
moments, 989-992 

multivariate, see multivariate distributions 
standard deviation σ, 988
variance σ², 988
probability functions (PF), 981 
cumulative (CPF), 981, 983 
density functions (PDF), 982 
probability generating functions (PGF), 
999-1004 
and MGF, 1005 
binomial, 1003 
definition, 1000 
geometric, 1001 
mean and variance, 1000-1001 
Poisson, 1000 
sums of RV, 1003 
trials, 1000 

variable sums of RV, 1003-1004 
product rule for differentiation, 45-47, 49-51
products of inertia, 800 
projection operators for irreps, 949, 958 
projection tensors, 828 
proper rotations, 795 
proper subgroups, 903 
pseudoscalars, 796, 799
pseudotensors, 795-799, 813 
pseudovectors, 795-799 

quadratic equations 
properties of roots, 10 
roots of, 2 

quadratic equations, complex roots of, 86-87 
quadratic forms, 293-297 

positive definite and semi-definite, 295 
quadratic surfaces, 297 
removing cross terms, 294 
stationary properties of eigenvectors, 295-296 
quartiles, of RVD, 987 
quaternion group, 915, 955 
quotient law for tensors, 788-790 
quotient rule, for differentiation, 48 
quotient test for series, 130 

radius of convergence, 136, 717 
radius of curvature of space curves, 348 
radius of torsion of space curves, 349 
random number generation, 1177 
random numbers 

non-uniform distribution, 1194 
random variable distributions, see probability 
distributions 

random variables (RV), 961, 981-985 
continuous, 982-985 
dependent, 1038-1047 
discrete, 981-982 
independent, 998, 1042 
sums of, 1002-1004 
range, of a matrix, 298 
rank of matrices, 272 
and determinants, 272-273 
and linear dependence, 272 
rank of tensors, see order of, tensors 
rate of change of a function of 
one real variable, 42 
several real variables, 156-158 
ratio comparison test, 130 
ratio test (D'Alembert), 129, 718 
in convergence of power series, 135 
ratio theorem, 219 

and centroid of a triangle, 220-221 
Rayleigh-Ritz method, 333-335, 859 
real part/term of a complex number, 86-87 
real roots, of a polynomial equation, 1 
rearrangement methods for algebraic equations, 
1151-1152 

reciprocal vectors, 237-238, 372, 804, 808 
reciprocity relation for partial derivatives, 160 
rectangular distribution, 1036 
Fourier transform of, 448 
recurrence relations, 502-507 
characteristic equation, 505 
coefficients, 542, 543, 1163
first-order, 503 
functions, 562, 569-570 


higher-order, 507 
second-order, 505 
reducible representations, 926, 928 
reduction formulae for integrals, 70 
reflections 

and improper rotations, 795 
as symmetry operations, 883-884 
reflexivity, and equivalence relations, 906 
regular functions, see analytic functions 
regular representations, 939, 952 
regular singular points, 540, 544-546 
relative velocities, 222 
remainder term in Taylor series, 141 
repeated roots of auxiliary equation, 499 
representations, 918 
definition, 924 
dimension of, 920, 924 
equivalent, 926-928 
faithful, 925, 940 
generation of, 920-926, 954 
irreducible, see irreps 
natural, 923, 952 
product, 945-947 
reducible, 926, 928 
regular, 939, 952 
counting irreps, 940 
unitary, 928 

representative matrices, 921 
block-diagonal, 928 
eigenvalues, 942 
inverse, 925 

number needed, and order of group, 924 
of identity, 924 
residue 

at a pole, 750-752 
theorem, 752-754 
resolution function, 452 
resolvent kernel, 873, 874 
response matrix, for linear least squares, 1115 
rhomboid, volume of, 241 
Riemann tensor, 830 

Riemann theorem for conditional convergence, 
127 

Riemann zeta series, 131, 132 
right hand screw rule, 226 
Rodrigues’ formula for 

associated Legendre functions, 594 
Chebyshev polynomials, 597 
Hermite polynomials, 596 
Laguerre polynomials, 577, 596 
Legendre polynomials, 559, 594 
Rolle's theorem, 56 
root test (Cauchy), 132, 717 
roots 

of a polynomial equation, properties, 9 
roots of unity, 100-101 
roots, of a polynomial equation, 2 
rope, suspended at its ends, 845 
rotation groups (continuous), invariant 
subspaces, 930 
rotation matrices as a group, 890
rotation of a vector, see curl 
rotations 

as symmetry operations, 883-884 

axes and orthogonal matrices, 779, 780, 810 

improper, 795-797 

invariance under, 783 

product of, 780 

proper, 795 

Rouché’s theorem, 755-757
row matrix, 255 

Runge-Kutta methods, 1186-1188 
RV, see random variables 
RVD (random variable distributions), see 
probability distributions 

saddle points, 165 

sufficient conditions, 167, 170 
sampling 

correlation, 1070 
covariance, 1070 
space, 961 

statistics, 1065-1072 
with/without replacement, 971
scalar fields, 353 

derivative along a space curve, 355 
gradient, 354-358 
line integrals, 383-393 
rate of change, 355 
scalar product, 223-226 
and inner product, 249 
and metric tensor, 807 
and perpendicular vectors, 223, 249 
for vectors with complex components, 225 
in Cartesian coordinates, 225 
invariance, 779, 788 
scalar triple product, 228-230 
cyclic permutation of, 229 
in Cartesian coordinates, 229 
determinant form, 229 
interchange of dot and cross, 229 
scalars, 216-217 
invariance, 779 
zero-order tensors, 782 
scale factors, 365, 368, 370 
and metric tensor, 806, 821 
scattering in quantum mechanics, 469 
Schmidt-Hilbert theory, 875-878 
Schrödinger equation, 612
constant potential, 703 
hydrogen atom, 675 
numerical solution, 1198 
variational approach, 854 
Schwarz inequality, 251, 586 
Schwarz-Christoffel transformation, 733-735 
second differences, 1179 

second-order differential equations, see ordinary 
differential equations and partial differential 
equations 

secular determinant, 285 


self-adjoint operators, see Hermitian operators 
semicircle, angle in, 18 
semicircular lamina, centre of mass, 200 
separable 

kernel in integral equations, 866-867 
variables in ODE, 477 
separation constants, 648, 650 
separation of variables, for PDE, 646-681 
diffusion equation, 649, 655-657, 671, 685 
expansion methods, 676-678 
general method, 646-650 
Helmholtz equation, 671-675
inhomogeneous boundary conditions, 655-657 
inhomogeneous equations, 678-681 
Laplace equation, 650-655, 658-671, 676 
polar coordinates, 658-681 
separation constants, 648, 650 
superposition methods, 650-657 
wave equation, 647-649, 671, 673 
series, 118-144 

convergence of, see convergence of infinite 
series 

differentiation of, 134 

finite and infinite, 119 

integration of, 134 

multiplication by a scalar, 134 

multiplication of (Cauchy product), 134 

notation, 119 

operations, 134 

summation, see summation of series 
series, particular 
arithmetic, 120 
arithmetico-geometric, 121 
Fourier, see Fourier series 
geometric, 120 
Maclaurin, 141, 143 
power, see power series 
powers of natural numbers, 124-125 
Riemann zeta, 131, 132 
Taylor, see Taylor series 
series solutions of differential equations, 537-558, 564-568
about ordinary points, 541-544 
about regular singular points, 544-546 
Frobenius series, 545 
convergence, 541 
indicial equation, 545 
linear independence, 546 
polynomial solutions, 544, 554-555 
recurrence relation, 542, 543 
second solution, 542, 549-554 
derivative method, 551-554 
Wronskian method, 550, 558 
shortest path, 837 
and geodesics, 825, 831 
similarity transformations, 288-290, 778-779, 934

properties of matrix under, 289-290 
unitary transformations, 290 
simple harmonic oscillator, 582, 595 
energy levels of, 604
equation, 541 
simple poles, 724 
Simpson's rule, 1167 
simultaneous linear equations, 297-312 
and intersection of planes, 305-306 
homogeneous and inhomogeneous, 298 
singular value decomposition, 306-312 
solution using 

Cramer's rule, 304-305 
inverse matrix, 300-301 
numerical methods, see numerical methods 
for simultaneous linear equations 

sine 

in terms of exponential functions, 105 
Maclaurin series for, 143 
orthogonality relations, 423 
singular and non-singular 
integral equations, 864 
linear operators, 254 
matrices, 268 

singular integrals, see improper, integrals 
singular points (singularities), 712, 723-725 
essential, 724, 750 
removable, 725 
singular points of ODE, 539 
irregular, 540 
particular equations, 541 
regular, 540 

singular solution of ODE, 475, 487, 488, 490
singular value decomposition 

and simultaneous linear equations, 306-312 
singular values, 307 
sinh, hyperbolic sine, 105, 720, see also 
hyperbolic functions 
skew-symmetric matrices, 275 
Snell's law, 847 
soap films, 839-840 
solenoidal vectors, 358, 395 
solid angle 

as surface integral, 401 
subtended by rectangle, 417 
solid: mass, centre of mass and centroid, 
196-198 

source density, 612 
space curves, 346-350 
arc length, 347 
binormal, 348 
curvature, 348 
Frenet-Serret formulae, 349 
parametric equations, 346 
principal normal, 348 
radius of curvature, 348 
radius of torsion, 349 
tangent vector, 348 
torsion, 348 

spaces, see vector spaces 
span of a set of vectors, 247 
sphere, vector equation of, 232 
spherical Bessel functions j_ℓ(z), 675


spherical harmonics, 670-671

spherical polar coordinates, 367-369 
area element, 368 
basis vectors, 368 
length element, 368 
vector operators, 367-369 
volume element, 208, 368 
spur of a matrix, 263-264 
spur, of a matrix, see trace, of a matrix 
square matrices, 254 
square, symmetries of, 942 
square-wave, Fourier series for, 424-425 
stagnation points of fluid flow, 727 
standard deviation σ, 988
of sample, 1067 
standing waves, 626 
stationary values 
of functions of 

one real variable, 51-53 
several real variables, 165-170 
of integrals, 835 

under constraints, see Lagrange undetermined 
multipliers 

statistical tests, and hypothesis testing, 1120
statistics, 961, 1064-1140 
describing data, 1065-1072 
estimating parameters, 1072-1097, 1140 
Stirling's 

approximation, 1027, 1203 
asymptotic series, 1203 
Stokes' equation, 858 
Stokes' theorem, 394, 412-415 
for tensors, 804 
physical applications, 414 
related theorems, 413 
strain tensor, 802 

stratified sampling, in Monte Carlo methods, 
1172 

streamlines and complex potentials, 727 
stress tensor, 802 
stress waves, 829 
string 

loaded, 857 
plucked, 705 

transverse vibrations of, 609, 848 
Student’s t-distribution 
comparison of means, 1131 
critical points table, 1130 
normalisation, 1128 
plots, 1129 

Student's t-test, 1126-1132 
Student's t-distribution 

one/two-tailed confidence limits, 1130
Sturm-Liouville equations, 591-597 
boundary conditions, 592 
examples, 593-597 

associated Legendre equation, 594-595 
Bessel equation, 595 
Chebyshev equation, 597 
Hermite equation, 596 
hypergeometric equation, 603
Laguerre equation, 596-597 
Legendre equation, 593-594 
simple harmonic oscillator, 595 
manipulation to self-adjoint form, 592-593 
two independent variables, 860 
variational approach, 849-854 
weight function, 849 
Sturm-Liouville equations 
zeroes of eigenfunctions, 603 
subgroups, 903-905 
index, 908 
normal, 905 
order, 903 

Lagrange’s theorem, 907 
proper, 903 
trivial, 903 
submatrices, 272-273 
subscripts and superscripts, 777 
contra- and covariant, 805 
covariant derivative, 818 
dummy, 777 
free, 777 

partial derivative, 818 
summation convention, 777, 804 
substitution, integration by, 66-68 
summation convention, 777, 804 
summation of series, 119-127 
arithmetic, 120 
arithmetico-geometric, 121 
contour integration method, 764-765 
difference method, 122-123 
Fourier series method, 433 
geometric, 120 

powers of natural numbers, 124-125 
transformation methods, 125-127 
differentiation, 125 
integration, 125 
substitution, 126 
superposition methods 
for ODE, 581, 597-601
for PDE, 650-657 
surface integrals 

and divergence theorem, 407 
Archimedean upthrust, 402, 416 
of scalars, vectors, 395-402
physical examples, 401 
surfaces, 351-353 
area of, 352 
cone, 75-76 

solid, and Pappus’ theorem, 198-200 
sphere, 352 

coordinate curves, 352 
normal to, 352, 356 
of revolution, 75-76 
parametric equations, 351 
quadratic, 297 
tangent plane, 352 
symmetric functions, 422 
and Fourier series, 425-426


and Fourier transforms, 451 
symmetric matrices, 275 

general properties, see Hermitian matrices 
symmetric tensors, 787 
symmetry operations 
on molecules, 883 
order of application, 886 
symmetry, and equivalence relations, 906 

t substitution, 66-67 
tan⁻¹ x, Maclaurin series for, 143
tangent planes to surfaces, 352 
tangent vectors to space curves, 348 
tanh, hyperbolic tangent, see hyperbolic 
functions 

Taylor series, 139-144 
and finite differences, 1179, 1186 
and Taylor’s theorem, 139-142, 747 
approximation errors, 142-143 
in numerical methods, 1156, 1166 
as solution of ODE, 1183-1184 
for functions of a complex variable, 747-748 
for functions of several real variables, 163-165 
remainder term, 141 
required properties, 139 
standard forms, 139 

tensors, see Cartesian tensors and Cartesian 
tensors, particular and general tensors 
test statistic, 1120 
tetrahedral group, 957 
tetrahedron 
mass of, 197 
volume of, 195 
thermodynamics 
first law of, 179 
Maxwell’s relations, 179-181 
top-hat function, see rectangular distribution 
torque, vector representation of, 227 
torsion of space curves, 348 
total derivative, 157 
total differential, 157 
trace of a matrix, 263-264 
and second-order tensors, 788 
as sum of eigenvalues, 285, 292 
invariance under similarity transformations, 
289, 934 

trace formula, 292 
transcendental equations, 1150 
transformation matrix, 288, 294 
transformations 

active and passive, 797 
conformal, 730-738 

coordinate, see coordinate transformations 
similarity, see similarity transformations 
transforms, integral, see integral transforms and 
Fourier transforms and Laplace transforms 
transients 

in diffusion equation, 656 
in electric circuits, 491 
transitivity, and equivalence relations, 906 
transpose of a matrix, 255, 260-261
product rule, 261 
transverse vibrations 
membrane, 610, 673, 702 
rod, 704 
string, 609 

trapezium rule, 1166-1167 
trial functions 

for eigenvalue estimation, 852 
for particular integrals of ODE, 500 
trials, 961 

triangle inequality, 251, 586 

triangle, centroid of, 220-221 

triangular matrices, 274 

tridiagonal matrices, 1162-1164, 1190, 1193

trigonometric identities, 15

trigonometric identities, 10 

triple integrals, see multiple integrals 

triple scalar product, see scalar triple product 

triple vector product, see vector triple product 

uncertainty principle (Heisenberg), 441-443
undetermined coefficients, method of, 500 
undetermined multipliers, see Lagrange 
undetermined multipliers 
uniform distribution, 1036 
union ∪, probability for, see probability for
union 

uniqueness theorem 
Laplace equation, 676 
Poisson equation, 638-640 
unit step function, see Heaviside function 
unit vectors, 223 
unitary 
matrices, 276 

eigenvalues and eigenvectors, 283 
representations, 928 
transformations, 290 
upper triangular matrices, 274 

variable end-points, see end-points for 
variations, variable 
variable, dummy, 62 

variables, separation of, see separation of 
variables 
variance σ², 988
from MGF, 1005 
from PGF, 1001 
of dependent RV, 1044 
of sample, 1067 

variation of parameters, 514-516 
variation, constrained, 844-846 
variational principles, physical, 846-849 
Fermat, 846 
Hamilton, 847 

variations, calculus of, see calculus of variations 
vector operators, 353-375 

acting on sums and products, 360-361 
combinations of, 361-363 
curl, 359, 374 


del ∇, 354
del squared ∇², 358
divergence (div), 358 
geometrical definitions, 404-406
gradient operator (grad), 354-358, 373 
identities, 362, 827 
Laplacian, 358, 374 
non-Cartesian, 363-375 
tensor forms, 820-824 
curl, 823 

divergence, 821-822 
gradient, 821 
Laplacian, 822 
vector product, 226-228 
anticommutativity, 226 
definition, 226 
determinant form, 228 
in Cartesian coordinates, 228 
non-associativity, 226 
vector spaces, 247-252, 955 
action of group on, 930 
associativity of addition, 247 
basis vectors, 248-249 
commutativity of addition, 247 
complex, 247 
defining properties, 247 
dimensionality, 248 

inequalities: Bessel, Schwarz, triangle, 251-252 
invariant, 930, 955 
matrices as an example, 257 
of infinite dimensionality, 583-586 
associativity of addition, 583 
basis functions, 583-584 
commutativity of addition, 583 
defining properties, 583 
Hilbert spaces, 584-586 
inequalities: Bessel, Schwarz, triangle, 586 
parallelogram equality, 252 
real, 247 

span of a set of vectors in, 247 
vector triple product, 230 
identities, 230 
non-associativity, 230 
vectors 

as first-order tensors, 781 
as geometrical objects, 246 
base, 342 
column, 255 

compared with scalars, 216-217 
component form, 221-222 
examples of, 216 

graphical representation of, 216-217 
irrotational, 359 
magnitude of, 222-223 
non-Cartesian, 342, 364, 368 
notation, 216 
polar and axial, 798 
solenoidal, 358, 395 
span of, 247 

vectors, algebra of, 216-238 
addition and subtraction, 217-218
in component form, 222 
angle between, 225 

associativity of addition and subtraction, 217 
commutativity of addition and subtraction, 
217 

multiplication by a complex scalar, 226 
multiplication by a scalar, 218 
multiplication of, see scalar product and 
vector product 
outer product, 785 
vectors, applications 

centroid of a triangle, 220-221 
equation of a line, 230-231 
equation of a plane, 231-232 
equation of a sphere, 232 
finding distance from a 
line to a line, 235-236 
line to a plane, 236-237 
point to a line, 233-234 
point to a plane, 234-235 
intersection of two planes, 232 
vectors, calculus of, 340-375 
differentiation, 340-345, 350 
integration, 345-346 
line integrals, 383-395 
surface integrals, 395-402
volume integrals, 402-403
vectors, derived quantities 
curl, 359 
derivative, 340 
differential, 344, 350 
divergence (div), 358 
reciprocal, 237-238, 372, 804, 808 
vector fields, 353 
curl, 412 
divergence, 358 
flux, 401 

rate of change, 356 
vectors, physical 
acceleration, 341 
angular momentum, 241 
angular velocity, 227, 241, 359 
area, 399-401, 414
area of parallelogram, 227, 228 
force, 216, 217, 224 
moment/torque of a force, 227 
velocity, 341 
velocity vectors, 341 
Venn diagrams, 961-966 
vibrations 

internal, see normal modes 
longitudinal, in a rod, 610 
transverse 

membrane, 610, 673, 702, 858, 860 
rod, 704 
string, 609, 848 

Volterra integral equation, 863, 864 
differentiation methods, 871 
Laplace transform methods, 869 


volume elements 

curvilinear coordinates, 371 
cylindrical polars, 366 
spherical polars, 208, 368 
volume integrals, 402-403
and divergence theorem, 407 
volume of 
cone, 76 
ellipsoid, 210 
parallelepiped, 229 
rhomboid, 241 
tetrahedron, 195 
volumes 

as surface integrals, 403, 407 
in many dimensions, 213 
of regions, using multiple integrals, 194-196 
volumes of revolution, 76-77 

and surface area & centroid, 198-200 

wave equation, 609-610, 621, 849 
boundary conditions, 626-628 
characteristics, 637 
from Maxwell’s equations, 379 
in one dimension, 622, 626-628 
in three dimensions, 628, 647, 671 
standing waves, 626 
wave number, 443, 626n 
wave packet, 442 
wave vector, k, 443

wavefunction of electron in hydrogen atom, 211 

wedge product, see vector product 

weight 

of relative tensor, 813 
of variable, 483 
weight function, 582-583, 849 
Wiener-Kinchin theorem, 456 
work done 
by force, 387 

vector representation, 224 
Wronskian 

and Green’s functions, 533 

for second solution of ODE, 550, 558

from ODE, 538 

test for linear independence, 497, 538 

X-ray scattering, 241 

Y_ℓ^m(θ, φ), see spherical harmonics
Y_ν(z), Bessel functions of second kind, 568
Young’s modulus, 610, 802 

z, as a complex number, 87 
z*, as complex conjugate, 92-94
zero (null) 

matrix, 259, 260 
operator, 254 
vector, 218, 247, 583 
zero-order tensors, 781-784 
zeroes of a function of a complex variable, 725 
location of, 754-758, 771 
order, 725, 750

principle of the argument, 755 
Rouché’s theorem, 755, 757
zeroes of Sturm-Liouville eigenfunctions, 603 
zeroes, of a polynomial, 2 
zeta series (Riemann), 131, 132 
z-plane, see Argand diagram 